Ryan Smith from AT made some performance/watt calculations:
https://twitter.com/RyanSmithAT/status/1300887187886538752
What year are you in dude? SLI is dead. NVLink does scale the memory, the RTX cards were the first to do this.
There must be sheeple losing their minds over what they have just had to pay for second-hand 2080 Tis on the bay. There are still live auctions where the highest bid is £750.
Nvidia's pricing of the 3080 is surprising, I expected that to retail at £800 to £900 not £650. To me that shows Nvidia isn't complacent over its position and is determined to cut RDNA 2/Big Navi off at the pass. Makes you wonder if these chips are coming out of Samsung rather than TSMC for Nvidia to afford such an aggressive price point.
That's not news, AMD have been on the back foot in this market for a while now so nothing has changed. AMD has an uphill battle.
I mean, wow. Two thirds of the way through now. Hell of a presentation.
The one thing that is bugging me though.. performance WITHOUT all the stuff like DLSS/RTX turned on? How does it compare?
To say that AMD has its work cut out is a bit of an understatement.
did their presentation say if 3000 series are getting win7-64 drivers at all?
They didn't - but the problem is DXR/RTX uses DX12, and DX12 support under Windows 7 is very limited IIRC.
yeah but can't they still do older DX generations? I'm not wanting to run the latest games on 7, just for the thing to function so I can still use my win7 system when I need to.
What year are you in dude?
Ironically, no one has used this line since the 1990s!
Good to hear! I thought you sent it back already??
boxed, ready but not sent. I needed something for them to open on the day. Given I paid for the giftwrap may as well get that bit of it as I doubt they'll refund that charge. Should I keep the gift bag part or send that as well?
looks like it's discounting the gift wrap cost too which is amazing and unexpected! Got my label to print out and off it goes.
Could you elaborate a little on this doubling of CUDA cores? How does it affect the general architectures of the GPCs? How much of a challenge is it to keep all those FP32 units fed? What was done to ensure high occupancy?
[Tony Tamasi] One of the key design goals for the Ampere 30-series SM was to achieve twice the throughput for FP32 operations compared to the Turing SM. To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.
Doubling the processing speed for FP32 improves performance for a number of common graphics and compute operations and algorithms. Modern shader workloads typically have a mixture of FP32 arithmetic instructions such as FFMA, floating point additions (FADD), or floating point multiplications (FMUL), combined with simpler instructions such as integer adds for addressing and fetching data, floating point compare, or min/max for processing results, etc. Performance gains will vary at the shader and application level depending on the mix of instructions. Ray tracing denoising shaders are good examples that might benefit greatly from doubling FP32 throughput.
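To make the "gains depend on the instruction mix" point concrete, here is a rough sketch of an idealized per-SM issue model. The lane counts for Ampere come from the answer above; the Turing INT32 width of 64 lanes, the function names, and the model itself (no memory stalls, no scheduling limits, perfect occupancy) are assumptions for illustration only, not NVIDIA's method.

```python
# Idealized per-SM throughput model. It ignores memory stalls, scheduler
# limits and occupancy, so real-world gains will be lower than this suggests.

AMPERE_FP32_ONLY = 4 * 16   # four partitions x 16 FP32-only lanes (from the answer above)
AMPERE_SHARED = 4 * 16      # four partitions x 16 lanes that run FP32 or INT32
TURING_FP32 = 64            # half of Ampere's 128 FP32 ops/clock, per the answer above
TURING_INT32 = 64           # assumption: a separate INT32 path of the same width


def ampere_clocks(fp32_ops, int32_ops):
    """Clocks needed when INT32 can only use the shared lanes."""
    total_lanes = AMPERE_FP32_ONLY + AMPERE_SHARED
    return max((fp32_ops + int32_ops) / total_lanes, int32_ops / AMPERE_SHARED)


def turing_clocks(fp32_ops, int32_ops):
    """Clocks needed when FP32 and INT32 run on separate, concurrent paths."""
    return max(fp32_ops / TURING_FP32, int32_ops / TURING_INT32)


def speedup(fp32_fraction, total_ops=1_000_000):
    """Ampere-over-Turing speedup per SM for a given FP32 share of the work."""
    fp32_ops = total_ops * fp32_fraction
    int32_ops = total_ops - fp32_ops
    return turing_clocks(fp32_ops, int32_ops) / ampere_clocks(fp32_ops, int32_ops)


if __name__ == "__main__":
    for fraction in (1.0, 0.75, 0.5):
        print(f"FP32 share {fraction:.0%}: {speedup(fraction):.2f}x per SM per clock")
```

Under these assumptions a pure FP32 shader sees the full doubling, while a shader that is half integer work sees no per-clock gain at all, which fits the "performance gains will vary" caveat.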
Doubling math throughput required doubling the data paths supporting it, which is why the Ampere SM also doubled the shared memory and L1 cache performance for the SM (128 bytes/clock per Ampere SM versus 64 bytes/clock in Turing). Total L1 bandwidth for GeForce RTX 3080 is 219 GB/sec versus 116 GB/sec for GeForce RTX 2080 Super.
Like prior NVIDIA GPUs, Ampere is composed of Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), Raster Operators (ROPS), and memory controllers.
The GPC is the dominant high-level hardware block with all of the key graphics processing units residing inside the GPC. Each GPC includes a dedicated Raster Engine, and now also includes two ROP partitions (each partition containing eight ROP units), which is a new feature for NVIDIA Ampere Architecture GA10x GPUs. More details on the NVIDIA Ampere architecture can be found in NVIDIA’s Ampere Architecture White Paper, which will be published in the coming days.
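As a quick sanity check on the ROP layout described above, the arithmetic can be sketched like this. The two partitions per GPC and eight ROPs per partition come from the quoted text; the function name and the GPC count passed in are illustrative assumptions, so check the white paper for the figure on any particular card.

```python
ROP_PARTITIONS_PER_GPC = 2   # from the description above
ROPS_PER_PARTITION = 8       # from the description above


def total_rops(gpc_count):
    """ROP units on a GA10x GPU with the given number of enabled GPCs."""
    return gpc_count * ROP_PARTITIONS_PER_GPC * ROPS_PER_PARTITION


if __name__ == "__main__":
    # Hypothetical example: a part with six enabled GPCs.
    print(total_rops(6))
```

Since the ROPs now live inside the GPC, the ROP count scales with how many GPCs are enabled on a given part.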