Finally some real info?
Source
In the past few weeks, we've seen various
fishy rumors about the product specifications
of the first discrete GPU using the upcoming
28nm Kepler architecture, the GK104. While
we have known parts of the specifications,
such as no hot clocks, the doubling of each
Streaming Multiprocessor (SM) from 48 to 96
CUDA cores (i.e. Stream Processors) and a
256-bit memory controller, the real
specifications are (finally) here, even
though our information differs minimally
from the information originally posted on
3DCenter.org.
NVIDIA Kepler GK104 Architectural
overview: at first look, very similar to
GF110, but then you take a deeper look:
1536 Stream Processors instead of 512!
First and foremost, in NVIDIA's internal
nomenclature, this part should be named
GeForce GTX 660 (the company is debating
GeForce GTX 660, 670 or 680 - and the final
verdict will 99% be GTX 680). This is a
349-399 dollar part which would
conventionally replace the 300-dollar "GeForce
GTX 560 Ti 2GB", but will offer higher
performance than the GTX 580. Significantly
higher… and more importantly, not just
beating the $449 Radeon HD 7950 3GB, but
also endangering the $549 Radeon HD 7970.
Yeah, it is that fast.
Why? Because we're talking about 1536
CUDA cores divided in four Graphics
Processing Clusters (GPC), all of which
contain four Streaming Multiprocessors
(SM). Given that there are 96 Stream
Processors per SM (or CUDA cores, NVIDIA
cannot seem to make up its mind what to call
them), we can see that, for instance, the
entry-level Kepler has a single SM unit with
96 CUDA cores/Stream Processors. Can you
say… a mobile GPU part that allegedly
taped out ages ago… and just by some
accident, ended in a Samsung notebook?
Only time will tell for those.
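
The core-count arithmetic above can be sketched in a few lines. This is only an illustration of the rumored layout (96 cores per SM, four SMs per GPC); the function names are made up for this example and nothing here is confirmed by NVIDIA.

```python
# Rumored Kepler building blocks, taken from the article (not confirmed).
CORES_PER_SM = 96   # CUDA cores / Stream Processors per SM
SMS_PER_GPC = 4     # Streaming Multiprocessors per GPC


def cores_for_sms(sm_count: int) -> int:
    """CUDA cores for a chip with the given number of SMs."""
    return sm_count * CORES_PER_SM


def cores_for_gpcs(gpc_count: int) -> int:
    """CUDA cores for a chip built from whole GPCs."""
    return cores_for_sms(gpc_count * SMS_PER_GPC)


if __name__ == "__main__":
    print(cores_for_sms(1))   # single-SM entry-level part: 96
    print(cores_for_gpcs(4))  # GK104 with four GPCs: 1536
```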
The base combinations for NVIDIA's future
GPUs now are 96 (1SM), 384 (1GPC), 768
(2GPC), 1536 (4GPC) and 2304 CUDA cores/
Stream Processors (6GPC). Given that our
sources are telling us the big monolithic
die comes with 2304 SP, the question is what
can be done with the memory controller.
Logic dictates Kepler can come with the
following memory controller configurations:
64-bit, 128-bit, 192-bit, 256-bit, 320-bit and
512-bit: to us, it is most logical that we see
64-bit low-end, 128-bit mainstream, 256-bit
high-end and either 384-bit / 512-bit on the
high-end compute side - and GeForce GTX
690, but this time as a single monolithic die,
instead of typical mix'n'match of two high-
end GPUs.
Continuing with the GK104 GPU, the chip has
the same amount of fixed-function logic as
the competing Tahiti XT - 32 ROPs (Raster
OPeration Units) and 128 TMUs (Texture
Mapping Units). As you can see in our
architectural mockup, the decision to go with
a 256-bit memory controller results in 2GB
of GDDR5, and this is the only part where
NVIDIA really loses to AMD: both 7950 and
7970 come with 3GB GDDR5 memory. True,
the difference in planned price is estimated
at $100 less for NVIDIA boards ($349-399
versus $449/7950 and $549/7970), which
should mitigate the paper advantage of the
HD 7900 Series.
How high can it go?
Just like GF110, the GK104 comes in two
different versions: the GeForce board will
run double-precision at a one-sixth rate -
while Quadro and Tesla will run at typical
half-rate. Just like AMD Southern Islands, we
were told by one source that there is an
architectural possibility of full rate DP
(instruction, cache sizes) - but we do not
believe in fairy tales.
The GPU clock is estimated at 950MHz, but
our sources are telling us that there are
different clocks running in the lab: 772MHz for
clock-per-clock versus GTX 580, 925MHz for
clock-per-clock versus Tahiti XT, while the
clock range for the shipping parts is
between 950 and 1000MHz. We were told
that NVIDIA did not laugh too much at
Verdetrol performance enhancing pills and
that the company is trying to tweak the BIOS
(more importantly, thermal envelope) in
order to get the parts running at 1GHz. If
NVIDIA fails, the partners are certain to
offer a 1GHz board (just like in the case of
Tahiti XT and 3rd party vendors).
The memory is set at 1.25 GHz in Quad-
Data Rate (QDR, i.e. 5GHz "effective"). This
25% boost over GF100/GF110 is something
that thrilled NVIDIA engineers, since this is
the first time their memory controllers were
able to reach AMD with stable default clock
frequency. Remember, unlike GDDR3
memory, GDDR5 is "actively driven" and
the memory controller does much more than it
used to. Given that AMD is actually the
company that creates the memory standard,
AMD's GPU engineers have a good
advantage in terms of just how high they can
clock the GDDR5 memory.
This clock results in 160GB/s of video memory
bandwidth, a drop from the GTX 580
(192.4GB/s), but a big boost over the GTX 560
Ti and its 128.27GB/s (excluding the OEM
versions), and just a bit higher than the GTX
560 Ti OEM (GF110 die), GTX 560 Ti 448 Cores
LE and GTX 570, all having the same GDDR5
memory clock and bandwidth of 152GB/s.
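
For anyone who wants to check the 160GB/s figure, here is a rough sketch of the usual bandwidth arithmetic: bus width in bytes times the effective transfer rate. The function name is just for this example; the inputs (256-bit bus, 1.25 GHz memory clock, four transfers per clock) are the rumored GK104 numbers from the article.

```python
def memory_bandwidth_gbs(bus_width_bits: int,
                         base_clock_ghz: float,
                         transfers_per_clock: int = 4) -> float:
    """Peak memory bandwidth in GB/s.

    bus_width_bits / 8 gives bytes moved per transfer; the effective rate
    is the base clock times the transfers per clock (4 for the "QDR"
    description of GDDR5 used in the article).
    """
    effective_rate_ghz = base_clock_ghz * transfers_per_clock
    return bus_width_bits / 8 * effective_rate_ghz


if __name__ == "__main__":
    # Rumored GK104: 256-bit bus, 1.25 GHz memory clock (5 GHz effective).
    print(memory_bandwidth_gbs(256, 1.25))  # 160.0
```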
All of this results in 2.9 to 3.05 TFLOPS
single-precision, i.e. 486-500 GFLOPS
double-precision. Quadro and potential
Tesla versions of this board will feature
unlocked double-precision, meaning an
identically clocked board would have
around the same amount of DP-GFLOPS as
the GTX 580 had single-precision… an
impressive boost indeed. In any case higher
than what Fermi-based Quadros and Teslas
were able to achieve.
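
The throughput numbers above can be roughly reproduced with the standard peak-FLOPS estimate. This sketch assumes 2 floating-point operations per core per clock (one fused multiply-add), which is my assumption and not something the article states; the core count, clock range and DP ratios are the rumored figures. At 1000MHz the result lands slightly above the article's rounded upper bounds.

```python
def sp_gflops(cores: int, clock_mhz: float,
              flops_per_core_per_clock: int = 2) -> float:
    """Peak single-precision GFLOPS.

    Assumes each core retires one fused multiply-add (2 FLOPs) per clock;
    that factor is an assumption, not a figure from the article.
    """
    return cores * flops_per_core_per_clock * clock_mhz / 1000.0


def dp_gflops(sp: float, dp_ratio: float) -> float:
    """Peak double-precision GFLOPS for a given DP:SP ratio."""
    return sp * dp_ratio


if __name__ == "__main__":
    for clock_mhz in (950, 1000):
        sp = sp_gflops(1536, clock_mhz)
        geforce_dp = dp_gflops(sp, 1 / 6)  # rumored GeForce rate
        unlocked_dp = dp_gflops(sp, 1 / 2) # rumored Quadro/Tesla rate
        print(clock_mhz, round(sp, 1), round(geforce_dp, 1),
              round(unlocked_dp, 1))
```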
You won't need to wait for too long, as
NVIDIA is already starting pre-sale
activities, and getting ready to counter AMD
and their momentum with the Radeon 7700
(Cape Verde, February 15), 7800 (Pitcairn,
March 6) and 7900 Series (released).

Really? This happens with just about every high-end release from AMD or NVIDIA, from what I've seen over the years.