A 512 bit memory controller means this is going to be one very expensive card to produce.
Could end up being another 2900XT. A 512-bit memory bus sounds tasty if they keep prices in check.
hmmmm
I'm not so sure. AMD have just released the 7990, and now they may have a card with 7990 power on one chip?
If it is true I'll get two!
Yeah... I still remember the whole hype about "6970 gonna beat GTX580 at a much lower price", and then everyone calling it a lemon on the forum when it was eventually released.
AMD Radeon HD 9970 Hawaii Detailed, Volcanic Islands GPUs Set for Late 2013 [UPDATED]
http://news.softpedia.com/news/AMD-...c-Islands-GPUs-Set-for-Late-2013-351659.shtml
Hm... so they are going to skip the 8000 series label and go straight to the 9000 series? Now the question is: could this AMD Radeon 9000 series once again be mighty bang-for-buck cards like the original ATI Radeon 9700/9800, or will it be a case of "not affordable for mere mortals" (i.e. £500+)?
Found a HQ picture on another forum. Means nothing to me but have at it gents.
The third technique, which isn’t technically stacking but still counts as “advanced packaging,” uses a silicon interposer (pictured above, below the stacked chips). An interposer is effectively a piece of silicon that acts like a “mini motherboard,” connecting two or more chips together (if you remember breadboard from your days as a budding electronic engineer, it’s the same kind of thing, but on a much smaller scale). The advantage of this technique is that you can reap the benefits of shorter wiring (higher bandwidth, lower power consumption), but the constituent chips don’t have to be changed at all. Interposers are expected to be used in upcoming multi-GPU Nvidia and AMD graphics cards.
So many daft statements in this thread. AMD's 20nm cards aren't and won't ever be an answer to Nvidia's 28nm "refresh" cards. GPUs can't and won't ever compete across processes at the same die sizes.
Almost everything in the thread is pure speculation, but there are some things worth pointing out.
The ONLY reason people are suggesting x86/ARM cores on die is because Nvidia has HINTED that they will have ARM cores inside Maxwell or the one after. I can't remember exactly what the statements were, but it actually sounded more like they were just saying the units inside Maxwell would be ARM compatible, because they'll be using Maxwell shaders inside future Tegra chips.
Either way, Nvidia isn't a CPU producer and never has been. It has used ARM in its SoCs along with its own GPUs, so it makes more sense for them to cannibalise something they have on hand to stick in their GPUs. As for AMD, ARM is very low power and efficient only at extremely low power. It's looking very much like Jaguar is going to be significantly faster than ARM for not much more power usage; for high workloads ARM isn't the most efficient chip around. If AMD try to split off serial and parallel work on die, I would be surprised if it was Jaguar or ARM based. It's most likely not to be any kind of "CPU" at all, but a very specific set of compute units just like shaders, designed for a very different goal.
Compute units are CUs, and they don't actually have anything to do with compute at all.
It's like having an octo-core chip, then the next gen is also octo-core but with each core vastly improved, and saying they've cut down compute because there are still only 8 cores.
Chips get subdivided as they get bigger. With old-school 2 pipelines there isn't any reason to subdivide the chip. When you hit 1600, you can't have the front end contact each shader individually: it would be a routing nightmare on die with huge latency across the chip. So you separate the 1600 into several big blocks, then you separate each big block into smaller blocks. Then after the 5870 we had a chip that separated the front end into two halves, and each gen we'll get more division. It works like a pyramid, frankly: when you have 4000 shaders on the bottom rung and 1 major control unit at the top, you either have 4000 connections and the most inefficient chip on the planet, or you put in a bunch more levels. At one level you have 16 compute units, each of which will incorporate X shader blocks, and each block will have X shaders and X amount of cache, etc, etc.
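To put rough numbers on the pyramid idea, here's a minimal sketch. The 4000 shaders and the 16 compute units are the figures from the post; the 4 blocks per compute unit is a made-up level purely for illustration, not a real AMD layout, and `fan_out_per_level` is just a name I've picked.

```python
def fan_out_per_level(total_shaders, levels):
    """Return how many children each node drives at every level.

    `levels` lists the children per node from the top down, excluding
    the final shader level, which is derived from the shader total.
    """
    groups = 1
    fan_outs = []
    for children in levels:
        fan_outs.append(children)
        groups *= children
    # Spread the shaders evenly over the lowest-level groups (rounded up).
    shaders_per_group = -(-total_shaders // groups)
    fan_outs.append(shaders_per_group)
    return fan_outs


# One front end wired straight to every shader.
flat = fan_out_per_level(4000, [])
# Hypothetical layout: 16 compute units, 4 blocks each, shaders below that.
tiered = fan_out_per_level(4000, [16, 4])

print(flat)    # [4000]
print(tiered)  # [16, 4, 63]
```

The point is only that the widest hop drops from 4000 connections to a few dozen once you add levels, at the cost of more stages between the front end and any given shader.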
On to 512-bit. A 512-bit bus does increase the price and the complexity of the PCB, but we're not talking about a £200 increase in cost, nor a massive increase in die size. This is how chips work: what is not viable on 40nm becomes viable on 28nm, and something you can't fit on 28nm you can fit on 20nm. There is no point having twice as much shading power as a 7970 on the same bus; you'll hit bandwidth limits. Maybe more so on compute, but considering this is likely AMD's first HSA-compatible GPU, to go with their end-of-year HSA-compatible CPUs, that is going to be a big concern, as they'll obviously be making a push at the professional markets soon enough.
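For a rough feel of why the bus width matters: peak memory bandwidth is just the bus width in bytes times the effective data rate per pin. A quick sketch, assuming 5.5 Gbps GDDR5, which is what the 7970 shipped with. The memory speed of the rumoured card isn't known, so the 512-bit figure below is only illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


# HD 7970: 384-bit bus at 5.5 Gbps effective.
hd7970 = peak_bandwidth_gb_s(384, 5.5)
# Assumption: the same 5.5 Gbps memory on a 512-bit bus.
wide_bus = peak_bandwidth_gb_s(512, 5.5)

print(hd7970)    # 264.0
print(wide_bus)  # 352.0
```

So at the same memory speed the wider bus gives a third more bandwidth, which is well short of keeping pace with double the shaders unless the memory clocks go up too.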
Either way, as I've said in other threads, 28nm has been like an in-between process; it wasn't quite the right size for AMD or Nvidia. It was too big for Nvidia to stick a 384-bit bus on a GTX 680, and Titan was WAY too big. This gen is likely to work out much better, with 384-bit buses and increased shader counts. I.e. a 7970 with 30-40% more shaders on a 200-220mm2 chip being an 8870 XT at sub £200, awesome, with an 8970 being a huge increase in performance, 70-80%, on a 325-375mm2 chip, and just awesomeness.
I don't know why people are banging on about Titan prices because of the specs; every new process brings a massive performance jump at similar prices.
The interesting things to point out: the ARM Cortex-A5 isn't new, AMD chips have used them for ARM TrustZone for quite some time now. But the point is the ARM chip is highlighted as such on die, while the "serial processing" unit is NOT labelled as ARM and in fact contains a HT link, and the "cores" are two SPU units each sharing an FPU unit, which is what both Jaguar and Bulldozer/Piledriver/Steamroller do. That all points to a similar kind of design, probably with the vast majority of the rest of the core going away.
It's worth noting that Intel have gone for modules rather than individual cores in Atom, and removed HT, as AMD did (not the HT obviously), just earlier. I said many years ago that AMD are just going for more cores and lower IPC, and over the next several years/processes will expand each core/module to improve performance per core. Intel just went the other route, same cores, wider cores, and they are clearly moving towards more efficiency, more cores and sharing resources. In a couple of gens AMD/Intel chips will be 8 cores with highly efficient cores, all in modules to save waste and gain efficiency (die size and power more than performance).
Anyway, while people are saying this is more an APU design than a GPU because it has a northbridge listed, it's irrelevant; whenever you bring together different types of chips you need things linking them together.
Also, with HSA the entire point is to be able to chop and change IP blocks and have everything work together. While part of AMD (the HSA Foundation and most of the industry as well) is looking to offload things to discrete GPUs, and work from the GPUs to CPUs, latency will always be an issue, and having serial work units on die will greatly improve performance over offloading serial stuff to the CPU, for some workloads at least.
In general we'll move forward to a point where you have a C-APU, with mostly CPU and a little GPU, and a G-APU, with mostly GPU stuff and a little CPU. When a program can do CPU and GPU stuff concurrently, the workloads get split across the devices, but when the stuff being done on the GPU relies on the serial work being sped up, it's done on die without the latency hit rather than moved off die.
Basically, depending on the application, there will be times it's faster to go off die and do loads of serial or parallel work on the "other" device, and there will be applications where the particular serial or parallel work is smaller and it's quicker to stay on die with a smaller acceleration unit.
The key at the moment is having code optimised, and eventually being able to make the decision on the fly, where the CPU can decide where it's best to send any given set of data to be processed quickest.
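The stay-on-die versus go-off-die decision can be sketched as a simple cost comparison. This is a toy model under my own assumptions, not how any real scheduler works: the on-die unit is slower but has no transfer cost, the "other" device is faster but costs a fixed round trip, and every number below is hypothetical.

```python
def best_device(work_units, on_die_rate, off_die_rate, off_die_latency):
    """Pick where a chunk of work would finish soonest.

    Rates are in work units per microsecond; off_die_latency is the
    fixed round-trip cost in microseconds of leaving the die.
    """
    on_die_time = work_units / on_die_rate
    off_die_time = off_die_latency + work_units / off_die_rate
    if on_die_time <= off_die_time:
        return "on-die", on_die_time
    return "off-die", off_die_time


# Hypothetical figures: the big device is 4x faster but 30 us away.
small_job = best_device(20, on_die_rate=1.0, off_die_rate=4.0, off_die_latency=30.0)
large_job = best_device(400, on_die_rate=1.0, off_die_rate=4.0, off_die_latency=30.0)

print(small_job)  # ('on-die', 20.0)
print(large_job)  # ('off-die', 130.0)
```

With these made-up numbers the break-even is 40 work units: anything smaller is quicker on the small on-die unit, anything bigger is worth the trip.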
It's possible that this is AMD's top-end GPU bolted on to a custom Jaguar CPU for a specific customer, or it could be that AMD has added a "CPU" to the GPU for this generation; it was always likely coming. It's impossible to tell anything like performance or die size from a high-level architecture diagram. Adding CPU-like cores will obviously take die space away from shaders, but if you're moving say 30% of the work off the shaders to the CPU cores, and that work gets done much faster AND leaves the GPU side idle much less while waiting for serial loads to be done, then you could potentially have significantly more performance than the same die built entirely like a current GPU.
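A back-of-envelope version of that trade-off, Amdahl's law style. Only the 30% figure comes from the post; the 10% of shader area given up and the 4x speed-up on the serial part are numbers I've invented to show the shape of the argument.

```python
def relative_time(serial_fraction, shader_area_lost, serial_speedup):
    """Run time relative to an all-shader die of the same size (1.0 = equal).

    serial_fraction:  share of the original run time spent on serial work
    shader_area_lost: fraction of the shaders given up for CPU-like cores
    serial_speedup:   how much faster those cores run the serial work
    """
    # Fewer shaders means the parallel part takes proportionally longer.
    parallel = (1 - serial_fraction) / (1 - shader_area_lost)
    # The serial part shrinks by however much faster the new cores are.
    serial = serial_fraction / serial_speedup
    return parallel + serial


# Hypothetical: lose 10% of the shaders, run the serial 30% four times faster.
hybrid = relative_time(0.30, 0.10, 4.0)
# Same area lost, but the cores are no faster at serial work than shaders.
wasted = relative_time(0.30, 0.10, 1.0)

print(round(hybrid, 3))
print(round(wasted, 3))
```

Under those assumptions the hybrid die comes out roughly 15% quicker overall despite having fewer shaders, while giving up the area for no serial gain makes it slower. Getting the balance right is the whole game.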
If it has 4000 shaders AND some CPUs on die, and they get the balance right, it could be significantly faster than what you'd expect from a 4000-shader, higher-bandwidth 7970.
One of the more interesting questions is whether they will move to different clocks like Nvidia used to have (for different reasons), with serial cores working at say 2GHz, like the upper end of Jaguar, and the GPU working at 1GHz.
Even more interesting would be the question of production. You can already quite easily (though it's time consuming and more expensive to produce) have different parts of the chip using a slightly different process; for instance the companion core in Tegra 3 uses a lower-power optimised variant of the 40nm process, as the core is aimed at lower power, lower leakage and lower clocks. The other potential option, which we're moving towards but probably aren't quite at yet, is making the CPU and GPU on entirely different silicon, on different processes (20nm HP for the GPU and 20nm LP for the CPU, for instance), then putting them on an interposer (essentially a chip that is made purely to connect other bits of silicon, meaning two chips on the same interposer can communicate at a bandwidth/latency that you can't get over a bus between two separate pieces of silicon, like a CPU with a discrete GPU).
The latter is the "ultimate" way to do it and every foundry is working towards that; Intel might be there already, TSMC probably aren't. The former is a very slow way of optimising different portions of the chip. It will reduce yields purely because it takes longer to process the wafers: fewer wafers can be made in a given time, which increases costs and reduces how many you can make.
http://www.extremetech.com/computing/119843-the-future-of-computers-3d-chip-stacking
Interposers are much closer and much more likely to be done on a mass scale sooner than TSVs. That link gives a good description: interposers are like a mini motherboard done in silicon that you can stick your chips on to, but you have very short lengths, incredible bandwidth and incredibly low latency, so the chips can act almost as if they were on the same die.
TSVs stack the chips and, as a crude description, it's like taking a bunch of chips and stabbing a copper rod through them (that isn't how they are made, just how they would look), so there is a copper connection through all the chips. In reality yields take a huge hit, the chips have to be designed specifically for it, and it takes loads more steps and is insanely expensive. It will happen eventually for most things but is much further out from mass-scale production of CPUs/GPUs than interposers.