
PowerColor 6970 pictured and benched

I don't understand how he manages to make every response a novel; no matter how small the point, he always goes overboard =/
 
go on, go on...

AMD must have made some phone calls right? everyone's starting to spill the beans...

TBH in my view they needed to, as it was getting slated badly for the last 3-4 days.

Also, I second the comment about benchmarking programs: give us game results, as you can't play benchmarking programs. They are just for epeen spectators :p
 
***laughs***

Duffman - can you do us a summary?

Summary
He's saying the exact same thing as before... That (somehow) the removal of the special function unit (i.e. the change to a "4D" shader grouping rather than "5D") is more of a dramatic change than the reworking of the entire pipeline process in Fermi. This is most certainly not a viewpoint shared by anyone else in the hardware community, and when you look at the list of changes made to the architecture in Fermi (that I describe here) then you can see why.

His reason for dismissing the massive changes in Fermi is that "they do the same job even though they have been broken down and moved around".



------- response to "points" ---------

Well anyway, to counter these points briefly (I have no intention of getting into an essay-writing contest on a point that nobody else but him has any doubt over):

1. There are only a few basic operations that a modern GPU needs to perform. So from that point of view ALL GPUs perform the same basic operations.

2. He asks: "do you think the core logic will send instructions in the same manner to two separate types of shaders, in a 5 way cluster, as it will a 4 way cluster with 4 identical shaders".

The answer is "for the most part, yes". There are differences to be sure, but from the point of view of data structure they are not as dramatic as you might think. For example, when a transcendental operation is encountered, instead of sending an instruction to the special function unit, three of the "generalised" shaders are put to work on the transcendental. The bigger change takes place within the shader itself. For those of you with programming experience, think of it as swapping out one function for another. The code inside the function changes (cf. the operation of the shaders), but the code that calls it changes very little (cf. the core logic).
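The function-swap analogy above can be sketched in a few lines of (purely illustrative) Python. Everything here is hypothetical pseudo-hardware, not real GPU code: the point is only that the "call site" (the core logic) is identical for both designs, while the internals of the "shader" differ.

```python
import math

# Hypothetical sketch: the core logic always calls evaluate(op, x) the same
# way; only what happens inside the "shader" changes between the designs.

def special_function_unit_sin(x):
    # "5D" design: a dedicated special function unit handles transcendentals.
    return math.sin(x)

def general_shaders_sin(x):
    # "4D" design: general shaders cooperate on the transcendental, e.g. via
    # a short polynomial approximation (illustrative only).
    return x - x**3 / 6 + x**5 / 120

def evaluate_5d(op, x):
    if op == "sin":
        return special_function_unit_sin(x)
    return x * x  # ordinary ALU work, e.g. a multiply

def evaluate_4d(op, x):
    if op == "sin":
        return general_shaders_sin(x)
    return x * x

# The caller ("core logic") does not change between the two designs:
for evaluate in (evaluate_5d, evaluate_4d):
    result = evaluate("sin", 0.5)
```

Both versions are interchangeable from the caller's perspective, which is the point being made: a change of this kind is contained within the unit, not spread across the whole chip.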

The really dramatic changes come into play when you change the flow of data (i.e. alter the pipeline). Here you change the nature of the data you're passing around (like rewriting the "main" file in the programming analogy), and so everything that interfaces with it must also be altered. In terms of changes to the shader pipes, changing to a "4D" architecture is not too different to when Nvidia removed the "dangling float" from their GT200 architecture (in Fermi). The core is no longer able to process the "useless" third float within each clock cycle, but the mode of calling does not change dramatically because of this.
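Continuing the same (hypothetical) sketch, here is why a pipeline change is the dramatic one. In this toy pipeline each stage consumes the record produced by the one before it, so changing the shape of that record (the data flow) forces every downstream stage to be revisited, whereas swapping one stage's internals touches nothing else. Stage names and fields are invented for illustration.

```python
# Toy pipeline: each stage depends on the exact record shape the previous
# stage emits. Change that shape (e.g. add a "normal" field for lighting)
# and every stage that touches the record must change too.

def vertex_stage(v):
    # Emits the record that everything downstream depends on.
    return {"pos": v, "color": (1.0, 1.0, 1.0)}

def raster_stage(rec):
    # Reads the fields vertex_stage produced; breaks if they change.
    return {"pixel": rec["pos"], "color": rec["color"]}

def shade_stage(rec):
    return rec["color"]

def pipeline(v):
    return shade_stage(raster_stage(vertex_stage(v)))
```

Rewriting `vertex_stage` internally (the function-swap case) leaves `raster_stage` and `shade_stage` untouched; adding a new field that the later stages must consume is the "rewrite main" case, where every interface downstream has to be altered.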

3. "it really does not matter in the slightest WHERE the TMU is if the TMU does exactly the same ruddy thing, it just doesn't."
That's just silly. So, if a particular modular unit (say the rasterizer, or a texture unit, or a geometry processor) is tied to each SM, meaning that it processes data from at most 32 processing cores, you think it works in exactly the same way as if it's placed at the global level? If nothing else, the way it connects to the other components will have changed (different bandwidth requirements, potentially with different data types / vector sizes transferred).

4. He conveniently does not address the global L2 cache, the parallel thread-kernel execution capabilities, or the formation of a new data hierarchy (the division of the GPU into four "GPCs", linked only via the L2 cache). These are the most obvious 'high-level' changes.

... he is correct about one thing though, the division of the thread dispatcher occurred in Barts, not in Cayman. But this does not change any of the points I have made.

Finally, I point out once again that you need only read an analysis piece by a decent technical site (like AnandTech or similar) to see how they view the changes that took place from GT200 to GF100 in comparison to those between generations of AMD products. I know that you think you know better than all these people. But really, you don't. I also know you would like to believe that no "real development" went into GF100, since that would paint nvidia in an even worse light. But that is not the case.





------ Okay, here is GPU design 101 written in simple terms. Might be worth a read guys, even if you ignore the above -----

When you redesign a GPU, there are any number of adjustments you can make. In slightly over-simplified terms, you could group them into three categories:

a) "Here and now" performance improvements. If your architecture still has "room to breathe" then you can simply add more processing / texturing power, and tweak the various sub-components.

b) To add new features. This will usually involve rearranging and expanding core-logic, and/or inserting new processing blocks. We saw this recently with tessellation: AMD added a single global tessellator that performed its own independent computations, whereas nvidia added a geometry setup unit to each SM (16 in total for Fermi), which uses the shader cores to perform tessellation.

c) To improve scalability: i.e. to allow you to perform (a) in the future without getting diminishing returns. This generally involves breaking down the existing sub-units into smaller, more modular versions, and moving them closer to the core.

- Operation (a) is the most preferable. You gain performance without dramatic changes to your architecture. That AMD have been able to perform largely these types of operation since r600 is a testament to the quality of that architecture and pipeline design.
- Operation (b) is, obviously, performed to include new features as and when required (for DX updates or for GPGPU functionality etc).
- Operation (c) is generally the most complex, as it requires a re-working of the entire GPU (from top-to-bottom of the pipeline), which requires changes in how each individual component works. This is not something you can afford to do every generation, and so it is always done with "one eye on the next few generations", i.e. with the idea to create something where only operation (a) is required for the next couple of generations.

Fermi was a type (c) update. As was G80 (the 8800GTX), and r600 (2900xt). All other updates since then have been largely of type (a) or (b). The switch to a 4D shader is a change to the base architecture, but as it does not affect the flow of the pipeline it does not require "a complete rework of the GPU". I think we will see further "type c" changes with the next AMD generation, though I suspect they will make these adjustments more gradually, having learned from nvidia (with Fermi) and themselves (with r600) that these massive rewrites don't always go as planned.
 
DM's wall is gonna be fkn hooge after that "summary" :p

Yeah... Well, I've made all the points that need making so he can write all he wants. I don't have infinite time to go over the same points.

Anyway, only the first few lines were a summary. The next 'block' is a counter to some of the "points" made. The final section is a more general description of the types of development that will be made when revising a GPU.

That last bit might be well worth a read - you might learn something :p The rest can be summarised as "DM thinks that Fermi was a minor rewrite, everyone else in the world thinks otherwise"...
 