
Poll: ** The AMD VEGA Thread **

On or off the hype train?

  • (off) Train has derailed - Votes: 207 (39.2%)
  • (on) Overcrowding, standing room only - Votes: 100 (18.9%)
  • (never ever got on) Chinese escalator - Votes: 221 (41.9%)
  • Total voters: 528
Grr all you want, the number of games that hardware PhysX is in is often overstated, and most of the games that do use it only use it as a token gesture, or the basic physics are stripped back to create an artificially large gap between PhysX on and off.

Proprietary stuff doesn't get adopted, especially this sort of thing.



Loads of games use PhysX; I think most people don't realise that it's mostly CPU PhysX though.

I did not overstate anything; you were the one who stated it wasn't used in 'a load' of games. It was, so you were in the wrong, end of :p
Grr all you want, the number of games that hardware PhysX is in is often overstated, and most of the games that do use it only use it as a token gesture, or the basic physics are stripped back to create an artificially large gap between PhysX on and off.

Proprietary stuff doesn't get adopted, especially this sort of thing.



Loads of games use PhysX; I think most people don't realise that it's mostly CPU PhysX though.
so it was used in a load of games then :confused: Are you new to this Nvidia bashing or something? :p
 
I think people hoping for it to emerge in Navi might be waiting a while - all the technical documentation on that kind of approach published so far suggests you need DRAM shrunk at least a node smaller than anything available now, and modules manufactured on some kind of 'plus' variant of the 7nm process to reduce issues due to trace length, etc.


And the end result isn't some massive performance increase; it just lowers manufacturing cost.
A single 400mm^2 chip is more expensive than 4x 100mm^2 chips, especially with an immature process or a process that naturally offers low yields (rough numbers below). So, given a time machine and lots of wishful thinking, at some point in the future AMD and Nvidia may well offer N individual dies on some interposer working seamlessly together, tied together by lots of blood, sweat and tears. At best the performance will be the same as a single large die; more likely there are still overheads and non-linear scaling. AMD/Nvidia could charge you less for the same performance, but you won't get more performance. And then there is simply the fact that AMD/Nvidia would be quite happy to keep those extra profit margins to themselves.



Vega already has 4 separate compute engines (NCUs); putting each of those onto a separate die won't increase performance. They might be able to cut costs 20% or some such, minus the R&D effort. But if they really wanted to make Vega cheaper then not using HBM2 would likely give greater returns on investment!
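To put rough numbers on the "one big die vs several small dies" yield argument: below is a minimal sketch using a simple Poisson yield model, Y = exp(-area * D0). The defect density, wafer cost and wafer area are assumptions for illustration only, not figures from AMD, Nvidia or any foundry.

```python
import math

# All figures below are assumptions for illustration, not vendor data.
DEFECT_DENSITY = 0.002                 # defects per mm^2 (assumed D0)
WAFER_COST = 6000.0                    # assumed cost of one 300mm wafer
WAFER_AREA = math.pi * 150 ** 2 * 0.9  # usable wafer area, ~10% edge loss

def poisson_yield(area_mm2, d0=DEFECT_DENSITY):
    """Fraction of dies with zero defects under a simple Poisson yield model."""
    return math.exp(-area_mm2 * d0)

def cost_per_good_die(area_mm2):
    """Wafer cost spread over the defect-free dies of a given size."""
    dies_per_wafer = WAFER_AREA / area_mm2        # ignores saw-street losses
    good_dies = dies_per_wafer * poisson_yield(area_mm2)
    return WAFER_COST / good_dies

monolithic = cost_per_good_die(400)       # one 400mm^2 die
chiplets = 4 * cost_per_good_die(100)     # four 100mm^2 dies, same total silicon
print(f"1 x 400mm^2: ~${monolithic:.0f} per good die")
print(f"4 x 100mm^2: ~${chiplets:.0f} for four good dies")
```

With these made-up numbers the four small dies come out at roughly half the silicon cost of the single large die, which is the chiplet argument in a nutshell; note that nothing in this maths says anything about getting more performance.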
 
I did not overstate anything; you were the one who stated it wasn't used in 'a load' of games. It was, so you were in the wrong, end of :p

so it was used in a load of games then :confused: Are you new to this Nvidia bashing or something? :p
I wasn't in the wrong at all. Hardware PhysX is barely used, and the majority of games it is used in are over 5 years old. 12 games in the last 2 years have used it, and some of them are just junk free-to-play games. Oh also, it's nVidia bashing to say that PhysX is barely used? What planet are you living on?
 
And the end result isn't some massive performance increase; it just lowers manufacturing cost.
A single 400mm^2 chip is more expensive than 4x 100mm^2 chips, especially with an immature process or a process that naturally offers low yields. So, given a time machine and lots of wishful thinking, at some point in the future AMD and Nvidia may well offer N individual dies on some interposer working seamlessly together, tied together by lots of blood, sweat and tears. At best the performance will be the same as a single large die; more likely there are still overheads and non-linear scaling. AMD/Nvidia could charge you less for the same performance, but you won't get more performance. And then there is simply the fact that AMD/Nvidia would be quite happy to keep those extra profit margins to themselves.



Vega already has 4 separate compute engines (NCUs); putting each of those onto a separate die won't increase performance. They might be able to cut costs 20% or some such, minus the R&D effort. But if they really wanted to make Vega cheaper then not using HBM2 would likely give greater returns on investment!

In a sense this is right, but also kind of wrong.

Manufacturing processes are limited to around a 700-800mm2 die at maximum. And even if they could go higher it wouldn't be economically viable at all, due to exponentially worse yields as dies get bigger.

Using something like IF for an MCM design allows you to bypass this upper limit and basically makes your limit reasonable power draw rather than die size. So it effectively makes the new limit how power efficient you can make your arch.

Even if the limit of MCM designs was then 4 dies, which seems to be the 'easy' limit and nothing so far suggests it's a hard limit, you could make 300mm2 dies and have effectively a 1200mm2 GPU while also being cheap to make. As you mentioned there will likely be downsides to this method, so it may 'only' perform as well as a 1000-1100mm2 monolithic die, but you couldn't even make a monolithic die that size even if you were OK spending $50,000 per GPU.

So MCM designs have the advantage of simultaneously making GPUs cheaper to make and faster. It just comes down to how you balance increased yield vs increased performance, based on what die size you decide to make - a rough sketch of that trade-off follows below.
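Here is a minimal sketch of that balance: pick a die size and count, pay a yield-driven cost, and apply an assumed MCM scaling penalty to get an effective-performance score. The defect density, wafer cost, reticle limit and the 90% scaling factor are all assumptions for illustration, and "performance proportional to silicon area" is a deliberately crude proxy.

```python
import math

# Assumed, illustrative figures only.
D0 = 0.002                             # defects per mm^2
WAFER_COST = 6000.0                    # cost per 300mm wafer
WAFER_AREA = math.pi * 150 ** 2 * 0.9  # rough usable wafer area
RETICLE_LIMIT = 800                    # ~max practical monolithic die, mm^2
MCM_PENALTY = 0.90                     # assume an MCM hits ~90% of monolithic perf

def cost_per_good_die(area_mm2):
    """Poisson yield: wafer cost divided over defect-free dies of this size."""
    good = (WAFER_AREA / area_mm2) * math.exp(-area_mm2 * D0)
    return WAFER_COST / good

def option(n_dies, die_area):
    total = n_dies * die_area
    cost = n_dies * cost_per_good_die(die_area)
    perf = total * (MCM_PENALTY if n_dies > 1 else 1.0)  # crude: perf ~ silicon area
    return total, cost, perf

for n, area in [(1, 600), (2, 300), (4, 300)]:
    total, cost, perf = option(n, area)
    flag = "  <- beyond reticle limit, MCM only" if total > RETICLE_LIMIT else ""
    print(f"{n} x {area}mm^2: {total:4d}mm^2, ~${cost:4.0f}, perf score {perf:4.0f}{flag}")
```

With these numbers the 2x 300mm2 option is cheaper than the 600mm2 monolithic for most of its performance, and the 4x 300mm2 option gives an effective die area no one could fabricate monolithically at all - the "cheaper and faster at the same time" point, subject to how well the scaling penalty can actually be kept down.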
 
Even if the limit of MCM designs was then 4 dies, which seems to be the 'easy' limit and nothing so far suggests it's a hard limit, you could make 300mm2 dies and have effectively a 1200mm2 GPU while also being cheap to make. As you mentioned there will likely be downsides to this method, so it may 'only' perform as well as a 1000-1100mm2 monolithic die, but you couldn't even make a monolithic die that size even if you were OK spending $50,000 per GPU.

If they go for this approach they will go with a very different setup - you won't see 4x 300mm2 packages and some DRAM - you'd see, say, maybe 4-5x 40-60mm2 packages, 2-3x 300-500mm2 packages, 1-2x ~75mm2 packages, etc. (just an example). The idea is to effectively unfold a monolithic design out over the substrate so that your setup/scheduling hardware has direct access to all the heavy compute stuff, which might be spread over more than one package, using headless modules and control interfaces that might connect to several headless modules, etc.

This is enabled by advances in substrate technology that make it possible to run traces with the kind of latency, bandwidth, noise immunity, etc. that was previously only achievable over very short lengths inside a semiconductor die.
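Purely to illustrate the kind of layout being described, here is a toy model of one small control/scheduler package fanning out over substrate links to several headless compute packages. Every package name and size here is invented for the example and taken from no roadmap.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Package:
    name: str
    area_mm2: float
    role: str                                              # "control", "compute", "io"
    links: List["Package"] = field(default_factory=list)   # substrate traces

# Invented example layout in the spirit of the post above (not a real design):
scheduler = Package("setup/scheduler", 75, "control")
compute   = [Package(f"headless-compute-{i}", 120, "compute") for i in range(4)]
io_die    = Package("io/display", 50, "io")

# The control package gets a direct substrate link to every headless module,
# so work can be dispatched as if the whole thing were one unfolded chip.
scheduler.links.extend(compute + [io_die])

total = sum(p.area_mm2 for p in [scheduler, io_die] + compute)
print(f"{2 + len(compute)} packages, {total:.0f}mm^2 of silicon in total,")
print(f"scheduler fans out over {len(scheduler.links)} substrate links")
```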
 
If they go for this approach they will go with a very different setup - you won't see 4x 300mm2 packages and some DRAM - you'd see, say, maybe 4-5x 40-60mm2 packages, 2-3x 300-500mm2 packages, 1-2x ~75mm2 packages, etc. (just an example). The idea is to effectively unfold a monolithic design out over the substrate so that your setup/scheduling hardware has direct access to all the heavy compute stuff, which might be spread over more than one package, using headless modules and control interfaces that might connect to several headless modules, etc.

This is enabled by advances in substrate technology that make it possible to run traces with the kind of latency, bandwidth, noise immunity, etc. that was previously only achievable over very short lengths inside a semiconductor die.

Is it possible they could create several different chips to link together that were better at certain tasks to maximise performance?
 
In a sense this is right, but also kind of wrong.

Manufacturing processes are limited to around a 700-800mm2 die at maximum. And even if they could go higher it wouldn't be economically viable at all, due to exponentially worse yields as dies get bigger.

Using something like IF for an MCM design allows you to bypass this upper limit and basically makes your limit reasonable power draw rather than die size. So it effectively makes the new limit how power efficient you can make your arch.

Even if the limit of MCM designs was then 4 dies, which seems to be the 'easy' limit and nothing so far suggests it's a hard limit, you could make 300mm2 dies and have effectively a 1200mm2 GPU while also being cheap to make. As you mentioned there will likely be downsides to this method, so it may 'only' perform as well as a 1000-1100mm2 monolithic die, but you couldn't even make a monolithic die that size even if you were OK spending $50,000 per GPU.

So MCM designs have the advantage of simultaneously making GPUs cheaper to make and faster. It just comes down to how you balance increased yield vs increased performance, based on what die size you decide to make.


Even if AMD/nv did produce a monster GPU with 1200mm2 of equivalent die area, you would still be looking at 10K at least for the die alone.

Costs don't suddenly disappear.

This isn't about gluing together giant chips, but different small modules that have a similar area to the current dies. This is because each new node is increasing in cost exponentially, which is why in general GPU prices are increasing at the high end and at the lower end we are getting relatively more cut-down chips.


This is purely about reducing costs beyond 7nm.
 
It isn't just about improving costs if it helps them run multiple dies together, as opposed to not being able to at all, though. It will give performance benefits through the removal of a limitation.
 
I wasn't in the wrong at all. Hardware PhysX is barely used, and the majority of games it is used in are over 5 years old. 12 games in the last 2 years have used it, and some of them are just junk free-to-play games. Oh also, it's nVidia bashing to say that PhysX is barely used? What planet are you living on?

All you need to do is compare the list of PhysX AAA titles with Havok AAA titles and you have your answer. Havok still appears to be the go-to physics engine some 13 years after I was forced to code a game in it for a university project. Probably because, if I remember rightly, it was an absolute doddle to code with. Hardware PhysX pretty much died years ago along with dedicated PhysX hardware, as you rightly point out. NV trolls gonna troll.
 
Is it possible they could create several different chips to link together that were better at certain tasks to maximise performance?
Removal of what limitation?


No one is going to sell a GPU that has a massive total die area, even if it is composed of many smaller dies.




There are also some fundamental issues in even scaling something that large. The Nvidia Volta GV100 is limited in 2 ways: the die can't physically be made any bigger, but also the interposer is at its maximum size. Actually, the HBM2 memory chips supposedly overhang the interposer because the interposer can't be made any bigger.


And then there are some unfortunate practicalities. E.g., let's say you put 2 dies on one interposer with shared HBM memory. If both GPUs need the same resource, like a texture, to render a part of the scene, then the data will be duplicated for each GPU and the bandwidth is shared between GPUs, so each GPU is effectively getting half the bandwidth it would have had if it was operating alone. This is why CrossFire does work: because the memory is duplicated, the effective bandwidth is duplicated. If you don't scale the memory then you don't scale performance.
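To put numbers on the shared-bandwidth point, here is a back-of-the-envelope sketch; the 480 GB/s figure is just an assumed example, not any real card's spec.

```python
# Illustrative numbers only, not real card specs.
shared_hbm_bandwidth = 480.0   # GB/s for one shared HBM pool (assumed)
gpus = 2

# Shared pool: both GPUs stream their (duplicated) working set over the same bus.
per_gpu_shared = shared_hbm_bandwidth / gpus
print(f"Shared HBM pool : {per_gpu_shared:.0f} GB/s effective per GPU")

# CrossFire-style: each GPU has its own memory, so data *and* bandwidth duplicate.
per_gpu_own = shared_hbm_bandwidth
print(f"Own memory each : {per_gpu_own:.0f} GB/s effective per GPU")
```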
 
Another thing people seem to mistakenly think is that the Ryzen cores magically act as a single faster core. They don't; everything has to be specially coded in a multi-threaded way. This is far from easy, and is a skill many people graduate from university with a CS degree without being able to do. It is incredibly difficult to write good multi-threaded code. Your standard libraries no longer work and are not safe. Some of your standard functions will be erroneous. And once you have made your code thread-safe you will then find out it runs slower than a single-threaded version because of mutex races and resource thrashing.

Graphics cards already scale to multiple dies in a way very similar to multiple CPU cores. It takes a lot of care, a lot of skill, and doesn't always work very well.
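A minimal sketch of the "thread-safe but slower" trap described above, using Python's threading module purely for illustration (a C++ version with a real mutex shows the same flavour of problem). The lock makes the counter correct, but the threads spend their time fighting over it (plus CPython's GIL), so the 4-thread run is frequently slower than the single-threaded one.

```python
import threading
import time

N = 2_000_000
counter = 0
lock = threading.Lock()

def add(iterations):
    """Thread-safe increment: correct, but every step competes for the lock."""
    global counter
    for _ in range(iterations):
        with lock:
            counter += 1

def run(workers):
    """Split N increments across the given number of threads and time the whole run."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add, args=(N // workers,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"1 thread : {run(1):.2f}s")
print(f"4 threads: {run(4):.2f}s  (often slower: lock contention, plus the GIL in CPython)")
```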
 
I wasn't in the wrong at all. Hardware PhysX is barely used, and the majority of games it is used in are over 5 years old. 12 games in the last 2 years have used it, and some of them are just junk free-to-play games. Oh also, it's nVidia bashing to say that PhysX is barely used? What planet are you living on?

I cannot remember the last time I played a game with hardware PhysX, even though I have enough hardware to dedicate to it.

For me the NVidia PhysX driver is just bloatware and does not get installed anymore.
 
Oh, by the way, hope y'all realize: AMD has said "something" happens on the 14th, but we don't -REALLY- know what's going to happen on the 14th. For all we know, AMD could very well, you know... just lift the NDA for reviewers on the 14th so we can see reviews, but actually sell the cards some time in early September, and release all the reference and board-partner cards all at once... and delay yet again. Wouldn't surprise me at this point.
 