Intro SemiAnalysis has been on a five-month long quest to settle the reality of MI300X. In theory, the MI300X should be at a huge advantage over Nvidia’s H100 and H200 in terms of specifications an…
semianalysis.com
Per this just-released review: going by AMD's marketing and GPU specs, the MI300X should be a slam dunk against anything Nvidia has. As the review shows, reality is very different. It lags significantly behind Nvidia, and that's after AMD sent teams of engineers to update the software and improve performance on site, versus Nvidia's out-of-the-box performance with no engineers required.
Key Findings
- Comparing on paper FLOP/s and HBM Bandwidth/Capacity is akin to comparing cameras by merely examining megapixel count. The only way to tell the actual performance is to run benchmarking.
- Nvidia’s Out of the Box Performance & Experience is amazing, and we did not run into any bugs during our benchmarks. Nvidia assigned a single engineer to us for technical support, but since we didn’t run into any Nvidia software bugs, we didn’t need much support.
- AMD’s Out of the Box Experience is very difficult to work with and can require considerable patience and elbow grease to move towards a usable state. AMD's stable releases of AMD PyTorch are still broken and we needed workarounds.
- If we weren’t supported by multiple teams of AMD engineers triaging and fixing bugs in AMD software that we ran into, AMD’s results would have been much lower than Nvidia’s.
- For AMD, Real World Performance is nowhere close to its on paper marketed TFLOP/s.
- Training performance is weaker, as demonstrated by the MI300X’s matrix multiplication micro-benchmarks, and still lags that of Nvidia’s H100 and H200.
- AMD’s training performance is also held back as the MI300X does not deliver strong scale out performance. This is due to its weaker ROCm Compute Communication Library (RCCL) and AMD’s lower degree of vertical integration with networking and switching hardware compared to Nvidia’s strong integration of its Nvidia Collective Communications Library (NCCL), InfiniBand/Spectrum-X network fabric and switches.
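To make the "on paper vs. benchmarked" point concrete, here is a minimal sketch of how achieved matmul throughput is usually compared with a datasheet peak. This is not SemiAnalysis's benchmark: the function names, the pure-Python matmul stand-in, and the peak figure are all illustrative assumptions, and a real test would time a GPU GEMM instead.

```python
import time


def matmul_flops(m, n, k):
    """FLOPs for an (m x k) @ (k x n) product: one multiply plus one add per term."""
    return 2 * m * n * k


def achieved_tflops(m, n, k, seconds):
    """Measured throughput in TFLOP/s for one matmul that took `seconds`."""
    return matmul_flops(m, n, k) / seconds / 1e12


def utilization(achieved, peak):
    """Fraction of the marketed peak that was actually reached."""
    return achieved / peak


def time_call(fn, warmup=1, iters=3):
    """Average wall-clock seconds per call, after a few warm-up runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


def naive_matmul(a, b):
    """Stand-in workload (hypothetical); a real benchmark would call a GPU GEMM."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


if __name__ == "__main__":
    n = 64
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    secs = time_call(lambda: naive_matmul(a, b))
    measured = achieved_tflops(n, n, n, secs)
    hypothetical_peak = 1.0  # placeholder TFLOP/s, NOT a real datasheet number
    print(f"measured: {measured:.6f} TFLOP/s")
    print(f"share of hypothetical peak: {utilization(measured, hypothetical_peak):.4%}")
```

The number that matters is the measured one; the datasheet peak only tells you the ceiling the software stack is failing (or managing) to reach.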
This really isn't worth doubling down on.
Nvidia owns 99% of the software, so Nvidia's GPUs are going to run better and faster than AMD's even if their raw compute is lower. The first Key Finding almost gets to that conclusion, but not quite. Either they don't actually understand what they are writing about, like 90% of writers these days, or they do and that's not the point: the point is to reassure Nvidia's equally dumb investors that no matter how much AMD capitalises on its breakthrough efforts, Nvidia still has the bigger willy.
Not everyone uses or even likes Nvidia's ecosystem. Intel used to do the same: they created an ecosystem and dictated it to their customers. It's kind of good because you get a stable, ready-made environment for your hardware. For Intel, they get to lock you in, so you become dependent on that ecosystem and find it difficult to switch even if you want to.
AMD came along with a new idea: you tell us exactly what you need and we will build it for you. AMD has been doing that long enough now that the ideas other people bring, and AMD then turns into reality, are good ideas that appeal to plenty of other customers too.
Beyond that, not everyone wants it done for them. Sometimes, quite often actually, all you need is the hardware. If the hardware is not black-boxed then you can program it yourself, old school.
Nvidia's hardware and ecosystem are not the be-all and end-all. They have their own problems, and yes, AMD's hardware is more powerful in TFLOP/s. That's why you're seeing that nonsense from Nvidia's marketing arms plastered all over the Internet now: people are starting to take notice of AMD's hardware and ideas, enough that it's rattling Nvidia.