Took the plunge - First Home Server

My current viewpoint is that running 4x, and potentially 8x, V100s on one board should be possible.
Might be possible, but it's a waste of power. The cards are obsolete and won't run models created after December 2025 (assuming CUDA 13), they have less inferencing power than a 3090, and bugger all diffusion capability going forward because they don't have hardware support for the models.

Your "AI plan" is frankly nonsense. Sorry.
 
Yep, that will all be compatible, but my point still stands - even a pair of those Xeons at 105W each (24 cores / 48 threads) is barely a match for a 65W Ryzen 5600 (6 cores / 12 threads).
Best advice in this whole thread. Those Xeons were good back in the day, but a 9 year old CPU is far behind the curve nowadays - no matter how many cores and especially in performance per watt.

@OP, have you considered just running some 12B or 20B models instead of big ones? If you want to run a model locally, you can use HuggingFace and run a 12B model on about 12GB of VRAM with decent performance. Something as simple as a 3080 card would be fine. Anything relatively modern with 16GB RAM is going to run plenty of models just fine. The V100s are actually dated now.
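
For anyone wondering what that looks like in practice, here's a minimal sketch using Hugging Face transformers to load a ~12B model in 4-bit so the weights fit comfortably in 12GB of VRAM (the model id and settings below are illustrative, not a recommendation):

```python
# Minimal sketch: run a ~12B model in 4-bit on a single ~12GB GPU.
# Requires: pip install transformers accelerate bitsandbytes
# The model id below is just an example - swap in whichever 12B/20B model you want.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # example ~12B model

quant = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights: ~12B params -> roughly 7-8GB
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    device_map="auto",                      # spills to CPU RAM if VRAM runs short
)

prompt = "Explain PCIe lanes in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```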
 
If you want to mess around with AI cheaply then a Strix Halo box is going to be your cheapest option. It's on par with/slightly faster than a DGX Spark for inferencing, gets somewhat caned for diffusion (but ROCm is getting there) and costs half the price of the Spark.

It's also a 16-core Zen 5 machine with 128GB memory and reasonable I/O which only uses 120W. Mine gets used for a lot more than AI and will still be useful long after it's obsolete for AI.

OP appears to be buying into obsolescence and will have a staggeringly high leccy bill to boot....
 
Thanks for the few positive comments and also the mostly negative ones.

I appreciate when people use language such as 'nonsense' in such a throwaway manner, helps me to filter out who is worth sharing and engaging with, and who is not :)

I want to play around with 'proper' PC hardware from older servers. When I learnt how to work on vehicles I did so on my first car, a 30-year-old 1980s model; the principles and mechanics of the combustion engine were no different on my second car, a petrol engine 20 years newer.

Buying a load of ex-enterprise hardware for far less than the cost of one second-hand 4090 by itself, as a basis to learn and apply the basic theory, is worth £1000 to me. That's all there is to it.

If you want to help me along the way as I problem solve, or have useful suggestions or alternatives that fit with my overall ethos of learning, please contribute.

Otherwise, I'll get more wisdom from talking to the dog.

If anyone thinks this project is solely about 'AI', which is a daft name for large language models I might add, they've completely misunderstood the broader picture.

I'll drop the crux of my total project goal here just to fan the flames even more... I'd like to see how much compute it is possible to run using renewable energy sources.
 
I want to play around with 'proper' PC hardware from older servers. When I learnt how to work on vehicles I did so on my first car, a 30-year-old 1980s model; the principles and mechanics of the combustion engine were no different on my second car, a petrol engine 20 years newer.

Buying a load of ex-enterprise hardware for far less than the cost of one second-hand 4090 by itself, as a basis to learn and apply the basic theory, is worth £1000 to me. That's all there is to it.

If you want to help me along the way as I problem solve, or have useful suggestions or alternatives that fit with my overall ethos of learning, please contribute.

Otherwise, I'll get more wisdom from talking to the dog.
This is reasonable logic for mechanics and automotive but isn't something that translates well into enterprise-grade computing, unfortunately. The rapid pace of tech (especially the field of AI) makes things obsolete/EoL/unsupported very quickly. For example, the V100 is old and no longer supported with NVIDIA's latest drivers. You will need to use legacy drivers and this could cause weird behaviours when trying to use modern OSes, models, and supporting software.
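
A quick way to sanity-check that on any given box (assuming a CUDA build of PyTorch is installed) is to compare the card's compute capability against the architectures the installed build was compiled for - a V100 reports sm_70, so if that's missing from the list you're into legacy driver/toolkit territory:

```python
# Quick compatibility sanity check (assumes a CUDA-enabled PyTorch install).
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Compute capability: sm_{major}{minor}")            # V100 reports sm_70
    print(f"Architectures in this PyTorch build: {torch.cuda.get_arch_list()}")
    # If sm_70 isn't in the list above, this build/toolkit no longer targets Volta.
else:
    print("No CUDA device visible - check the driver install.")
```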

You can play around with LLMs using a basic desktop PC. Using enterprise-grade hardware isn't going to change your experience. If you just want to play around with the HW, then, by all means, go for it. You can still run a modern Linux distro and go crazy.

I don't think anyone who has responded is trying to **** on your parade; they want you to make the best use of your money, time, and experience.
 
This is reasonable logic for mechanics and automotive but isn't something that translates well into enterprise-grade computing, unfortunately. The rapid pace of tech (especially the field of AI) makes things obsolete/EoL/unsupported very quickly. For example, the V100 is old and no longer supported with NVIDIA's latest drivers.
It's more a case that everything coming from the CUDA 13 toolkit (i.e. models) isn't going to run on the V100, rather than drivers. That, and the fact the hardware isn't going to work with diffusion models created from this point onwards. Anyway, the OP was clear enough, so let him get on with it.

As an aside, I find it interesting that NVIDIA throw their enterprise customers under the bus with software while continuing to support consumers, and AMD do the polar opposite. We digress.
 
- 1x 5800X3D AM4-based system; that doesn't really enter this equation as it is my daily driver, however I have been trying to figure out if I could use the VRAM of the V100s and the single-core CPU performance of the 5800X3D in 3D software applications. I suppose, what do you call multiple computers working together?

Buying a load of ex-enterprise hardware for far less than the cost of one second-hand 4090 by itself, as a basis to learn and apply the basic theory, is worth £1000 to me. That's all there is to it.
Honestly just dropping a decent modern graphics card into your 5800X3D based system would seem to achieve the aim of running LLMs, as well as allowing you to use CUDA to accelerate your 3D Modelling.

A standard GeForce will also still have resale value in a year's time (for gamers), when you want to upgrade to a better GPU to keep up with the requirements of newer LLMs.


Using all your other machines/thin clients etc for the rest of your projects is a sound idea.
 
Thanks for this, I'll take a look into the NVlink and DMI between cards you mentioned (I don't know what either are currently).

My current viewpoint is that running 4x, and potentially 8x, V100s on one board should be possible. Is that not the case?

What is an Nvidia certified system?

So with a C612 chipset you have a total of 80 Gen3 PCIe lanes, but these are split between the CPUs, 40 each, and these have to supply everything. In essence what you have is a pair of systems linked by the DMI, and all communication is over DMI, which is painfully bad.

In performance terms linking two cards over 16 lanes of PCIE3 is like taking a trip from Edinburgh to Glasgow via the M25. Linking more than two cards is like doing the same trip via Paris.
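
For anyone following along at home, here's a rough back-of-envelope comparison of the theoretical per-direction bandwidths being thrown around (assuming PCIe 3.0 at 8 GT/s with 128b/130b encoding, DMI 2.0, QPI at 9.6 GT/s, and NVLink 2.0 as on the V100 - real-world figures will be lower and depend on the exact CPUs and link config):

```python
# Back-of-envelope, per-direction theoretical bandwidths (GB/s).
# Assumptions: PCIe 3.0 = 8 GT/s per lane with 128b/130b encoding,
# DMI 2.0 ~ PCIe 2.0 x4, QPI at 9.6 GT/s (2 bytes per transfer),
# NVLink 2.0 = 25 GB/s per link per direction (V100 has up to 6 links).

pcie3_per_lane = 8e9 * (128 / 130) / 8 / 1e9   # ~0.985 GB/s per lane

links = {
    "PCIe 3.0 x16":            16 * pcie3_per_lane,   # ~15.8
    "PCIe 3.0 x8":              8 * pcie3_per_lane,   # ~7.9
    "DMI 2.0 (CPU <-> PCH)":    4 * 0.5,              # ~2.0
    "QPI 9.6 GT/s (per link)":  9.6 * 2,              # ~19.2
    "NVLink 2.0 (per link)":    25.0,
}

for name, gbps in links.items():
    print(f"{name:28s} ~{gbps:5.1f} GB/s each way")
```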
 
It's more a case that everything coming from the CUDA 13 toolkit (i.e. models) isn't going to run on the V100, rather than drivers. That, and the fact the hardware isn't going to work with diffusion models created from this point onwards. Anyway, the OP was clear enough, so let him get on with it.

As an aside, I find it interesting that NVIDIA throw their enterprise customers under the bus with software while continuing to support consumers, and AMD do the polar opposite. We digress.

There are numerous hardware and software nightmares to deal with, not to mention power and licensing headaches.

It's upsetting just thinking about it TBH, because even if you did get this working somehow, the performance at best from a Volta card is maybe 12 TFLOPS.
 
There are numerous hardware and software nightmares to deal with, not to mention power and licensing headaches.

It's upsetting just thinking about it TBH, because even if you did get this working somehow, the performance at best from a Volta card is maybe 12 TFLOPS.
Indeed.

It's 6-year-old AI tech. It's not designed for anything else, and the depreciation cycle on NEW rack-based AI cards is 36-48 months, with replacement at an optimistic 60 months (5 years).

Anyway the OP has bought it now & seems determined to plough on. If nothing else it'll be a rather expensive space heater with some inferencing capability.
 
@Vestas

Could you please let me know what the combined memory bandwidth of a 128GB AMD Strix Halo is? It sounds absolutely fantastic!
 
So with a C612 chipset you have a total of 80 Gen3 PCIe lanes, but these are split between the CPUs, 40 each, and these have to supply everything. In essence what you have is a pair of systems linked by the DMI, and all communication is over DMI, which is painfully bad.

In performance terms linking two cards over 16 lanes of PCIE3 is like taking a trip from Edinburgh to Glasgow via the M25. Linking more than two cards is like doing the same trip via Paris.

DMI is completely irrelevant (until you are trying to link pools over Ethernet) when talking about utilising V100 cards in a pooled configuration via SXM->PCIe; for GPUs off the direct CPU link, traffic will go over QPI - approx. 30GB/s each way.

Though you are right that it is a poor substitute for a proper NVLink setup, it's not as bad as you are making out.
 
DMI is completely irrelevant (until you are trying to link pools over Ethernet) when talking about utilising V100 cards in a pooled configuration via SXM->PCIe; for GPUs off the direct CPU link, traffic will go over QPI - approx. 30GB/s each way.

Though you are right that it is a poor substitute for a proper NVLink setup, it's not as bad as you are making out.

It's interesting how you can contradict yourself within the same paragraph. With these old Xeons you can never reach 30GB/s each way with this configuration. The maximum possible (theoretical) throughput would be 15GB/s running PCIe Gen3 at 8 lanes.

No it’s not as bad as I’m making out, it’s actually much worse…
 
Thanks for this, I'll take a look into the NVlink and DMI between cards you mentioned (I don't know what either are currently).

My current viewpoint is that running 4x, and potentially 8x, V100s on one board should be possible. Is that not the case?

What is an Nvidia certified system?

Think I'm right that you have 7 slots on that board - so running 8x cards would require messing about with SXM boards with integrated NVLink Lite - I think - a bit beyond my working knowledge when it comes to non-standard setups for NVLink.

When you have more than 4 GPUs connected some of those links will drop to 8x speed rather than full x16.
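
If/when the cards are in, something like the sketch below (assuming the nvidia-ml-py / pynvml Python bindings are installed) will show what link width each card has actually negotiated - an easy way to spot slots that have dropped to x8. `nvidia-smi topo -m` gives the fuller picture of which GPUs hang off which CPU.

```python
# Report negotiated vs maximum PCIe link width/generation per GPU.
# Assumes the nvidia-ml-py package (pynvml) is installed: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(h)
        cur_w = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)
        max_w = pynvml.nvmlDeviceGetMaxPcieLinkWidth(h)
        cur_g = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)
        max_g = pynvml.nvmlDeviceGetMaxPcieLinkGeneration(h)
        print(f"GPU {i} {name}: Gen{cur_g} x{cur_w} (max Gen{max_g} x{max_w})")
finally:
    pynvml.nvmlShutdown()
```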
 
It's interesting how you can contradict yourself within the same paragraph. With these old Xeons you can never reach 30GB/s each way with this configuration. The maximum possible (theoretical) throughput would be 15GB/s running PCIe Gen3 at 8 lanes.

No it’s not as bad as I’m making out, it’s actually much worse…

EDIT: I see what you are claiming, though you're coming at it a weird way round (you are mixing up duplex speeds) - but that is still way better than the DMI limitation you were claiming.
 
You’re still not understanding the problem.


[Image: C612 chipset block diagram]


Where is the problem? (I may be wrong but off the top of my head NVLink for the kind of setup OP would be running is 25GB/s per card so it isn't terrible but still a long way from a proper NVLink setup).

EDIT: For clarity I'm talking about the bit where you said:

In performance terms linking two cards over 16 lanes of PCIE3 is like taking a trip from Edinburgh to Glasgow via the M25. Linking more than two cards is like doing the same trip via Paris.

Sure, if linking more than two pairs like you originally mentioned, then without NVLink you are in a world of pain, but for 2 cards you still have a fairly direct, performant setup, and even 4 cards isn't that bad.

Though I'm pretty sure you got everything wrong here:

So with a C612 chipset you have a total of 80 Gen3 PCIe lanes, but these are split between the CPUs, 40 each, and these have to supply everything. In essence what you have is a pair of systems linked by the DMI, and all communication is over DMI, which is painfully bad.

And thought the 2 CPUs were limited to DMI's 2GB/s interconnect.
 
I've got no idea what you're both on about, but I'm enjoying it nonetheless while I wait for @Vestas to get back to me.

I'm not sure what Jigger is on about - they got it sort of right originally, though exaggerated the issues, then just started muddling terms together.

You don't touch DMI and its limitations - if you want to scale beyond 4 cards you need either NVLink, which I believe is limited to 6 GPUs on this setup, or some funky NVLink Lite implementation, which is outside of my knowledge. If you are building multiple systems then that is another story entirely.
 
Thanks Rroff, it's becoming clearer by the day, but it still appears mumbo jumbo at first glance.

I've spent most of this evening just salivating over the bare V100s, things of absolute beauty!!!

I'd like to get up and running with two GPUs so that I can benchmark and optimise an appropriate cooling solution, running them both at the same time with different prototypes, that sort of thing. I think I may have to source an OEM heatsink to be fair, just to reverse engineer the dimensions of the mating surfaces, as it looks to me like the mirrored chip in the middle is slightly recessed. I'll be spending a fair bit of time designing a clamping solution, that's for sure.
 