Took the plunge - First Home Server

Changing the tune. Thanks to everyone who has read and responded, warts and all. You've encouraged me to go and read about things I didn't know existed last week.

I think part of why I might have rustled a few feathers is that I’ve just done a poor job of separating what I want to learn long term, from what I expect to achieve immediately.

To reset expectations a bit for anyone who is still interested:

Short Term.

- Accumulate enough hardware to start designing an enclosure
- Start a build thread
- Source one OEM heatsink, reverse engineer how it mounts to the V100, and build some prototypes
- Learn fluid and thermal dynamics to an A-Level understanding, and then put a wanted ad up in the members market for old heatsinks
- 3D Print a better testing bench for my worktop
- Check that the motherboard, CPU and RAM POST correctly and are stable with my 4070

Mid term.

- 2x V100s
- 1x Xeon
- 4x DIMMs
- No attempts at VRAM pooling, NVLink, or scaling beyond what the board supports by default.
- Get a stable platform I can benchmark, stress, cool, and understand.

Longer term.

- Completely wide open, other than just trying to learn more about computers and how they work and what they can do beyond play games.

To make it clear, I fully accept:
- V100s, Xeons and DDR4 are legacy hardware in AI terms
- Support/drivers/new models will lock me out sooner rather than later
- This will never compete with hosted and subscription-based models

Where I would appreciate advice and help:

- Known HBAs that will be suitable for Unraid with SAS 12Gbps drives
- Practical cooling approaches from people who have used enterprise GPUs outside of a rack chassis
- Any known quirks with the Asus Z10PE-D8 WS, maybe something I should watch out for that isn't obvious in the manual
- BIOS settings worth changing (or not changing) for initial system stability

I mean this in as genuine a way as I can possibly convey: this isn't about trying to build something better than anything I can pay for, because it won't be. It's about understanding hardware, software, and an entire area of computing I have no previous experience of. Rather than moping about because I feel I've missed the boat with regards to ever having a career in engineering, design or creativity, I'm just going to have a go at home and see how I get on learning something new that interests me.

If the bloody thing ends up being loud, power hungry and just too old to do anything useful with, so be it, but I'd rather learn that from building it and having it fail than from not trying at all and going to the pub.

And for anyone who is still reading, I will start a build thread once parts are 'on the bench' :)
 
HBAs:
With HBAs you want to make sure they are preflashed/configured in IT mode. People advertise them as compatible with TrueNAS/Unraid, e.g. https://www.ebay.co.uk/itm/404487396328

Server GPUs
Simple: they are designed to be cooled by directed airflow mid-chassis. You can buy cooling kits for them in various types, either blower-style or fans on a shroud, cheaply, albeit bloody noisy with the fans at full tilt.

You can buy waterblocks for water cooling the V100, so you could do that, but they aren't cheap (~£140 each); the system would be very cool and quiet though. You can pick up cheap watercooling stuff in the for sale section here, it's where I get a lot of stuff.
 
You need to be very careful with the SMX chip as it's a very delicate installation process. If you stick to a reasonable config you should get a usable system; by that I mean something like 4 VMs with a GPU each. Once you start to share resources across the platform is where you'll hit roadblocks and expose the technology flaws in the platform. It would be a great learning experience, but I'd avoid that as it's a pretty futile task, and for AI your efforts would certainly be much better spent working with ROCm and OneAPI.

PS: ignore much of what Rroff says, he's one of those Google expert types with no experience.
 
Could you please let me know what the combined memory bandwidth of a 128GB AMD Strix Halo is, it sounds absolutely fantastic!
It's not an entirely straightforward answer I'm afraid :)

The memory is quad-channel LPDDR5-8000, so effectively that gives a theoretical memory bandwidth of 4 x 64GB/sec = 256GB/sec.
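
For anyone who wants to see where the 256GB/sec comes from, a quick back-of-envelope in Python (treating the bus as four 64-bit channels is my reading of it, not an official spec):

```python
# Rough check of the theoretical figure: 8000 MT/s across a 256-bit bus
# (treated here as 4 x 64-bit channels - the channel split is my assumption)
transfers_per_sec = 8000e6            # 8000 MT/s per pin
bus_width_bytes = (4 * 64) / 8        # 256 bits = 32 bytes per transfer
bandwidth_gb_s = transfers_per_sec * bus_width_bytes / 1e9
print(f"Theoretical peak: {bandwidth_gb_s:.0f} GB/s")   # -> 256 GB/s
```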

However it's unified memory and hence dependent on the memory connections to the infinity fabric and the MALL (infinity cache). These differ between CPU and graphics on Strix Halo.

The CPU is essentially a 9950X (dual-CCD) without the 32MB MALL (infinity cache), so it's actually a little slower than the desktop 9950X CPU. On Strix Halo the 8060S compute units have the 32MB MALL dedicated to them; more on that later.

As a consequence the CPU on Strix Halo is limited to 32 bytes/cycle for writing to memory, so it's a bit slower writing than reading. Just like desktop CPUs, each CCD is effectively limited to 64GB/sec bandwidth, so the upshot of all of this is that the CPU (both CCDs) will max out at around 175GB/sec on read/modify/add, 120GB/sec read, 80GB/sec write.
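
Tying the 32 bytes/cycle figure to the ~64GB/sec per-CCD number, a minimal sketch (the ~2GHz fabric clock is my assumption, not something from the post):

```python
# 32 bytes per fabric cycle at an assumed ~2 GHz infinity fabric clock
bytes_per_cycle = 32
fclk_hz = 2.0e9                       # assumption - actual FCLK varies
per_ccd_gb_s = bytes_per_cycle * fclk_hz / 1e9
print(f"~{per_ccd_gb_s:.0f} GB/s per CCD")   # -> ~64 GB/s
```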

The 8060S graphics/compute effectively gets the full bandwidth and all the MALL (infinity cache). That means in practice you have around 220-240GB/sec bandwidth for the compute units, although it's very dependent on workload - anything involving the CPU (other than scripts) will pull the write bandwidth down. memtest_vulkan reports about 220/198GB/sec. It's quite amusing watching some AI workloads on Strix Halo - CPU running at 1.45GHz (idling) while the GPU is at 2.9GHz with compute at 98% and 100+GB memory in use.

There is also the XDNA2 NPU attached to the infinity fabric, which I think has 64GB/sec read/write bandwidth, although that's a guess. It's pretty hard to find a way to check as it's undocumented AFAIK.

There are some better explanations over at chipsandcheese:



Basically it's a 4.5L box - similar size to the DGX Spark (quelle surprise!) - which pulls a max of about 140W and is just about on par with the DGX Spark for inferencing; not so much with diffusion, but it's getting there with ROCm. Two Gen4 SSD slots and varying I/O depending on manufacturer. Plays games at around 4060M/4070M/PS5 standard - i.e. fine up to 1440p.

Edit - in terms of AI model sizes and memory use, it depends. For inferencing, something like gpt-oss:120b will use 64GB of compute (graphics) memory with another 14/15GB of CPU memory in use - 78GB committed. Diffusion is very variable depending on model/image etc, but I regularly have over 100GB committed, and on one occasion 122GB, which wasn't great in terms of speed.
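
As a rough illustration of why a 120b model lands at about 64GB of weights (the ~4.25 bits/weight figure for an MXFP4-style quant is my assumption):

```python
# Rough model-footprint estimate: parameters x bits-per-weight / 8
params = 120e9
bits_per_weight = 4.25                # assumption for an MXFP4-style quant
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.0f} GB of weights")     # -> ~64 GB
# KV cache, activations and CPU-side buffers sit on top of this,
# which is how you end up around 78 GB committed.
```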

Edit2 - since there's a load of Strix Halo stuff here anyway, I thought I'd link this chipsandcheese video for anyone techie enough, as newer ROCm does the s/w stuff he talks about regarding memory/MALL allocation between the CUs and NPU:

 
- 2x V100s

This is the problem for me. You just shouldn't use these because not only are they not going to do what you hope, they're going to cost you more money.

The rest I can quite understand, as I did much the same myself back in the 00's - HP 1U DL-145, dual Opterons, howled like a banshee. I built a box for it in the garage, attached all sorts to it and retired it 8 years later for a box that did everything the old box did and used a quarter of the power to do it. Right enough, mine wasn't 10 years out of date off the bat ;)

On anything related to AI, last year is last decade so to speak.
 
I don't know how much of the NVLINK discussion is relevant.
OP said they are using SMX to PCI-E adaptors and I'm not aware these cheap adaptors have any NVLINK connectivity since NVLINK is a specialist high bandwidth bus.
You can buy sub-boards designed to hold multiple SMX GPUs with integrated NVLINK support, which then connect via PCI-E, but these are rare, expensive and many are vendor locked.

As I read it, OP will connect these cards over PCI-E.

Multi-GPU over PCI-E is usable for inference; it takes around 1-2 minutes to load a 70b 4.5bpw Llama into 48GB with some context, but that is a one-time cost.
PCI-E bandwidth doesn't make a lot of difference during inference, only the context is sent to each card, and 2 cards should spit out 10-15 tokens a second with a 20-30GB model.
That's about reading speed for me, so it's usable for chat responses.
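
A hedged rule-of-thumb for where figures like 10-15 tokens/sec come from: decode is memory-bound, every weight gets read once per token, and with the model split layer-wise the cards take turns, so effective bandwidth is roughly one card's HBM bandwidth (the 900GB/sec V100 figure and the ~35% real-world efficiency factor are my assumptions):

```python
# Memory-bound decode estimate for a layer-split model across two V100s
hbm_bandwidth_gb_s = 900              # assumption: V100 HBM2 peak
model_size_gb = 25                    # a 20-30GB quantised model
efficiency = 0.35                     # assumption: real-world fraction of peak
tokens_per_sec = hbm_bandwidth_gb_s / model_size_gb * efficiency
print(f"~{tokens_per_sec:.0f} tokens/sec")    # -> ~13, i.e. in the 10-15 range
```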

Jump up to 70b and 3 cards and it's more like 6-8 tokens a second, which is slow but fine for simple chat bot interactions.

Not so good for coding etc. as you are limited in how much 'code' you can upload in the small context, and if you want 20k tokens of code back (a small project), that will be around half an hour to an hour depending on the model, and it gets worse the larger the model and the more cards you add.
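
The half-hour-to-an-hour figure falls straight out of the token rate; a trivial check:

```python
# Time to generate 20k tokens at chat-bot speeds
tokens_wanted = 20_000
for tok_per_sec in (6, 8):
    minutes = tokens_wanted / tok_per_sec / 60
    print(f"{tok_per_sec} tok/s -> ~{minutes:.0f} min")
# 6 tok/s -> ~56 min, 8 tok/s -> ~42 min
```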

PCI-E is no real use for training; it's just way too slow for anything that doesn't fit on one card, but then you can just rent some cloud compute.

If you are going to play with the SMX cards, I'd still look at proper coolers. From the popular China site, SMX HSFs are around £50 and you need to add high-pressure fans... not powered off the motherboard fan header. Water blocks are £35, but you need a pump, a radiator and radiator fans. The GPU is a precision part; unless you have CAD drawings and a high-precision mill that can work suitable materials, you aren't going to make anything better for the cost. Those are bare dies: with uneven pressure you crack it; with not enough pressure, or part of the die not covered, you get hot spots and it burns up. The die itself is 815mm^2 compared to a 5090 at 750mm^2, so it's huge. You'll pay close to that for a bare cooler of something else that doesn't fit and has no chance of effectively cooling it. Even most CPU tower coolers with heat pipes will likely be inadequate at 300W, assuming you have some way of properly cooling all the voltage regulators too.
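
On the airflow side, a minimal A-level-style estimate of the bulk airflow needed to carry 300W away (the 15°C allowed air temperature rise is my assumption; dense server fins also need far more static pressure than a case fan provides, hence the high-pressure fan advice):

```python
# Airflow needed to remove 300 W: Q = m_dot * cp * dT
power_w = 300.0
cp_air = 1005.0                      # J/(kg.K), specific heat of air
rho_air = 1.2                        # kg/m^3, density of air
delta_t = 15.0                       # assumed allowable air temperature rise
m_dot = power_w / (cp_air * delta_t)           # kg/s of air
vol_flow_m3_s = m_dot / rho_air
cfm = vol_flow_m3_s * 2118.88                  # m^3/s -> CFM
print(f"~{cfm:.0f} CFM at a {delta_t:.0f} C rise")   # -> ~35 CFM
```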
 
I don't know how much of the NVLINK discussion is relevant.
OP said they are using SMX to PCI-E adaptors and I'm not aware these cheap adaptors have any NVLINK connectivity since NVLINK is a specialist high bandwidth bus.

There is no NVLink functionality using those adapters. The main thing I was trying to clear up is that the DMI bus, and the inherent limitations that would make them cripplingly slow if that were the case, is not relevant to using multiples of these cards in this context, contrary to the claim of one poster. As you said, there are a lot of considerations to using the cards like this for things like LLMs which need working through.

There are some "NVLink Lite" multi card boards available fairly cheap these days which AFAIK aren't vendor locked but that is beyond my experience. (I might be wrong but I don't believe they support any kind of automatic resource pooling and an LLM would need to be programmed to take advantage of the benefits of NVLink for them to provide a substantial benefit).

PS: it is SXM; not that it is a big deal, but I notice people keep saying SMX in this thread.
 
As an aside, in regard to the delicacy of the SXM mounting process: it is really only relevant to the heatsink, unless you go at it ham-fisted. While there is a certain amount of art/experience to knowing when to stop, if you use a finger grip rather than a palm grip most people can't exceed the torque required to damage the die (a palm grip can exceed it by around 3x).
 