Home server AI stack anyone?

Hi All,

Skimmed through a few pages of threads and didn't spot any home server AI stack builds. Has anyone here built themselves a home server for their own AI stack?
I'm interested in assembling a budget rat-rod server, so I'd really appreciate suggestions on the minimum spec for the minimum budget that keeps it viable. The first build would be purely to test things out before I commit serious spondoolicks to it. Ideally, start small and upgrade as I go.
Will most likely put Ubuntu Server on it.

From my initial reading, the GPU is key, while I can probably save a few quid on the CPU and motherboard?

Any thoughts or suggestions welcome.

Thanks!
 
Personally I haven't gone beyond using a Dell QBM1250 with an Intel 265 for AI-related tasks (as an offload from my main systems). For any serious local LLM use you quickly need video cards with a LOT of VRAM.
 
Anyway, my current budget home PC build is happily running on a B550 board with a Ryzen 5500G.

Am I too boring to consider a build on B450 with a Ryzen 5500 or 5600X and an RTX 3060 12GB?
What form factor to pick? Mini-ITX in a Metalfish T60, perhaps? Would there be an appropriately sized server PSU for this?

Thanks.
 
I'm a bit shocked by RAM prices. I picked up a pair of new 16GB DDR4 Corsair sticks for about £50 less than two years ago. 32GB of RAM seems to cost £250 now. Wow. Should have invested my life savings into RAM sticks in 2024.

Does anyone have build plans for a budget-friendly time machine? Happy to buy used parts and Chinese knock-offs.

Looks like I'll just re-purpose my current box into an ATX case to run some AI crunching on a new GPU.
 
12GB on a 3060 isn't enough for anything other than a tiny model with a minimal context window. That might be OK for your purposes but it'd have to be VERY specific tasks with models suitable for that - you won't get much choice.

Anything under 32GB "VRAM" isn't going to be much use but I suppose YMMV...
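
To put a rough number on the context-window cost, a back-of-envelope sketch (the layer/head/dim figures are illustrative assumptions for a 7B-class model, not measurements):

def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_val=2):
    # K and V per layer: 2 tensors of kv_heads * head_dim values per token (fp16 = 2 bytes)
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_val / 1e9

# ~32 layers, 8 KV heads (GQA), head_dim 128 -- ballpark figures for a 7B-class model
for ctx in (2048, 8192, 32768):
    print(f"{ctx} tokens -> {kv_cache_gb(32, 8, 128, ctx):.2f} GB of KV cache")

A 32k context alone is ~4GB on top of the weights, which is why 12GB disappears fast.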
 
I've settled on a 16GB 5060 card, as I'll need a GPU for a kid's PC soon anyway.

At the moment it's very specific text-only jobs, so I suspect 16GB will get me started OK.

Any suggestions on what to look out for in the 24GB or 32GB segment? There seem to be a few older datacenter cards that need liquid cooling. Never done it before - how hard (& expensive) is it to fit a radiator?
 
There seem to be a few older datacenter cards that need liquid cooling. Never done it before - how hard (& expensive) is it to fit a radiator?

Some of these need a little care taken with heatsink/block mounting, as they are direct-die and too much torque will crack them. It's generally not an issue with a two-finger tightening technique, but they are intended for use with a torque screwdriver/wrench.
 
Since I bought a water-cooled 3090 for my video editing rig, my air-cooled 3090 has been hanging around doing nothing. I'm building a spare PC from it to run an LLM and my own AI; whether to go all-new gear or old gear, no idea. Currently I do a lot of research and YouTube admin/planning with ChatGPT, paying a monthly fee. I'm hoping to run Open Claw agents paired with both a local LLM and the cloud to expand my part-time business. I like the idea of having an AI office with a few agent employees.
 
I'm a bit shocked by RAM prices. I picked up a pair of new 16GB DDR4 Corsair sticks for about £50 less than two years ago. 32GB of RAM seems to cost £250 now. Wow. Should have invested my life savings into RAM sticks in 2024.

Does anyone have build plans for a budget-friendly time machine? Happy to buy used parts and Chinese knock-offs.

Looks like I'll just re-purpose my current box into an ATX case to run some AI crunching on a new GPU.
This isn't really something that you do "on a budget". VRAM is your main limiting resource. It depends heavily on what you want to run locally: LLMs don't take *THAT* much compute, but you won't run anything particularly great on 16GB or less of VRAM. You'll be limited to sub-10B-parameter models with a constrained context, or to offloading layers to system RAM, at which point the slower tokens/s of CPU+RAM is often favoured instead. You'll most likely be looking at "edge models" quantised down to Q2 or Q4.
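
As a rough guide to what those quant levels mean for size (pure back-of-envelope arithmetic; real quantised files carry some extra per-block overhead on top):

# Approximate weight sizes at different quant levels: params * bits / 8 bytes
for params_b in (3, 7, 13):
    for bits, label in ((2, "Q2"), (4, "Q4"), (8, "Q8")):
        print(f"{params_b}B @ {label}: ~{params_b * bits / 8:.1f} GB of weights")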

Look through Ollama's list of models for what you want to run and how much space they use.
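
A minimal sketch of checking that programmatically, assuming a stock local Ollama install (it exposes a REST API on localhost:11434 by default):

import requests

# Ask the local Ollama instance which models are pulled and how big they are
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
for m in resp.json().get("models", []):
    print(f"{m['name']}: {m['size'] / 1e9:.1f} GB on disk")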

For generating images you don't need as much VRAM; compute is what matters there, unless you want to generate high-res or simultaneous (not queued) images. For generating video you will need both compute and VRAM.
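
If you want to see what a given job actually peaks at on your current card, a quick polling sketch using the NVML bindings (assumes an NVIDIA card with working drivers; pip install nvidia-ml-py):

import time
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
try:
    while True:  # poll while your generation job runs; Ctrl-C to stop
        mem = pynvml.nvmlDeviceGetMemoryInfo(gpu)
        print(f"used {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()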

Roughly in ascending order of cost and capability:
Intel CPU with NPU + Arc iGPU
AMD Ryzen AI chip/Strix Halo
16GB consumer GPU
3090/4090
5090/Intel Pro 32GB GPU (Intel cards are the cheapest way to get large VRAM, but YMMV on what you can run, and it's more aimed at LLM than image/video AFAIK)
Multi-GPU or NVIDIA Pro RTX cards.
 
Before RAMageddon, budget setups were definitely a thing. Tesla P40s were dirt cheap for so long, then went up quite a lot, but they're on the way back down (for a reason). Just swapped mine out for a 22GB 2080 Ti.

I miss the days of £25 64GB ECC DDR4 DIMMs. :( Would quite like four more for an 8-channel Epyc setup.
 