Looking to develop my own AI model based on something like Falcon 7B, as I reckon Falcon 40B is too hard to do on "consumer hardware". Budget isn't really an issue, but there's also no point spending a fortune if it's not needed.
Currently I have a
5950X
32GB DDR4
and I saw a PC for sale with:
i9 13900K
64GB DDR5 6000MHz (with the option of 128GB DDR5 for 250 CHF extra)
GPU-wise I currently have an RTX 3080, but from reading around it seems like you need more than 10GB of VRAM, so I was thinking a 4080 SUPER, or is the 4090 better for a bit extra? Am I likely to see a material improvement with the i9 over the 5950X? For the little I have done so far it chugs quite a bit, but I recognise I have very little RAM anyway, so it's not a fair comparison.
Appreciate any help or alternative build suggestions.
Model speed is dependent on memory bandwidth, which is why GPU inference is so much faster than CPU.
Consumer desktop bandwidth sucks for any larger AI model: you could load up 128GB of RAM, but it would likely need hours to generate a long response and an ice age to do any training.
A token is a few characters, and each token is calculated against every parameter of the model, so the time quickly becomes impractical.
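A rough back-of-envelope in Python (the bandwidth figures are ballpark theoretical peaks, just to illustrate why bandwidth dominates):

```python
# Rough upper bound: decode speed ~= memory bandwidth / bytes of weights read per token,
# since generating each token has to touch (nearly) all the weights once.
def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

falcon_7b_fp16_gb = 7e9 * 2 / 1e9  # ~14 GB of weights at 16-bit

print(tokens_per_sec(50, falcon_7b_fp16_gb))    # dual-channel DDR4 (~50 GB/s)      -> ~3.6 tok/s
print(tokens_per_sec(96, falcon_7b_fp16_gb))    # dual-channel DDR5-6000 (~96 GB/s) -> ~6.9 tok/s
print(tokens_per_sec(936, falcon_7b_fp16_gb))   # RTX 3090 (~936 GB/s)              -> ~67 tok/s
```

And that's just generating tokens; training moves far more data around than a single forward pass.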
Training generally also benefits significantly from Tensor cores, hence the popularity of recent Nvidia hardware.
If you want a single card without breaking the budget then a 3090 is your best bet: 24GB of VRAM, good bandwidth and Tensor cores.
Personally I would only consider training small models locally; anything more significant I would throw onto RunPod or similar. You can rent 3090s there for $0.40/hr, an H100 80GB for ~$3.50/hr, or something in between.
Training is significantly slower when it has to go across the PCI-E bus, especially if you don't have high-end hardware with 4.0 x16 slots. Something like an RTX 5000 (Quadro) with 48GB is under $1/hr.
The higher-end servers have Nvidia's high-speed interconnect (NVLink), so they can perform well with multiple GPUs.
If you are just running the model (inferencing), Hugging Face has lots of quantised versions; a 20B model + 4k context fits on a 16GB card.
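For example, a minimal sketch of loading Falcon 7B in 4-bit with transformers + bitsandbytes (assuming both are installed; swap the repo ID and settings to taste):

```python
# Minimal 4-bit inference sketch (transformers + bitsandbytes assumed installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-7b"  # example model; use whichever you actually want

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # weights stored in 4-bit, big VRAM saving
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the layers on the available GPU(s)
)

inputs = tokenizer("Explain memory bandwidth in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```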
You can use multiple cards effectively, but because layers load in blocks that don't split exactly across cards, you can waste ~1GB per card, hence 1x24GB > 2x12GB > 4x6GB.
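If you do split across cards, you can cap what each one is allowed to hold so the blocks land sensibly; a rough sketch (the GiB figures are just examples for a pair of 12GB cards):

```python
# Sketch of splitting one model across two cards with an explicit per-GPU cap.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",                                    # example model
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",                                     # accelerate decides which layers go where
    max_memory={0: "11GiB", 1: "11GiB", "cpu": "24GiB"},   # leave some headroom per card
)
print(model.hf_device_map)  # shows which layers ended up on which device
```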
You don't need a lot of compute for mid-size models (30B / 8x7B etc.); these run at a good chat-style pace on 3060 / 4060 level hardware for a single user.
Expect 10s or so of prompt processing with a 4k context, and the reply comes back a little faster than I can read.
One option may be an X570 board; the second slot is usually PCI-E 4.0 x4, which is good enough for inference, and then throw in a 4060 Ti 16GB.
Need to make sure the slots are well spaced. The ASRock X570 Pro4 looks good for this: the top slot is very close to the CPU, so you have four slots' worth of space for the main card before the second GPU slot in the fifth position, still with some space before the edge of the board.
Not sure how well the VRMs would like your CPU though. Not many other boards than ASRock with this layout, sadly.
Edit: I was actually wondering the other day why there isn't an AI section, but I guess it's still niche, and the number of waifu-related posts might keep the mods busy.