Looking to develop my own AI model based on something like Falcon 7B, as I reckon Falcon 40B is too hard to do on "consumer hardware". Budget isn't really an issue, but there's also no point spending a fortune if it's not needed.
Currently I have a
5950X
32GB DDR4
and I saw a PC for sale with:
i9 13900K
64GB DDR5 6000MHz (with the option of 128GB DDR5 for 250 CHF extra)
GPU-wise I currently have an RTX 3080, but from reading around it seems like you need more than 10GB of VRAM, so I was thinking a 4080 SUPER, or is the 4090 better for a bit extra? Am I likely to see a material improvement with the i9 over the 5950X? For the little I have done so far it chugs quite a bit, but I recognise I have very little RAM anyway, so it's not a fair comparison.
Appreciate any help or alternative build suggestions.
Model speed is dependent on memory bandwidth, which is why GPU inference is so much faster than CPU.
Consumer desktop bandwidth sucks for any larger AI model: you could load up 128GB of RAM, but it would likely need hours to generate a long response and an ice age to do any training.
A token is a few characters, and each token is calculated against every parameter of the model, so the time quickly becomes impractical.
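A rough back-of-envelope in Python (the bandwidth figures are ballpark theoretical peaks, just to illustrate why bandwidth dominates):

```python
# Rough upper bound: decode speed ~= memory bandwidth / bytes of weights read per token,
# since generating each token has to touch (nearly) all the weights once.
def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

falcon_7b_fp16_gb = 7e9 * 2 / 1e9  # ~14 GB of weights at 16-bit

print(tokens_per_sec(50, falcon_7b_fp16_gb))    # dual-channel DDR4 (~50 GB/s)      -> ~3.6 tok/s
print(tokens_per_sec(96, falcon_7b_fp16_gb))    # dual-channel DDR5-6000 (~96 GB/s) -> ~6.9 tok/s
print(tokens_per_sec(936, falcon_7b_fp16_gb))   # RTX 3090 (~936 GB/s)              -> ~67 tok/s
```

And that's just generating tokens; training moves far more data around than a single forward pass.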
Training generally also benefits significantly from Tensor cores, hence the popularity of recent Nvidia hardware.
If you want a single card without breaking the budget then a 3090 is your best bet: 24GB of VRAM, good bandwidth and Tensor cores.
Personally I would only consider training small models locally; anything more significant I would throw onto RunPod or similar. You can rent 3090s there for $0.40/hr, an H100 80GB for ~$3.50/hr, or something in between.
Training is significantly slower when it has to go across the PCI-E bus, especially if you don't have high-end hardware with 4.0 x16 slots. Something like an RTX 5000 (Quadro) with 48GB is under $1/hr.
The higher-end servers have Nvidia's high-speed interconnect (NVLink), so they can perform well with multiple GPUs.
If you are just running the model (inferencing), Hugging Face has lots of quantised versions; a 20B model + 4k context fits on a 16GB card.
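For example, a minimal sketch of loading Falcon 7B in 4-bit with transformers + bitsandbytes (assuming both are installed; swap the repo ID and settings to taste):

```python
# Minimal 4-bit inference sketch (transformers + bitsandbytes assumed installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-7b"  # example model; use whichever you actually want

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # weights stored in 4-bit, big VRAM saving
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the layers on the available GPU(s)
)

inputs = tokenizer("Explain memory bandwidth in one sentence.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```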
You can use multiple cards effectively, but because layers load in blocks that don't split exactly across cards, you can waste ~1GB per card, hence 1x24GB > 2x12GB > 4x6GB.
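If you do split across cards, you can cap what each one is allowed to hold so the blocks land sensibly; a rough sketch (the GiB figures are just examples for a pair of 12GB cards):

```python
# Sketch of splitting one model across two cards with an explicit per-GPU cap.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",                                    # example model
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",                                     # accelerate decides which layers go where
    max_memory={0: "11GiB", 1: "11GiB", "cpu": "24GiB"},   # leave some headroom per card
)
print(model.hf_device_map)  # shows which layers ended up on which device
```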
You don't need a lot of compute for mid-size models (30B / 8x7B etc.); these run at a good chat-style pace on 3060 / 4060 level hardware for a single user.
Expect 10s or so of prompt processing with a 4k context, and the reply comes back a little faster than I can read.
One option may be an X570 board; the second slot is usually PCI-E 4.0 x4, which is good enough for inference, and then throw in a 4060 Ti 16GB.
Need to make sure the slots are well spaced. The ASRock X570 Pro4 looks good for this: the top slot is very close to the CPU, so you have four slots' worth of space for the main card before the second GPU slot in the fifth position, still with some space before the edge of the board.
Not sure how well the VRMs would like your CPU though. Not many other boards than ASRock with this layout, sadly.
Edit: I was actually wondering the other day why there isn't an AI section, but I guess it's still niche, and the number of waifu-related posts might keep the mods busy.