Intel 40 or More PCIe Lanes After X299?

Associate
Joined
22 Dec 2009
Posts
1,365
Location
Upper Skurt
Hi,

Would anyone please be able to tell me if there are any Intel boards with 40 or more PCIe lanes after X299 when fitted with an appropriate CPU?

I run a few CUDA server machines with multiple 3080 Ti and 3090 GPUs, and after quite a lot of testing of various boards, the X99 platform with a 40-lane CPU runs fastest for the apps I use.
For some reason, possibly CPU architecture, I could never get X299 to match X99 in terms of app performance.

As both X99 and X299 are getting a bit dated, I wondered if there were any Intel boards after X299 that, when fitted with an appropriate CPU, provide 40 or more PCIe lanes?
 
The Creator boards (Asus) usually have the highest-end PCIe config, and even that is two full-length PCIe 5.0 slots with 8 lanes each plus one PCIe 4.0 slot with 4 lanes.

If you want better than that, I think you'd have to go Threadripper or Xeon-W.
 
ASUS PRO WS W790E-SAGE motherboard for Intel Sapphire Rapids: 6 to 56 P-cores, lots of PCIe Gen 5 lanes and quad channel registered DDR5. The board is expensive, but it is worth a look if you want >= 40 PCIe lanes and the slots to use them.

 
Guys,

Thank you both for your replies.

The Pro WS W790-SAGE looks awesome, almost scared to check the price... Braved up and it's £1000+.
However, it might let me load the board with more GPUs running at x16 than I can with X99 (max of 2).

I would need to check it out, but in theory it looks easily good for 4 GPUs at x16, possibly 6 at x16 with the appropriate CPU.
This could cut the number of X99 machines I use to a half or a third.
 
Thank you for the additional links.

The lower-end CPUs in the W-3400 and W-2400 ranges are not that badly priced and provide the same number of PCIe lanes as the higher-end CPUs within their respective ranges.
The W-2400 provides 64 PCIe 5.0 lanes, and I assume that some of these would be used by an M.2 NVMe drive or standard SSD? That would rule out running 4 GPUs at x16.

I think the base model W-3400 might be my best option, with 112 PCIe 5.0 lanes, should I decide to migrate to this platform.
It's going to need some meaty PSU grunt to run 4 x 3090s on the same board.
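
For anyone wanting to sanity-check the lane maths above, here is a rough sketch in Python. The 64-lane figure is the one quoted in this thread; the assumption that an M.2 NVMe drive takes 4 lanes directly from the CPU is mine (on many boards the M.2 slots hang off the chipset instead, in which case they would not eat into the CPU lanes at all). The function names are made up for illustration.

```python
# Rough PCIe lane budget check (illustrative sketch, not vendor data).
GPU_LANES = 16  # lanes needed for one GPU at full x16 width


def lanes_left(cpu_lanes, gpus, other_devices):
    """CPU lanes remaining after `gpus` cards at x16 plus other CPU-attached devices.

    `other_devices` maps a device name to the lanes it uses.
    A negative result means the configuration does not fit.
    """
    used = gpus * GPU_LANES + sum(other_devices.values())
    return cpu_lanes - used


def max_x16_gpus(cpu_lanes, other_devices):
    """Largest number of GPUs that can each get a full x16 link."""
    free = cpu_lanes - sum(other_devices.values())
    return max(free, 0) // GPU_LANES


# 64 lanes with nothing else attached: exactly four x16 GPUs fit.
print(max_x16_gpus(64, {}))
# Assumption: one NVMe drive on 4 CPU lanes leaves room for only three.
print(max_x16_gpus(64, {"m2_nvme": 4}))
```

So on a 64-lane CPU, four GPUs at x16 only works if storage and everything else runs from chipset lanes, which is worth confirming in the board manual.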
 
You could also look at the ASUS Pro WS W790-ACE. It should be cheaper than the SAGE as it uses fewer lanes; I think it has 5 x16 slots, with slots 4 and 5 shared.
 
You are correct, a quick check reveals the ACE has 5 x16 slots and is approximately £350 cheaper than the SAGE.
It looks to be a contender, as I would probably not wish to load more than 4 3090/3080 Ti cards onto a board.

The other possible advantage of moving up to the W790 platform is that I am currently bottlenecked at PCIe Gen3 on X99 with PCIe Gen4 cards.
There might be speed advantages not only from the PCIe generation upgrade but also from the much faster RAM speeds available?
 
The ACE slots are: 5 x PCIe 5.0 x16 slot(s) (supports x16, x16, x16, x0/x8, x16/x8 modes)

To me this looks like the last two slots are shared, so if you use them both they run at x8, but you can use the last slot at x16 if you are not using the previous one. You will need registered DDR5, which costs a bit more but seems to be very stable, unlike desktop DDR5. Registered DDR5 now supports XMP, so you could get 6000+ speeds, but I don't know if it's worth the extra cost: it's quad channel anyway, so you would probably get over 100 GB per second even with 4800 RAM (my X99 got ~65 GB per second). The X99 platform was very good and the W790 seems like the logical next step; the downside is that it costs a lot more than X99 did. Also, make sure your case can fit the board.
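
As a rough check on the memory bandwidth figures, the theoretical peak is channels x transfer rate x 8 bytes per transfer (a 64-bit channel). A minimal Python sketch follows; the DDR4-2133 speed for the X99 system is an assumption on my part, since the actual RAM speed is not stated, and real measured figures always come in below the theoretical peak.

```python
def peak_bandwidth_gbs(channels, mega_transfers, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s.

    channels       -- number of populated memory channels
    mega_transfers -- rated speed in MT/s (e.g. 4800 for DDR5-4800)
    bus_bytes      -- bytes moved per transfer per channel (64-bit = 8)
    """
    return channels * mega_transfers * bus_bytes / 1000


# Quad channel DDR5-4800, as discussed for W790.
print(round(peak_bandwidth_gbs(4, 4800), 1))
# Quad channel DDR5-6000 with XMP.
print(round(peak_bandwidth_gbs(4, 6000), 1))
# Quad channel DDR4-2133 (assumed speed for the X99 comparison).
print(round(peak_bandwidth_gbs(4, 2133), 1))
```

Quad channel DDR5-4800 works out to about 153.6 GB/s on paper, so "over 100 GB per second" in practice looks plausible, and the assumed DDR4-2133 figure of roughly 68 GB/s is in line with the ~65 GB per second measured on X99.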
 
I read it the same way regarding the ACE board's PCIe slots: with slots 1, 2, 3 and 5 populated I should get all 4 running at x16.
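
To make that reading of the slot modes concrete, here is a small Python sketch of the sharing rule as interpreted in this thread from the "x16, x16, x16, x0/x8, x16/x8" spec line. It has not been checked against the board manual, and the function name is made up for illustration.

```python
def ace_slot_widths(populated):
    """Map each populated slot (1-5) to its link width.

    Rule as read in this thread (an interpretation, not verified):
      slots 1-3: always x16
      slot 4:    x8 when populated
      slot 5:    x16 if slot 4 is empty, x8 if slot 4 is populated
    """
    populated = set(populated)
    widths = {}
    for slot in sorted(populated):
        if slot in (1, 2, 3):
            widths[slot] = 16
        elif slot == 4:
            widths[slot] = 8
        elif slot == 5:
            widths[slot] = 8 if 4 in populated else 16
        else:
            raise ValueError(f"no such slot: {slot}")
    return widths


# Four GPUs, skipping slot 4: all at x16 under this reading.
print(ace_slot_widths([1, 2, 3, 5]))
# Five GPUs: the last two drop to x8.
print(ace_slot_widths([1, 2, 3, 4, 5]))
```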

Case size is not an issue. I am currently using second-hand mining rig frames such as Veddha or similar, with PCIe x16 extension cables.
They allow for better spacing of the GPUs and more air circulation, so less heat build-up from the GPUs.

I have some E-ATX boards in the mining rig frames already, an ASUS Rampage V Extreme (5930K) and an ASUS X99-E WS (5960X), both great boards within their generation.
 
You may be able to use PCIe bifurcation to increase the number of GPUs: 2 GPUs at x8 Gen 4 per x16 slot (if you can find a PCIe x8/x8 bifurcation adapter). Most GPU workloads are not limited by PCIe link speed. If you don't mind me asking, what type of workloads are you using them for?
 
The GPUs are used for their CUDA cores in custom apps associated with cryptology (not mining).

From various tests over the past few years, the more CUDA cores on a GPU, the better the performance I get.
There are other contributing factors such as RAM speed, GPU clock speed and PCIe lane speed. There is a definite performance hit when using x8 compared to x16. I measured this when using an i7-5820K (28 lanes) vs an i7-5930K (40 lanes) in the exact same board build with two 3090 GPUs. The i7-5820K setup was 8% to 10% down in performance compared to the i7-5930K, presumably due to the cards running at x16 and x8 (5820K) vs x16 and x16 (5930K)?
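
For reference, the theoretical link bandwidth behind the x8 vs x16 comparison can be worked out from the per-lane transfer rate and the 128b/130b encoding used from PCIe Gen3 onwards. This is a Python sketch of the paper figures only (per direction, ignoring protocol overhead); it does not by itself explain the measured 8% to 10% difference.

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
TRANSFER_RATE_GT = {3: 8.0, 4: 16.0, 5: 32.0}
# Gen3 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    return TRANSFER_RATE_GT[gen] * ENCODING_EFFICIENCY * lanes / 8


for gen, lanes in [(3, 8), (3, 16), (4, 8), (4, 16)]:
    print(f"Gen{gen} x{lanes}: {link_bandwidth_gbs(gen, lanes):.2f} GB/s")
```

On paper, Gen3 x16 is about 15.75 GB/s and Gen3 x8 is half of that, while a Gen4 card at x8 gets the same bandwidth as a Gen3 slot at x16.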
 
I would use something to look at the GPU utilization; if it's at 99/100%, then you're probably good.
If it's significantly below 99%, then you might get some improvement by batching the work you send to the GPU. It's normally better to send big chunks than lots of small chunks over the PCIe bus. Obviously, I don't know how you do things and I have never used CUDA (I have only used OpenGL/Vulkan GLSL), but it seems strange that you get such a drop from the PCIe bus. Anyway, good luck with the new platform if you choose to go for it.
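
One possible way to do that utilization check is a small Python wrapper around `nvidia-smi`, which can report GPU utilization alongside the current PCIe link generation and width. This is only a sketch: the 95% threshold and the function names are arbitrary choices of mine, and the parser assumes every field comes back as a plain number.

```python
import subprocess

QUERY_FIELDS = "index,utilization.gpu,pcie.link.gen.current,pcie.link.width.current"


def parse_gpu_rows(csv_text):
    """Parse 'index, util, gen, width' CSV lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, gen, width = (int(field.strip()) for field in line.split(","))
        rows.append(
            {"index": index, "util_pct": util, "pcie_gen": gen, "pcie_width": width}
        )
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows (needs the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_rows(result.stdout)


def underused(rows, threshold=95):
    """Indices of GPUs whose utilization is below the (arbitrary) threshold."""
    return [row["index"] for row in rows if row["util_pct"] < threshold]
```

Calling `underused(query_gpus())` while the apps are running would list any cards sitting below the threshold. The link generation is best read under load, since cards can drop to a lower generation when idle to save power.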
 