Need Help Speccing a Server

Associate
Joined
2 Mar 2007
Posts
599
Hi, I have been tasked with upgrading our current servers at work and was wondering if someone here could recommend something.

The server will be used to process calculations.
The budget is £60k to £70k.

Below I have attached details of our current server, which we need to upgrade.
Any advice would be greatly appreciated.

zlecCvk.png
 
Soldato
Joined
14 Apr 2014
Posts
2,586
Location
East Sussex
The above really needs to be answered first

E.g. are the CPUs and all cores maxed out when the app is running, or is it the memory (or both)? Is it disk I/O limited at points?
Also, do you know the name and version of the software? Specifically, it would be useful to know if it's using AVX.
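
As a starting point for the AVX question, here is a minimal sketch that reports which AVX variants the current CPUs advertise. It assumes an x86 Linux host that exposes `/proc/cpuinfo`; the function names are made up for illustration. Note that it only shows what the hardware supports, not whether the in-house software was actually compiled to use those instructions, which is still a question for the devs.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line format: "flags<tabs>: fpu vme ... avx avx2 ..."
            return set(line.split(":", 1)[1].split())
    return set()


def avx_support(flags: set) -> dict:
    """Map each AVX variant of interest to whether the CPU advertises it."""
    return {name: name in flags for name in ("avx", "avx2", "avx512f")}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux on x86
    if cpuinfo.exists():
        print(avx_support(cpu_flags(cpuinfo.read_text())))
```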


Cheers
 
Soldato
Joined
5 Mar 2010
Posts
12,345
You also need to determine the level of support/service you need if it's used in a business environment.

I.e. if it's a mission-critical production server, then you'll want a very good support package to go with it.
 
Associate
OP
Joined
2 Mar 2007
Posts
599
What are the bottlenecks with your current server?

E.g. are the CPUs and all cores maxed out when the app is running, or is it the memory (or both)? Is it disk I/O limited at points?
Also, do you know the name and version of the software? Specifically, it would be useful to know if it's using AVX.

Hi guys, so the bottlenecks are CPU and RAM: all CPU cores are currently maxed out and the devs tell me that the RAM speeds are too slow.
We currently have 72 cores maxed out; ideally, if the budget permits, we are looking to double that.

In terms of software, it's all developed in-house and I haven't got a clue what those guys are actually doing.
 
Don
Joined
19 May 2012
Posts
17,167
Location
Spalding, Lincolnshire
If the software can be sharded/clustered (i.e. split between multiple machines) then moving to something like a blade centre should give a decent increase in the number of processor cores.

Depending on what the work actually is, it may be worth looking at implementing it via OpenCL on GPUs, or even an FPGA hardware solution.

£60k-£70k for a single server is a huge budget, but where do you go after that? You need to be able to scale the app.
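
To make "sharded" concrete, here is a minimal sketch, assuming the workload can be broken into independent calculations whose results are combined at the end. `calculate` is a hypothetical stand-in for the real in-house calculation, and everything runs in one process here; in practice each shard would be sent to a separate machine.

```python
from typing import List, Sequence


def shard(tasks: Sequence[int], node_count: int) -> List[List[int]]:
    """Split tasks round-robin into node_count roughly equal shards."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [list(tasks[i::node_count]) for i in range(node_count)]


def calculate(task: int) -> int:
    """Hypothetical stand-in for one independent calculation."""
    return task * task


def run_on_nodes(tasks: Sequence[int], node_count: int) -> int:
    """Compute one partial result per shard, then combine them."""
    partials = [
        sum(calculate(t) for t in node_tasks)  # work done by a single node
        for node_tasks in shard(tasks, node_count)
    ]
    return sum(partials)


tasks = list(range(10))
print(shard(tasks, 3))
print(run_on_nodes(tasks, 3) == run_on_nodes(tasks, 2))
```

The answer does not depend on the number of nodes, only the amount of work each node does. That is what makes it possible to add machines later, or to carry on more slowly if one drops out.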
 
Soldato
Joined
1 Apr 2014
Posts
18,623
Location
Aberdeen
Do you have GPUs fitted? If so, which ones?

With that budget you should be able to approach vendors for dedicated specialised solutions. Is the software multi-threaded, or are you running multiple instances? If the software is multi-threaded I think you're looking at a dual-Epyc solution.

However, do you need a server at all? Could you off-site it to Amazon or Microsoft?
 
Soldato
Joined
14 Apr 2014
Posts
2,586
Location
East Sussex
OK cool - if you have an opening with the devs, ask them about the software and what kind of compiler optimisations they make to it. E.g. are they specifically building it for Intel CPUs / Intel-specific instructions? Does it care a lot about memory bandwidth? Can it take advantage of accelerator cards with CUDA / OpenCL?


If you're just looking for something more modern and efficient, and assuming it's a pretty standard app and none of the above applies, it might be worth taking a look at the HPE DL385 Gen10 or similar Epyc servers. The AMD option will give you more cores / threads and memory capacity for your money than an Intel server of equivalent performance, in 2U of rack space.
https://h20195.www2.hpe.com/v2/getdocument.aspx?docname=a00026913enw

If going Intel, you probably want to be looking at the 28-core / 56-thread Xeon Scalable CPUs. In a 2-CPU server you cannot achieve the memory or core counts of the Epyc servers, but you will more than beat the performance of your current server. You may need to look at Xeon Platinum if you want an 8-CPU system, though not many vendors actually ship servers with Xeon Scalable Platinum CPUs; a Xeon Gold system will top out at 4 CPUs. So if you're looking to really add a lot of power, and assuming you can't use GPUs / compute cards, then a box with 4 x 28-core / 56-thread Xeons and a terabyte of memory would probably be where it's at for you.

If memory bandwidth is extremely critical to performance, then Epyc might be worth testing, as it has 8-channel memory per CPU with faster memory than Xeon Scalable gen 1. Xeon Scalable gen 2 will be more competitive than gen 1 due to its increased memory speed / bandwidth.

As your budget is reasonable, it might be worth finding some kit to evaluate your software on (you can rent physical servers of various types to test on, and resellers will sometimes provide evaluation kit if you ask).

A 4-way Intel box is probably the safest choice without knowing a lot more about the setup (edit: if memory bandwidth is critical, ensure you have enough RAM modules to fill all memory channels for every CPU in the system).
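
For a rough feel of the memory bandwidth point, here is a minimal sketch of the usual theoretical-peak arithmetic: channels x transfer rate x 8 bytes per transfer (a 64-bit DDR4 channel). The 8 channels per Epyc CPU comes from the post above; the 6 channels per Xeon Scalable CPU and the DDR4-2666 speed used for both are assumptions for illustration, so check the exact figures for whatever parts get quoted. Real-world bandwidth will be lower than these peaks.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth per CPU in GB/s.

    channels       -- memory channels per CPU
    mega_transfers -- DIMM speed in MT/s (e.g. 2666 for DDR4-2666)
    bus_bytes      -- bytes moved per transfer per channel (8 for 64-bit DDR4)
    """
    return channels * mega_transfers * bus_bytes / 1000


def min_dimms(sockets: int, channels_per_socket: int) -> int:
    """Fewest DIMMs that put one module in every channel of every CPU."""
    return sockets * channels_per_socket


# Assumed example figures, not vendor-confirmed for any specific quote.
print(f"8 channels @ DDR4-2666: {peak_bandwidth_gbs(8, 2666):.1f} GB/s per CPU")
print(f"6 channels @ DDR4-2666: {peak_bandwidth_gbs(6, 2666):.1f} GB/s per CPU")
print(f"4 CPUs x 6 channels needs at least {min_dimms(4, 6)} DIMMs")
```

The second function is the "fill all memory channels" check: a box with fewer DIMMs than sockets x channels leaves some channels empty and cannot reach its peak bandwidth.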
 
Soldato
Joined
30 Sep 2005
Posts
16,549
I'd rather distribute the compute tasks across multiple nodes if possible. That way you can scale up in the future.
Blades are expensive really though, we're just in the process of moving away from our blades back to rackmount servers.

Definitely worth asking the question... Also, if the server goes offline you are a bit stuck, whereas if a single node goes down the calcs carry on, albeit at a slower rate.
 
Associate
OP
Joined
2 Mar 2007
Posts
599
Hi guys, I just wanted to get some final thoughts before I pull the trigger. This is what I am planning to go with for the server.

This solution consists of 1 fully built and tested system.

1 x Intel S9256WK1HLC Compute Node with liquid cooling
2 x Intel Xeon Scalable 9282, 56 cores, 2.6 GHz (400W TDP)
16 x 128GB DDR4-2933 ECC Registered LRDIMM
2 x Intel P4511 3.84TB M.2 NVMe
1 x Dual-port 10GbE SFP+ Network Adapter
1 x Remote Management Module
 
Associate
Joined
28 Feb 2008
Posts
472
Location
Northamptonshire
Looks like a single node from a 4-node chassis which isn't released until Q3 this year according to the Intel site. The CPU was only released this quarter. Personally I wouldn't want to be that bleeding edge, but it depends on your timescales & requirements. I would see if you can get it on "sale or return" or as a PoC unit first, to ensure that it actually works as you require.
 
Don
Joined
19 May 2012
Posts
17,167
Location
Spalding, Lincolnshire
Personally I think it's a bad idea - it's an untenable position to keep buying cutting-edge unicorn products as and when your app bottlenecks.

Standard servers like the DL385 above (e.g. with 2 x EPYC 7601) would give you 64 cores / 128 threads at a fraction of the cost and can be bought now.
Not quick enough? Then the app needs fixing so that you can throw more servers at it.

AMD's Rome-based Epyc should also be available any time now, which will offer 64 cores / 128 threads per socket (and should work fine in a DL385), so it's worth waiting to see how this affects things.
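
To put the options mentioned in this thread side by side, here is a minimal sketch that totals the cores for each and compares them with the stated goal of doubling the current 72 cores. All figures are taken from the posts above. Raw core counts are only a rough guide, since per-core performance differs between CPU generations and vendors.

```python
CURRENT_CORES = 72                 # cores maxed out on the existing server
TARGET_CORES = 2 * CURRENT_CORES   # "looking to double that"

# (sockets, cores per socket) for each option raised in the thread
OPTIONS = {
    "2 x Xeon 9282": (2, 56),
    "4 x 28-core Xeon Scalable": (4, 28),
    "2 x EPYC 7601": (2, 32),
    "2 x Rome-based Epyc": (2, 64),
}


def total_cores(sockets: int, cores_per_socket: int) -> int:
    """Total physical cores in one server."""
    return sockets * cores_per_socket


def summarise(options: dict, target: int) -> dict:
    """Map each option to (total cores, fraction of the target core count)."""
    return {
        name: (total_cores(*spec), total_cores(*spec) / target)
        for name, spec in options.items()
    }


for name, (cores, fraction) in summarise(OPTIONS, TARGET_CORES).items():
    print(f"{name}: {cores} cores ({fraction:.0%} of the {TARGET_CORES}-core target)")
```

None of the single-box options reaches double the current core count on its own, which is part of why the ability to spread the work over several servers matters.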
 
Soldato
Joined
16 May 2007
Posts
3,220
As above, I would be questioning the design and scalability of the app. If it has to be on a single server, that is a significant business risk: if that server died, what would you do? It also restricts the ability to upgrade / grow / maintain the hardware as required, as well as the app itself.
 
Associate
OP
Joined
2 Mar 2007
Posts
599
Thanks a lot for the constructive feedback guys. I have forwarded the concerns / feedback to the PM and will see what he says.
But he seems adamant about getting the latest and greatest without much concern for cost.
 
Associate
Joined
28 Feb 2008
Posts
472
Location
Northamptonshire
It's not the budget that concerns me, it's how new the hardware is. 1st revision hardware on CPU & Motherboard, 1st revision firmware & drivers. Lots of potential issues yet to be discovered, especially on something that appears to be this business critical to you. It all depends on whose head ends up on the block if it all goes wrong.
 
Associate
OP
Joined
2 Mar 2007
Posts
599
Definitely not mine. I have raised all the concerns mentioned with the PM and I have them all in Slack. If anything goes wrong and he tries to blame me, guess who's getting a printout of the convo?
I really do appreciate everyone's help.
 