Help with diagnosing crash problems

Hello

I have an intermittent issue with three computers, all HP Z260, and I have been trying to solve it for months.
I am running the machines as headless render nodes for 3D software (Cinema 4D), more or less at 100% load.


Computer Specs

All computers: HP Z260, Windows 10 Pro 64-bit, 32 GB RAM, 1 TB HDD, entry-level Nvidia GPU.

CPUs have changed

Original setup
Computer 1: dual Xeon E5 2660 V1
Computer 2: dual Xeon E5 2680 V1
Computer 3: dual Xeon E5 2690 V2

New setup
All CPUs changed to dual E5 2690 V2


Problem


There are two main symptoms I would like to diagnose:

1) Two of the computers intermittently seem to get tired after a couple of hours of work. By this I mean that the time to complete roughly identical tasks (rendering very similar frames) gets progressively longer. In the most extreme example the render time went from 4 mins to 24 mins.

The fix for this is simply to restart the software. However, according to the software manufacturer and its associated forum, this is very unusual behaviour and should not happen.

2) All 3 of the computers suffer from intermittent software crashes, in which the entire system grinds to a halt and then the software stops responding.
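To pin down symptom 1, it may help to put a number on the slowdown rather than eyeballing it. The following is only a rough sketch: it assumes the per-frame render times (in minutes) have already been collected into a list, and the function names, the 5-frame baseline and the 2x threshold are all made up for illustration, not anything provided by Cinema 4D.

```python
from statistics import median


def slowdown_factors(render_minutes, baseline_frames=5):
    """Return each frame's render time divided by the early-run baseline.

    The baseline is the median of the first `baseline_frames` frames
    (an arbitrary choice made for this sketch).
    """
    if baseline_frames < 1 or len(render_minutes) < baseline_frames:
        raise ValueError("not enough frames to build a baseline")
    baseline = median(render_minutes[:baseline_frames])
    if baseline <= 0:
        raise ValueError("baseline render time must be positive")
    return [t / baseline for t in render_minutes]


def first_slow_frame(render_minutes, threshold=2.0, baseline_frames=5):
    """Return the index of the first frame at or above `threshold` times
    the baseline, or None if no frame is that slow."""
    factors = slowdown_factors(render_minutes, baseline_frames)
    for index, factor in enumerate(factors):
        if factor >= threshold:
            return index
    return None


if __name__ == "__main__":
    # Illustrative numbers only; the 4 and 24 minute values echo the
    # extreme case described above, the rest are invented.
    times = [4, 4, 4, 4, 4, 5, 9, 24]
    print(slowdown_factors(times)[-1])
    print(first_slow_frame(times))
```

Noting the time of day at which the first slow frame appears on each machine would make it easier to line the slowdown up against temperature or memory logs later.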

I am using dual Xeon machines, which were bought refurbished from a reputable UK sales outlet. The hardware team tell me that these processors should be able to run constantly without the need to reboot.

The Cinema 4D forum suggests that there have been some issues with CPUs reaching max temp and then causing the application to crash. The suggested fix for this is to lower the clock speed to lower the temperature.

However, I have been closely monitoring temperatures using Intel Power Gadget and note that none of the temperatures on the machines go anywhere near the max temp. They are usually around 65 degrees per unit.


Questions

So, the hardware people say the software is the problem and the software people say the hardware is the problem.

It should be noted that I also have an iMac with exactly the same build of the software installed and it has never crashed, even when running for several days. It uses an i7 6700.

1. I have a hunch that there could be an issue with the HDDs in the machines during the read/write process?
2. Could it be the RAM? Although usage never seems to exceed 60% according to Task Manager.
3. Do I need to reboot the machines regularly, or should they be able to run for longer? The iMac is usually on constantly.

Any help would be amazing, thank you.
 
However, I have been closely monitoring temperatures using Intel Power Gadget and note that none of the temperatures on the machines go anywhere near the max temp. They are usually around 65 degrees per unit.
If that is a generic CPU/package temperature, the actual cores could still be running hot.

And given those crashes and the age of the original CPUs, there is also the question of the condition of the PSUs.
It looks like the cases might use the PSU as an exhaust, in which case the PSUs have had to suck in a lot of heat.
 
Also, assuming the parts are functioning properly: should these machines be able to run at 100% load constantly, or am I expecting too much? I bought them on the basis that they use server chips and can supposedly take heavy workloads, but as I say they are being outstripped for stability by a 4 GHz quad-core i7.
 
Overheating processors or a wonky PSU would be my first port of call.
Clean out all fans/heatsinks with compressed air and reapply thermal paste to all CPUs.
Run CPU-Z and monitor temps from there.

After that, make sure all of them are running fresh Windows installs.

And yes, they should run at 100% all the time without issue.
 
One of the machines is under warranty (it has only been in my studio for 2 weeks).

So I would be within my rights to ask for a new PSU.

How would I check any of this?
Testing with another PSU would be the only sure way to exclude it as the cause.
While slowdowns aren't exactly something a weakening PSU would cause, crashes certainly can be caused by it.

Those "V1" CPUs are of the Sandy Bridge architecture, so if the PSUs are original to the machines they have likely seen quite a lot of use.
After all, such PCs aren't exactly bought for word processing or playing Minesweeper.
Of course, other parts might also have been stressed.

Slowdowns are more likely to be triggered by something heating up and causing some kind of throttling.
Using a hardware monitoring program that logs temperatures might show whether there is any relation between temperatures and slowdowns.
Though a bug in the software could also cause this.
Have you checked for any unusual memory usage?
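
As a rough sketch of how logged temperatures could be checked against the slowdowns, something like the following would do. It assumes the monitoring log has been merged by hand into a CSV with one row per rendered frame; the column names `core_temp_c` and `render_min` and the sample values are hypothetical, not the export format of any particular monitoring tool.

```python
import csv
import io
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no information about a relation.
        return 0.0
    return cov / sqrt(var_x * var_y)


def load_log(text):
    """Parse CSV text with the assumed columns core_temp_c and render_min."""
    rows = list(csv.DictReader(io.StringIO(text)))
    temps = [float(row["core_temp_c"]) for row in rows]
    times = [float(row["render_min"]) for row in rows]
    return temps, times


# Made-up sample data, only to show the expected layout of the log.
SAMPLE_LOG = """frame,core_temp_c,render_min
1,62,4.0
2,63,4.1
3,65,4.0
4,71,9.5
5,78,24.0
"""

if __name__ == "__main__":
    temps, times = load_log(SAMPLE_LOG)
    print(f"correlation: {pearson(temps, times):.2f}")
```

A value close to 1 would suggest render times rise together with temperature, which would point towards throttling. A value near 0 would make a software or memory problem look more likely. Either way it is only a hint, not proof.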
 