Ethernet Chip Hot?

Soldato | Joined: 31 Jan 2022 | Posts: 3,626 | Location: UK
I have a strange problem with one PC.

While the internet seems to work just fine, copying large files over the LAN does not.

What happens is that the transfer starts and will eventually complete, but in the meantime, about every two to five seconds, the transfer rate will fall to zero.
No errors are reported, and I can't find any problems with Windows.
I am thinking this may be the Ethernet chip throttling, but I have never seen this issue before, so I am uncertain.
Also, I am not sure how the internet is affected by this; it seems that it isn't. Furthermore, accessing the internet seems to kick-start the LAN transfer at times (but not always).
Can anyone shed any light on this?
 
Most likely a driver or Windows issue. Most modern motherboards have poor NICs on them, and some use 'Killer' or 'gaming' software, which really should be disabled.

Try a Linux live boot from USB and see how that behaves.
 
What make/model of motherboard then? Or what is the network controller listed as in device manager?

It's a Gigabyte Z390 Aorus Ultra. I have a second one that is identical in every way; it's even running the same software and doing the same job, yet that one is working fine.
Win11. I have scanned for issues in Windows/corrupt files and so on, and it finds nothing.
It's very strange the way the port just stops without reporting an error, then restarts again some time later. It's responding like there is nothing wrong at all.

Anyway, I have a cheap NIC on the way to test. With luck, it may sidestep the problem - whatever it is.
 
Assuming it's actually the Z390 Aorus Ultra, then it's an Intel I219-V.

I'd start with the latest drivers direct from Intel (regardless of whether Windows has given you the "latest" version, or even if you think it has the same version as the working motherboard).



Edit:
While you are at it, it's worth disabling all power management options, e.g.
"Allow the computer to turn off this device to save power"

and anything under advanced relating to Ethernet power saving, Green Ethernet, Energy Efficient Ethernet etc.
 
It's not the network, it's the SSD!
I ran a performance benchmark on the SSD and it failed "with errors".
Interestingly, the internet tests I ran also wrote to the same SSD. It seems that the controller chip must be to blame here.
I monitored the temperature and that's fine, so who knows what the issue is.
New SSD is on order!
Very interestingly, the same model of drive in the second PC failed just over a month ago. That one just stopped working. These drives get a heavy amount of writing; they are used every single day for, oh, upwards of 5 GB of transfers. Both drives were apparently fine in tests and so on, so I suppose it was not the actual memory that failed, but rather some secondary chip.
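A crude way to reproduce the stall with the network out of the picture is to write to the suspect drive directly and time each chunk, which is roughly what the benchmark was doing. This is a sketch, not a replacement for a proper benchmark tool: the `write_probe` name, chunk size and chunk count are arbitrary choices, and while `os.fsync` asks the OS to push each chunk to the device, drive-level caching can still smooth over some behaviour. A healthy drive should give fairly even rates; chunks that take seconds instead of milliseconds are the stall.

```python
import os
import tempfile
import time
from typing import List


def write_probe(path: str, chunk_mib: int = 4, chunks: int = 64) -> List[float]:
    """Write `chunks` blocks of `chunk_mib` MiB to `path`, forcing each one out
    with os.fsync, and return the per-chunk throughput in MiB/s."""
    data = os.urandom(chunk_mib * 1024 * 1024)  # incompressible payload
    rates = []
    with open(path, "wb") as f:
        for _ in range(chunks):
            t0 = time.perf_counter()
            f.write(data)
            f.flush()                 # empty Python's own buffer
            os.fsync(f.fileno())      # ask the OS to push the data to the device
            elapsed = time.perf_counter() - t0
            rates.append(chunk_mib / max(elapsed, 1e-9))
    return rates


if __name__ == "__main__":
    # To test a specific drive, pass dir=<a folder on that drive> (hypothetical).
    with tempfile.TemporaryDirectory() as d:
        rates = write_probe(os.path.join(d, "probe.bin"), chunk_mib=4, chunks=8)
        print(f"min {min(rates):.1f} MiB/s, max {max(rates):.1f} MiB/s")
```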
 
Do you mean 5GB or 5TB of writes? Depending on the SSD you might just be wearing the flash out.
It's a Samsung 830, 256GB, with 16.6TB written. It checked out in Magician as working just fine, condition "good". The SSD passed all tests except one: the performance test, which failed because it just stopped while it was writing to the flash. No specific error was given.

It's kinda worrying, really, that these things can fail with no error message in normal operation. If you don't watch the transfers, you have no idea it's failing.
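One way to make a silent slowdown visible is to log throughput once a second during a copy and flag the runs where it drops to (near) zero. A minimal Python sketch, assuming you already have such samples in MB/s; how you collect them, the `find_stalls` name and the 1 MB/s threshold are all assumptions for illustration:

```python
from typing import List, Optional, Tuple


def find_stalls(samples_mb_s: List[float], threshold_mb_s: float = 1.0,
                min_samples: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices (end exclusive) of every run where
    throughput stayed below the threshold for at least min_samples samples."""
    stalls = []
    start: Optional[int] = None
    for i, rate in enumerate(samples_mb_s):
        if rate < threshold_mb_s:
            if start is None:
                start = i                 # a stall begins
        elif start is not None:
            if i - start >= min_samples:
                stalls.append((start, i))
            start = None                  # throughput recovered
    # Close a stall that lasts until the final sample.
    if start is not None and len(samples_mb_s) - start >= min_samples:
        stalls.append((start, len(samples_mb_s)))
    return stalls


# Hypothetical one-sample-per-second log of a LAN copy that keeps dropping to zero.
log = [110, 112, 0, 0, 108, 111, 109, 0, 113, 0, 0, 0]
print(find_stalls(log))  # [(2, 4), (7, 8), (9, 12)]
```

With `min_samples=2`, single-second dips are ignored and only the longer stalls are reported.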

It makes me wonder how slowly this failed. Has the performance been falling away over the past year? Who knows?

Anyway, today I have a 500GB drive arriving to replace it.
 
It's kinda worrying, really, that these things can fail with no error message in normal operation.
Not really; it's a drive that was released in 2011.


Anyway, today I have a 500GB drive arriving to replace it.

It's worth buying a bigger drive, especially at today's prices, as bigger drives have more overprovisioning and more NAND to wear-level across.
 
Not really; it's a drive that was released in 2011.


It's worth buying a bigger drive, especially at today's prices, as bigger drives have more overprovisioning and more NAND to wear-level across.

The replacement was £28 lol.

It doesn't surprise me that, of all the drives I have, those two (one in each of the duplicate PCs) failed. They spent, oh, maybe four years with a very low number of writes per day, but then I replaced them and shifted them to less important roles with a much higher number of writes per day. It does surprise me, though, that there was no warning of failure in either case. A drive just slowing down is not what I would have expected. Ah, live and learn.
 
The replacement was £28 lol.
But a 1TB drive normally gets you double the endurance (and even more if you only allocate a 500GB partition to it - since the rest of the drive can be used for garbage collection / wear levelling)

e.g. Crucial MX500: 500GB = 180TBW vs 1TB = 360TBW
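The endurance arithmetic is simple enough to sketch. This assumes TBW is quoted in decimal terabytes and that writes are spread evenly over time; the function names and the 50 GB/day workload are hypothetical, not figures from this thread:

```python
def drive_writes(written_tb: float, capacity_gb: float) -> float:
    """How many full-drive writes a host-writes total represents."""
    return written_tb * 1000 / capacity_gb


def years_to_rating(tbw_rating: float, written_tb: float, gb_per_day: float) -> float:
    """Years until host writes reach the rated TBW at a steady daily rate."""
    remaining_gb = max(tbw_rating - written_tb, 0.0) * 1000
    return remaining_gb / gb_per_day / 365


# 16.6 TB written to a 256 GB Samsung 830 (figures from this thread).
print(round(drive_writes(16.6, 256), 1))       # 64.8
# MX500 ratings quoted above, at a hypothetical 50 GB/day.
print(round(years_to_rating(180, 0, 50), 1))   # 9.9
print(round(years_to_rating(360, 0, 50), 1))   # 19.7
```

So the 830 had seen roughly 65 full-drive writes, and at the same daily load the 1TB model's rating lasts twice as long as the 500GB one.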

It does surprise me, though, that there was no warning of failure in either case. A drive just slowing down is not what I would have expected.
It's an older drive though - newer drives will likely have more advanced monitoring, and also better wear levelling algorithms etc
 
All sorted. The Ethernet transfers are back up to normal speed!

Sorry for the misdirection in this post. I didn't know that an SSD could fail in this way, so I naturally blamed the Ethernet rather than the real cause.

But a 1TB drive normally gets you double the endurance (and even more if you only allocate a 500GB partition to it - since the rest of the drive can be used for garbage collection / wear levelling)

The drives don't get a lot of data stored on them, mostly around 20GB. Sometimes it wanders up to 100GB, but not for long. The main factor with these drives is that the data changes a lot: it's entirely erased and rewritten every few days. This is the quickest way to wear out the flash, but it seems it's also the quickest way to wear out everything else. The symptoms were consistent with the drive overheating, except that it wasn't overheating, so I assume there was a fault in the controller chip.
It's possible it was the flash itself that was failing, but I would have thought that diagnostics would have picked that up. I do remember, from my distant past, that flash does fail this way. It's remarkably predictable. If one block wears out, hundreds of thousands will, all at the same time. I remember once I had a bunch of flash chips, and they all failed when they reached a certain re-program count. It was remarkable. It was like they all formed a union and went on strike at the same time!
Anyway, the drive is in the bin, as is my collection of old SSDs that was going to be used for other tasks! Given the trouble they can cause and the low price of replacements, it's not worth keeping them.
 
They will fail at the same time because the flash controller is doing wear levelling. If you have a 256GB SSD and only write to 10GB of it, but constantly erase and rewrite that data, it doesn't wear out just 10GB of flash; it spreads the writes across all of the available storage.
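A toy simulation shows why wear levelling makes the blocks wear out together. It is a simplified model, not how any particular controller works: it assumes dynamic wear levelling only (each write goes to the least-erased free block), treats each "block" as one erase block, and ignores static data, garbage collection and write amplification. The `erase_counts` name is made up for the example.

```python
import heapq
from typing import Dict, List


def erase_counts(physical_blocks: int, logical_blocks: int, passes: int,
                 wear_levelling: bool = True) -> List[int]:
    """Erase count of every physical block after the whole logical data set
    has been rewritten `passes` times, one logical block at a time."""
    erases = [0] * physical_blocks
    if not wear_levelling:
        # Naive fixed mapping: logical block n always lives in physical block n.
        for _ in range(passes):
            for lb in range(logical_blocks):
                erases[lb] += 1
        return erases
    # Dynamic wear levelling: each write lands in the least-erased free block.
    free = [(0, pb) for pb in range(physical_blocks)]  # sorted list is a valid heap
    mapping: Dict[int, int] = {}
    for _ in range(passes):
        for lb in range(logical_blocks):
            count, pb = heapq.heappop(free)
            erases[pb] = count + 1          # erase it, then program the new copy
            old = mapping.get(lb)
            mapping[lb] = pb
            if old is not None:             # the stale copy becomes free again
                heapq.heappush(free, (erases[old], old))
    return erases


# 256 erase blocks, 10 blocks of live data, data set rewritten 1000 times.
naive = erase_counts(256, 10, 1000, wear_levelling=False)
levelled = erase_counts(256, 10, 1000)
print(max(naive), max(levelled), min(levelled))  # 1000 40 39
```

Without levelling, 10 blocks absorb every erase; with it, all 256 blocks end up within one erase of each other, so they all approach their cycle limit at about the same time.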
 
Be interesting to know what drive you bought for £28 - just because drives are cheap, doesn't mean they are good.

Newer QLC-based drives have less endurance than older TLC, MLC and SLC drives, and if you've bought a QLC drive, each NAND chip can potentially have as few as 1,000 write cycles.
(You could also seek out some used enterprise-grade drives, e.g. the Intel DC S3700, which is rated at 10 drive writes per day (DWPD).)
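Endurance ratings quoted as drive writes per day convert to TBW by multiplying through by capacity and the rating period. A sketch, where the 400 GB capacity, the 5-year period and the write amplification factor of 2 are assumptions chosen for illustration; the second function is only an order-of-magnitude ceiling derived from the cycle count:

```python
def dwpd_to_tbw(dwpd: float, capacity_gb: float, period_years: float) -> float:
    """Convert a drive-writes-per-day rating into total terabytes written."""
    return dwpd * capacity_gb * 365 * period_years / 1000


def rough_tbw(capacity_gb: float, pe_cycles: int, write_amplification: float) -> float:
    """Very rough ceiling on host writes: every cell programmed pe_cycles
    times, divided by the write amplification factor (assumed, not measured)."""
    return capacity_gb * pe_cycles / write_amplification / 1000


# 10 DWPD over an assumed 5-year period on an assumed 400 GB drive.
print(dwpd_to_tbw(10, 400, 5))      # 7300.0
# Assumed 1 TB QLC drive, 1000 cycles, hypothetical write amplification of 2.
print(rough_tbw(1000, 1000, 2.0))   # 500.0
```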

For decent longevity with tasks that do a lot of repetitive writes, ideally you want to start with a TLC- or MLC-based drive, and you need to do all of the old-school SSD optimisations: making sure the partition is correctly aligned (to avoid write amplification), manually overprovisioning by leaving some space unused when partitioning, and ensuring you have the latest firmware/SSD toolbox software.
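Both partitioning checks are easy to express. A sketch, assuming a 1 MiB alignment boundary (the default partition start on Windows Vista and later) and decimal GB for the overprovisioning figure; the partition's starting offset is what Windows' `Get-Partition` cmdlet reports as `Offset`, and the function names are hypothetical. Manual overprovisioning only helps if the unpartitioned space has never been written, or has been trimmed or secure-erased, so the controller knows it is free.

```python
MIB = 1024 * 1024


def is_aligned(start_offset_bytes: int, alignment: int = MIB) -> bool:
    """True if a partition's starting byte offset is a multiple of `alignment`."""
    return start_offset_bytes % alignment == 0


def extra_overprovisioning(capacity_gb: float, partition_gb: float) -> float:
    """Unpartitioned space as a percentage of the partitioned space, on top of
    whatever spare area the manufacturer already reserves."""
    return (capacity_gb - partition_gb) / partition_gb * 100


print(is_aligned(63 * 512))    # False: old XP-style start at sector 63
print(is_aligned(2048 * 512))  # True: 1 MiB start
print(extra_overprovisioning(1000, 500))           # 100.0
print(round(extra_overprovisioning(500, 450), 1))  # 11.1
```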

Some drives/software also allow you to make use of RAM caching of the SSD (e.g. Crucial Momentum Cache), to help combine multiple small writes into larger optimised writes, to help reduce the wear to the NAND.
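The idea behind RAM write caching can be sketched as a buffer that collects small writes and only passes whole chunks to the drive. This is a toy model, not how Momentum Cache actually works; the class name and the 16 KiB chunk size are assumptions. A real cache also has to accept the risk of losing buffered data on a power cut.

```python
from typing import List


class CoalescingWriteBuffer:
    """Toy RAM write cache: small writes are gathered in memory and handed
    to the "SSD" only as whole chunk_size pieces (plus a final remainder)."""

    def __init__(self, chunk_size: int = 16 * 1024):
        self.chunk_size = chunk_size
        self.pending = bytearray()
        self.device_writes: List[int] = []  # sizes of writes sent to the drive

    def write(self, data: bytes) -> None:
        self.pending += data
        while len(self.pending) >= self.chunk_size:
            self.device_writes.append(self.chunk_size)
            del self.pending[:self.chunk_size]

    def flush(self) -> None:
        # Push any remainder, e.g. on shutdown or after a timer expires.
        if self.pending:
            self.device_writes.append(len(self.pending))
            self.pending.clear()


buf = CoalescingWriteBuffer()
for _ in range(1000):
    buf.write(b"x" * 100)  # 1000 small 100-byte writes
buf.flush()
print(len(buf.device_writes))  # 7
```

The 1000 small writes reach the drive as six full 16 KiB chunks plus one partial flush.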
 