Ethernet Chip Hot?

SpellowHouse · 12 Jun 2023 at 08:17

I have a strange problem with one PC.

While the internet seems to work just fine, copying large files over the LAN does not.

What happens is that the transfer starts and will eventually complete, but in the meantime, about every two to five seconds, the transfer rate will fall to zero.
No errors are reported, and I can't find any problems with Windows.
I am thinking this may be the Ethernet chip throttling, but have never seen this issue before, so I am uncertain.
Also, I am not sure how the internet seems to be affected by this. It seems that it's not. Furthermore, accessing the internet seems to kickstart the LAN transfer at times (but not always).
Can anyone shed any light on this?

lmfy2k · 12 Jun 2023 at 08:21

Using onboard network card?

Tried a different Ethernet cable?

Tried reinstalling drivers?

SpellowHouse · 12 Jun 2023 at 08:24

lmfy2k said:
Using onboard network card?

Tried a different Ethernet cable?

Tried reinstalling drivers?

I will try both of those and get back in an hour.

str · 12 Jun 2023 at 08:34

Ethernet throughput keeps randomly dropping to 0 every few seconds. - Microsoft Q&A

@Gary Nebbett help me out please. Ethernet throughput keeps droping to 0 every few seconds but my internet works perfectly fine for every other computer without any drops. I even changed the cable and it still randomly drops to 0 all browsers are…

learn.microsoft.com

ChrisD. · 12 Jun 2023 at 08:36

Most likely a driver or Windows issue. Most modern motherboards have poor NICs on them and sometimes use 'killer' or 'gaming' software, which should really be disabled.

Try a Linux live boot from USB and see how that behaves.

Armageus · 12 Jun 2023 at 08:59

What Make/Model of Ethernet chip?

(e.g. certain Intel 2.5Gb chips have had issues recently)

SpellowHouse · 12 Jun 2023 at 09:30

Nope. Neither.

Armageus said:
What Make/Model of Ethernet chip?

(e.g. certain Intel 2.5Gb chips have had issues recently)

It's Intel, but I am not sure of the type. It's not 2.5G though.

Still nothing has shown any possible reason.

Armageus · 12 Jun 2023 at 09:38

SpellowHouse said:
It's Intel, but I am not sure of the type. It's not 2.5G though.

What make/model of motherboard then? Or what is the network controller listed as in device manager?

SpellowHouse · 12 Jun 2023 at 09:41

Armageus said:
What make/model of motherboard then? Or what is the network controller listed as in device manager?

It's a Gigabyte Z390 Aorus Ultra. I have a second one that is identical in every way; it's even running the same software and doing the same job, yet that one is working fine.
Win11. I have scanned for issues in Windows/corrupt files and so on, and it finds nothing.
It's very strange the way the port just stops without reporting an error, then restarts again some time later. It's responding like there is nothing wrong at all.

Anyway, I have a cheap NIC on the way to test. With luck, it may sidestep the problem - whatever it is.

Armageus · 12 Jun 2023 at 09:54

Assuming it's actually the Z390 Aorus Ultra, then it's an Intel I219-V.

I'd start with the latest drivers direct from Intel (regardless of whether Windows has given you the "latest" version, or even if you think it has the same version as the working motherboard).

Intel® Ethernet Adapter Complete Driver Pack

This download contains all files for version 30.4 of the Intel® Ethernet Adapter Complete Driver Pack for all supported OS versions.

www.intel.com

Edit:
While you are at it, worth disabling all power management options e.g.
"Allow the computer to turn off this device to save power"

and anything under advanced relating to Ethernet power saving, Green Ethernet, Energy Efficient Ethernet etc.

SpellowHouse · 12 Jun 2023 at 10:29

It's not the network, it's the SSD!
When I ran a performance benchmark on the SSD, it failed "with errors".
Interestingly, the tests I ran on the internet also wrote to the same SSD. It seems that the controller chip must be to blame here.
I monitored the temperature and that's fine, so who knows what the issue is.
New SSD is on order!
Very interestingly, the same drive in the second PC failed just over a month ago. That one just stopped working. These are drives that get a heavy amount of writing, they are used every single day for, oh, upwards of 5 GB of transfers. Both drives were apparently fine in tests and so on, so I suppose it was not the actual memory that failed, rather some secondary chip.
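For anyone wanting to check whether a disk (rather than the network) is the thing stalling, a rough sketch along these lines may help. It writes a file in fixed-size chunks, forces each chunk to the device, and flags any chunk whose write rate falls below a threshold. The function name, the chunk sizes, and the 5 MB/s stall threshold are all illustrative assumptions, not values taken from this thread or from any benchmark tool.

```python
import os
import time


def timed_write(path, total_mb=256, chunk_mb=4, stall_mb_s=5.0):
    """Write total_mb of data to path in chunk_mb pieces, timing each chunk.

    Returns a list of (chunk_index, mb_per_second, stalled) tuples.
    stall_mb_s is an arbitrary, assumed threshold for "looks stalled".
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)  # incompressible test data
    results = []
    with open(path, "wb") as f:
        for i in range(total_mb // chunk_mb):
            start = time.perf_counter()
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device, not just the OS cache
            elapsed = time.perf_counter() - start
            rate = chunk_mb / elapsed if elapsed > 0 else float("inf")
            results.append((i, rate, rate < stall_mb_s))
    return results


def report(results):
    """Print one line per chunk, marking the slow ones."""
    for index, rate, stalled in results:
        marker = "  <-- possible stall" if stalled else ""
        print(f"chunk {index:4d}: {rate:8.1f} MB/s{marker}")
```

Calling `report(timed_write("D:/stalltest.bin"))` (the path is a placeholder) on the suspect drive and again on a known-good drive gives a quick comparison. Periodic near-zero chunks on one drive only would point at the disk rather than the Ethernet port.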

bledd · 12 Jun 2023 at 15:35

What model is the SSD out of interest?

Caged · 12 Jun 2023 at 15:36

Do you mean 5GB or 5TB of writes? Depending on the SSD you might just be wearing the flash out.

SpellowHouse · 13 Jun 2023 at 07:06

Caged said:
Do you mean 5GB or 5TB of writes? Depending on the SSD you might just be wearing the flash out.

It's Samsung 830, 256GB, 16.6TB written. Checked out in Magician as working just fine, condition "good". The SSD passed all tests, nothing wrong with it, except for one - the performance test. The performance test failed because it just stopped while it was writing to the flash. No specific error given.
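As a rough sanity check on the wear question, the figures above (16.6TB written to a 256GB drive) can be turned into an approximate number of full-drive rewrites. This is only a sketch: the write amplification factor is unknown for this drive and workload, so it is left as an assumed parameter, and the result is an average that says nothing about individual blocks.

```python
def average_pe_cycles(host_tb_written, capacity_gb, write_amplification=1.0):
    """Estimate average program/erase cycles per NAND block.

    host_tb_written: data written by the host, in decimal TB.
    capacity_gb: drive capacity in decimal GB.
    write_amplification: assumed ratio of NAND writes to host writes
        (1.0 is the best case; the real value is not known here).
    """
    nand_gb_written = host_tb_written * 1000 * write_amplification
    return nand_gb_written / capacity_gb


# Figures from the post above: 16.6 TB written, 256 GB drive.
best_case = average_pe_cycles(16.6, 256)          # about 65 full-drive writes
pessimistic = average_pe_cycles(16.6, 256, 3.0)   # assumed WAF of 3
print(f"{best_case:.0f} to {pessimistic:.0f} average P/E cycles")
```

Even with a pessimistic assumed amplification, that is a few hundred cycles on average. If the NAND is rated in the low thousands of cycles, as MLC of that era is commonly quoted to be (an assumption, not something stated in this thread), plain flash wear-out would not obviously explain the stalls.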

It's kinda worrying, really, that these things can fail with no error message in normal operation. If you don't watch the transfers, you have no idea it's failing.

It makes me wonder how slowly this failed. Has the performance been falling away over the past year? Who knows?

Anyway, today I have a 500GB drive arriving to replace it.

Armageus · 13 Jun 2023 at 07:35

SpellowHouse said:
It's kinda worrying, really, that these things can fail with no error message in normal operation.

Not really, it's a drive that was released in 2011.

SpellowHouse said:
Anyway, today I have a 500GB drive arriving to replace it.

It's worth buying a bigger drive, especially given today's prices, as bigger drives have more overprovisioning/more NAND to wear level across.

SpellowHouse · 13 Jun 2023 at 07:59

Armageus said:
Not really, it's a drive that was released in 2011.

It's worth buying a bigger drive, especially given today's prices as they have more overprovisioning/more NAND to wear level across.

The replacement was £28 lol.

It doesn't surprise me that of all the drives I have, those two (one in each of the duplicate PCs) failed. They spent, oh, four years maybe with a very low number of writes per day, but then I replaced them and shifted them to less important roles with a much higher number of writes per day. It does surprise me, though, that there was no warning of failure in either case. A drive just slowing down is not what I would have expected. Ah, live and learn.

Armageus · 13 Jun 2023 at 08:23

SpellowHouse said:
The replacement was £28 lol.

But a 1TB drive normally gets you double the endurance (and even more if you only allocate a 500GB partition to it - since the rest of the drive can be used for garbage collection / wear levelling)

e.g. Crucial MX500 - 500GB vs 1TB = 180TBW vs 360TBW

https://content.crucial.com/content/dam/crucial/ssd-products/mx500/flyer/crucial-mx500-ssd-productflyer-en.pdf
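To put the 180TBW vs 360TBW figures in context, here is a small sketch that converts a TBW rating into drive writes per day (DWPD) and into a rough lifetime at a given daily write volume. The 5-year period and the 100GB/day workload are assumptions chosen for illustration; check the warranty terms and your own write volume before relying on the numbers.

```python
def dwpd(tbw, capacity_gb, years):
    """Drive writes per day implied by a TBW rating over a given period."""
    capacity_tb = capacity_gb / 1000
    return tbw / (capacity_tb * 365 * years)


def years_until_tbw(tbw, gb_written_per_day):
    """Years until host writes reach the TBW rating at a steady daily rate."""
    return tbw * 1000 / gb_written_per_day / 365


# TBW figures quoted above; 5 years and 100 GB/day are assumed.
for name, tbw, capacity_gb in [("500GB", 180, 500), ("1TB", 360, 1000)]:
    print(
        f"{name}: {dwpd(tbw, capacity_gb, 5):.2f} DWPD, "
        f"{years_until_tbw(tbw, 100):.1f} years at 100 GB/day"
    )
```

Both sizes work out to the same DWPD, since the rating scales with capacity. The benefit of the larger drive only shows up when the daily write volume is fixed: the same workload takes twice as long to reach the rated TBW.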

SpellowHouse said:
It does surprise me, though, that there was no warning of failure in either case. A drive just slowing down is not what I would have expected.

It's an older drive though - newer drives will likely have more advanced monitoring, and also better wear levelling algorithms etc

SpellowHouse · 14 Jun 2023 at 06:31

All sorted. The Ethernet transfers are back up to normal speed!

Sorry for the misdirection in this post. I didn't know that an SSD could fail in this way, so I naturally blamed the Ethernet rather than the real cause.

Armageus said:
But a 1TB drive normally gets you double the endurance (and even more if you only allocate a 500GB partition to it - since the rest of the drive can be used for garbage collection / wear levelling)

The drives don't get a lot of data stored on them. Mostly around 20GB. Sometimes it wanders up to 100GB but not for long. The main factor with these drives is the data changes a lot. It's entirely erased and re-written every few days. This is the quickest way to wear out the flash, but it seems also the quickest way to wear out everything else. The problem was consistent with the drive overheating, except it wasn't, so I assume there was a fault in the controller chip.
It's possible it was the flash itself that was failing, but I would have thought that diagnostics would have picked that up. I do remember, from my distant past, that flash does fail this way. It's remarkably predictable. If one block wears out, hundreds of thousands will, all at the same time. I remember once I had a bunch of flash chips, and they all failed when they reached a certain re-program count. It was remarkable. It was like they all formed a union and went on strike at the same time!
Anyway, the drive is in the bin. As is my collection of old SSDs that was going to be used for other tasks!! Given the trouble they can cause and the low price of replacements, it's not worth keeping them.

Caged · 14 Jun 2023 at 09:12

They will fail at the same time because the flash controller is doing wear levelling. If you have a 256GB SSD and only write to 10GB of it but constantly erase and re-write that data, it doesn't wear out 10GB of flash, it spreads the data all over the available storage.
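A toy simulation may make this easier to picture. It is a deliberately idealised model, not how any real controller works: every write costs one erase, and "wear levelling" simply means each write goes to whichever physical block currently has the fewest erases. The block counts and write counts are made-up numbers scaled to echo the 256GB / 10GB example.

```python
def simulate(total_blocks, hot_blocks, writes, wear_levelling):
    """Return per-block erase counts after rewriting a small hot data set.

    total_blocks: physical blocks on the (toy) drive.
    hot_blocks: logical blocks that are rewritten over and over.
    wear_levelling: if True, each write lands on the least-worn block;
        if False, each logical block maps to one fixed physical block.
    """
    erase_counts = [0] * total_blocks
    for i in range(writes):
        logical = i % hot_blocks
        if wear_levelling:
            # Idealised controller: pick the least-worn physical block.
            physical = min(range(total_blocks), key=erase_counts.__getitem__)
        else:
            physical = logical
        erase_counts[physical] += 1
    return erase_counts


fixed = simulate(256, 10, 2560, wear_levelling=False)
levelled = simulate(256, 10, 2560, wear_levelling=True)
print("no levelling:   max erases =", max(fixed), "blocks touched =", sum(c > 0 for c in fixed))
print("with levelling: max erases =", max(levelled), "blocks touched =", sum(c > 0 for c in levelled))
```

In this model the unlevelled drive puts all the wear on 10 blocks, while the levelled drive spreads the same writes thinly over all 256. That even spread is also why, once the limit is reached, the blocks tend to reach it together.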

Armageus · 14 Jun 2023 at 09:34

Be interesting to know what drive you bought for £28 - just because drives are cheap, doesn't mean they are good.

Newer QLC based drives have less endurance than older TLC, MLC and SLC, and if you've bought a QLC drive, the NAND can potentially be rated for as few as 1,000 write cycles.
(You could also seek out some used enterprise grade drives e.g. Intel DC S3700 that are rated at 10 Drive Writes per day)

For decent longevity with tasks that do a lot of repetitive writes, you ideally want to start with a TLC or MLC based drive, and you need to do all of the old school SSD optimisations: making sure the partition is correctly aligned (to avoid write amplification), manually overprovisioning by leaving some space unused when partitioning, and ensuring you have the latest firmware/SSD toolbox software.
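The alignment and overprovisioning points come down to simple arithmetic, sketched below. The 1 MiB boundary is an assumed, commonly used alignment target, the example offsets are illustrative, and the function names are made up for this example. The partition's starting offset has to come from whatever disk tool your OS provides.

```python
MIB = 1024 * 1024


def is_aligned(offset_bytes, boundary_bytes=MIB):
    """True if a partition's starting offset falls on the given boundary."""
    return offset_bytes % boundary_bytes == 0


def extra_spare_percent(drive_gb, partition_gb):
    """Unpartitioned space as a percentage of the partition size."""
    return (drive_gb - partition_gb) / partition_gb * 100


print(is_aligned(1_048_576))          # partition starting at exactly 1 MiB
print(is_aligned(63 * 512))           # legacy start at sector 63 (512-byte sectors)
print(extra_spare_percent(1000, 500)) # 1TB drive with only a 500GB partition
```

Note that unpartitioned space is only useful to the controller as spare area if it has never been written or has been trimmed, so this is best done on a new or freshly secure-erased drive.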

Some drives/software also allow you to make use of RAM caching of the SSD (e.g. Crucial Momentum Cache), to help combine multiple small writes into larger optimised writes, to help reduce the wear to the NAND.