Anyone know how to estimate bandwidth bottlenecks?

Soldato
Joined
22 Dec 2008
Posts
10,369
Location
England
I'm thinking of the simple single socket, ddr3 based desktop computer. Bit of a long shot, but I know there are some electrical guys on here to whom the answer will be glaringly obvious. Please take pity on a mechanic!

An ssd is capable of something like 300 mbyte/s sustained transfer. I don't know if that's 300 mbyte/s reading and 300 mbyte/s writing simultaneously, but I suspect not. Slow either way.

Memory bandwidth is fairly simple. It's written on the sticks in mbyte/s. PC3-12800 (ddr3-1600, i.e. 1600 million transfers per second) is at best capable of 12800 mbyte/s per channel. Two channels can double that, four quadruple it. Again, I don't know if that's up+down or full speed in either direction.
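The arithmetic behind the number on the stick can be sketched as below. This assumes the usual 64-bit (8 byte) wide channel and treats the result as a theoretical peak, not something a real program will sustain; the function name is made up for illustration. As far as I know the DDR data bus is shared between reads and writes, so the figure is a combined total rather than a per-direction rate.

```python
def ddr_peak_mb_per_s(mega_transfers_per_s, channels=1, bus_width_bytes=8):
    """Theoretical peak in MB/s: transfers per second x bytes per transfer x channels.

    Assumption: a 64-bit (8 byte) channel, as on standard DDR3 DIMMs.
    """
    return mega_transfers_per_s * bus_width_bytes * channels


# ddr3-1600 (PC3-12800) in single, dual and quad channel configurations
for channels in (1, 2, 4):
    print(channels, "channel(s):", ddr_peak_mb_per_s(1600, channels), "MB/s")
```

For ddr3-1600 this gives 12800, 25600 and 51200 MB/s for one, two and four channels.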

12800 >> 300. Ram >> ssd. Not very surprising. Where does the cpu - ram link fit in, though?

In particular, what can a single core of a given processor sustain? I'm particularly interested in how many cores one needs to saturate dual channel or quad channel systems. I think there will be a cpu-memory bandwidth figure somewhere, anyone know where? And how would such a figure relate to a single threaded program?

The answer would presumably depend on cpu and memory frequencies. That's fine, I'm quite happy running the numbers if someone can point me at the baseline case.

Thank you :)

edit: Wiki tells me 25.6 gbyte/s for a QPI link, which I think is what connects a modern Intel cpu to the memory. That can presumably be matched by dual channel ram, leaving me confused by the quad channel 2011 system.

edit2: So, QPI doesn't go between processor and ram. From what little I've been able to find online (wrong keywords?), the limiting factor is the ram. The processor is quite capable of reading/writing to the ram faster than the ram itself can handle. I'm not certain of this, but it appears to be the case. Further, the consensus on stackoverflow appears to be that a single core can saturate the ram bandwidth.

This suggests the conclusion that six cores on 2011 is a mistake if the problem fits in memory but not in cache. I'm still having trouble verifying this, in particular there's the interesting idea that a quad core and a dual core Haswell may perform equivalently.
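One rough way to check the single-core claim on your own machine is to time a large copy from one thread and compare the result with the theoretical figure for the installed ram. The sketch below uses only the Python standard library. It assumes CPython, where slice assignment between two bytearrays of equal size should be a bulk memory copy, and it assumes the 64 MiB buffers are comfortably larger than the cpu cache. Treat the result as a ballpark lower bound, not a proper benchmark.

```python
import time


def copy_bandwidth_mib_per_s(size_mib=64, repeats=5):
    """Best-of-N estimate of single-threaded copy bandwidth in MiB/s."""
    n = size_mib * 1024 * 1024
    src = bytearray(n)
    dst = bytearray(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # assumption: bulk copy of n bytes in CPython
        best = min(best, time.perf_counter() - start)
    # Each copy reads n bytes and writes n bytes, so count the traffic twice.
    return 2 * size_mib / best


if __name__ == "__main__":
    print(f"~{copy_bandwidth_mib_per_s():.0f} MiB/s from a single thread")
```

If one thread already gets close to the dual channel figure, extra cores will not help a purely bandwidth-bound job; if it falls well short, they might.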
 
Good research. Not something I know much about.

This suggests the conclusion that six cores on 2011 is a mistake if the problem fits in memory but not in cache. I'm still having trouble verifying this, in particular there's the interesting idea that a quad core and a dual core Haswell may perform equivalently.

You've forgotten that all these rates are maxima, which you can reach only using benchmarking software designed to access the memory at the highest rate possible and do little/no calculations otherwise.

In real problems, the memory can happily be accessed and only a small chunk copied to the CPU cache, where many clock cycles are spent performing operations before the data is copied back. Perhaps this delay while the CPU performs computation means that the actual rate of read/write to the memory that each core needs is sufficiently low that 4 or 6 (or possibly far more) cores can happily share the same memory bus.
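That argument can be put into numbers with a back-of-envelope model: if each core only needs a certain number of bytes from ram per operation, the number of cores the memory can feed is the peak bandwidth divided by the per-core demand. The sketch below is only that division. The workload figures (bytes per operation, operations per second) are hypothetical placeholders, and the model ignores latency, caches and prefetching.

```python
def cores_to_saturate(peak_bandwidth_mb_s, bytes_per_op, ops_per_s_per_core):
    """Cores needed before their combined demand reaches the peak bandwidth."""
    demand_mb_s = bytes_per_op * ops_per_s_per_core / 1e6  # per-core demand in MB/s
    return peak_bandwidth_mb_s / demand_mb_s


DUAL_CHANNEL_DDR3_1600 = 25600  # MB/s, theoretical peak

# Hypothetical streaming job: 8 bytes from ram for every operation.
print(cores_to_saturate(DUAL_CHANNEL_DDR3_1600, 8, 1e9))

# Hypothetical compute-heavy job: mostly works out of cache, 0.5 bytes per operation.
print(cores_to_saturate(DUAL_CHANNEL_DDR3_1600, 0.5, 1e9))
```

With these made-up inputs the streaming job runs out of bandwidth at about 3 cores, while the compute-heavy one could keep around 50 cores fed, which is the difference between a synthetic benchmark and a typical real problem.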

You may still be right though that synthetic bandwidth benchmarks will reach a limit soon. Could this be why DDR4 is on the way?
 