AMD vs Intel Single threading?

MartinPrince · 12 Dec 2019 at 17:14

humbug said:
The 9700K? What screenshot? The only one you posted was for the 3900X.

This one here.
https://forums.overclockers.co.uk/posts/33216659/

humbug · 12 Dec 2019 at 17:22

CAT-THE-FIFTH said:
@humbug the increase in performance of your over-4.0GHz Ryzen 5 3600 over my 3.6GHz Ryzen 5 2600 is not as big as I expected - I am at 27 seconds and you are at 23~24 seconds, so it's almost like the clockspeed difference is accounting for the performance jump in your case.

Edit!!

I will badger my mate with an overclocked Ryzen 7 2700 to see if he can run the test too, although his RAM is only running at 3000MHz IIRC.

You're right, it is... 23 vs 27 is a fraction under 18% faster. To get from 3.65GHz to 4.075GHz you need just shy of 12% higher clocks, so that leaves just 6%.
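To make that arithmetic easy to re-run with other results, here is a minimal Python sketch. The function name is made up for illustration, and it divides the two ratios instead of subtracting the percentages, which is a choice made here and not something from the posts above. The inputs are the whole-second timings and approximate clocks quoted in this thread, so treat the output as rough.

```python
def clock_adjusted_gain(slow_s, fast_s, slow_ghz, fast_ghz):
    """Split a benchmark speedup into a clock part and a leftover part.

    Hypothetical helper for illustration; assumes time scales inversely
    with clock speed, which is only an approximation.
    """
    speedup = slow_s / fast_s          # e.g. 27s vs 23s
    clock_ratio = fast_ghz / slow_ghz  # e.g. 4.075GHz vs 3.65GHz
    residual = speedup / clock_ratio   # what the clocks do not explain
    return speedup, clock_ratio, residual


speedup, clock_ratio, residual = clock_adjusted_gain(27, 23, 3.65, 4.075)
print(f"faster overall:     {speedup - 1:.1%}")
print(f"higher clocks:      {clock_ratio - 1:.1%}")
print(f"left after clocks:  {residual - 1:.1%}")
```

Dividing the ratios gives a leftover of roughly 5%, a touch under the 6% you get from simply subtracting 12 from 18.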

CAT-THE-FIFTH · 12 Dec 2019 at 17:32

humbug said:
You're right, it is... 23 vs 27 is a fraction under 18% faster. To get from 3.65GHz to 4.075GHz you need just shy of 12% higher clocks, so that leaves just 6%.

It's closer to 3.6GHz with SMT on, so something seems a bit off IMHO - it seems to be using SMT fine on my Ryzen 5 2600, and like I said there is a slight performance boost from keeping it enabled.

Edit!!

Humbug, try running the software with SMT switched off and see what results you get.

humbug · 12 Dec 2019 at 17:56

CAT-THE-FIFTH said:
It's closer to 3.6GHz with SMT on, so something seems a bit off IMHO - it seems to be using SMT fine on my Ryzen 5 2600, and like I said there is a slight performance boost from keeping it enabled.

Edit!!

Humbug, try running the software with SMT switched off and see what results you get.

ok

humbug · 12 Dec 2019 at 18:06

24 Seconds ¯\_(ツ)_/¯

sandys · 12 Dec 2019 at 19:12

MartinPrince said:
Thanks for doing that!

If anybody else wants to then this is how to apply the preset:

I do love to run a benchmark, only 22s on my gen 1 Ryzen though

finally justification for TR3

MartinPrince · 12 Dec 2019 at 19:13

CAT-THE-FIFTH said:
How are you getting another 6% with a 100MHz clockspeed increase? Something is not right there!

But this really puts things in perspective for me - I have an SFF PC, so it appears at stock any of these Intel CPUs won't actually be really quicker even if you are applying filters.

Actually it should be more like 10.5% to 15.7%, not 12% - 18%, so this would make it a ~5% improvement. At 5.3GHz it does it most times in 16secs, but because the in-built timer doesn't give decimals this could be 16.3secs rounded down, whereas at 5.2GHz it could be 16.8secs rounded up to 17secs. So then there's even less % improvement between the two.
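The rounding point can be put into numbers with a short Python sketch. It assumes the timer rounds to the nearest whole second, which is a guess (it might truncate instead), and the helper name is made up for illustration.

```python
def speedup_bounds(slow_shown, fast_shown, half_width=0.5):
    """Range of true speedups consistent with two whole-second readings.

    Assumes (not confirmed for this timer) that a displayed value t
    means the true time was somewhere in [t - 0.5, t + 0.5).
    """
    slow_lo, slow_hi = slow_shown - half_width, slow_shown + half_width
    fast_lo, fast_hi = fast_shown - half_width, fast_shown + half_width
    # Smallest gap: slow run at its low end, fast run at its high end.
    # Largest gap: the other way round.
    return slow_lo / fast_hi, slow_hi / fast_lo


nominal = 17 / 16            # displayed 17s at 5.2GHz vs 16s at 5.3GHz
low, high = speedup_bounds(17, 16)
clock_gain = 5.3 / 5.2       # what the clock bump alone would predict

print(f"displayed gain: {nominal - 1:.1%}")
print(f"possible range: {low - 1:.1%} to {high - 1:.1%}")
print(f"clock increase: {clock_gain - 1:.1%}")
```

Under that assumption a displayed 17s vs 16s is consistent with anything from no gain at all up to roughly 13%, so the ~2% you would expect from 100MHz fits comfortably inside the noise.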

MartinPrince · 12 Dec 2019 at 19:40

Rroff said:
Got any decent benchmarks for that? I've never found proper testing online. (You can find results for individual CPUs but not good comparisons side by side in equal circumstances over a broad range of application).

I've not tested it in a while, but when I did I found that AMD's SMT typically had a bigger penalty for enabling it, which makes it look like the gains are bigger when they really aren't. AMD's implementation tended to do better in synthetic tests and situations that were highly multi-threaded (probably due to the way they utilise the integer units) but worse in general applications and situations with more mixed workload demands - overall results were largely about the same.

It's just my own narrow anecdotal experience, but this was done with a real-time, very high quality video capture. When I had HT enabled on a Xeon system I would get dropped frames. Once I disabled HT, zero dropped frames.

With the Ryzen 3900X, even leaving SMT enabled, I did not get any dropped frames, though interestingly when it was combined with slower untuned RAM that is when I got dropped frames.

Rroff · 12 Dec 2019 at 19:48

MartinPrince said:
It's just my own narrow anecdotal experience, but this was done with a real-time, very high quality video capture. When I had HT enabled on a Xeon system I would get dropped frames. Once I disabled HT, zero dropped frames.

With the Ryzen 3900X, even leaving SMT enabled, I did not get any dropped frames, though interestingly when it was combined with slower untuned RAM that is when I got dropped frames.

Sounds like the software just doesn't like HT - back in the day with the Pentium 4 HT era a small number of games would stutter with HT enabled. Dunno what OS that was on but you might find it related to core parking as well - on some of my systems on Windows 7 I have to do the core parking tweak on CPUs with HT or it stutters in Battlefield games.

Deleted User 456458 · 12 Dec 2019 at 19:49

nm figured it out. first run 9900k 5.2ghz. 15 seconds:

Deleted User 456458 · 12 Dec 2019 at 19:54

Is that good?

CAT-THE-FIFTH · 12 Dec 2019 at 19:58

@humbug I made a few identical copies of the file and did a quick 5 image batch conversion using the same presets.

It takes 120 seconds for 5 images, which is 24 seconds per image. I tried it with 8 images and it took around 184 seconds, which is 23 seconds per image.

There is far better thread utilisation happening too.

My mate decided to run the same conversion on his system, which is an overclocked Ryzen 7 2700 running at 4GHz with 3400MHz DDR4 and NVMe SSDs.

Their single image time is 25 seconds. With 5 images it took 91 seconds, around 18.2 seconds per image. They then tried 8 images, which took 132 seconds, or 16.5 seconds per image, and with 64 images it came to 16.2 seconds per image.

With DxO it's better to batch a few images together, as it processes images much more efficiently that way. You can see that it actually processes two images at any one time.
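For anyone who wants to compare their own batch runs the same way, here is a rough Python sketch of the per-image sums. The single-image times (27s and 25s) and batch totals are the ones reported in this thread; the dictionary layout and function name are just for illustration. The 64 image run is left out because only its per-image figure was given, not the total.

```python
# Total export time in seconds, keyed by number of images in the batch.
runs = {
    "Ryzen 5 2600": {1: 27, 5: 120, 8: 184},
    "Ryzen 7 2700 @ 4GHz": {1: 25, 5: 91, 8: 132},
}


def per_image(total_s, n_images):
    """Average seconds per image for a batch."""
    return total_s / n_images


for cpu, totals in runs.items():
    single = totals[1]
    for n, total in sorted(totals.items()):
        each = per_image(total, n)
        # Throughput relative to exporting one image on its own.
        print(f"{cpu}: {n} image(s) -> {each:.1f}s each, "
              f"{single / each:.2f}x single-image throughput")
```

On these numbers the 8 core 2700 gains far more from batching than the 6 core 2600 does.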

MartinPrince said:
Actually it should be more like 10.5% to 15.7%, not 12% - 18%, so this would make it a ~5% improvement. At 5.3GHz it does it most times in 16secs, but because the in-built timer doesn't give decimals this could be 16.3secs rounded down, whereas at 5.2GHz it could be 16.8secs rounded up to 17secs. So then there's even less % improvement between the two.

Thanks for clarifying it!

TNA · 12 Dec 2019 at 20:15

Wow, this thread is on fire.

All I know is my CPU is better than Panos’s for anything that needs 6 cores or less (most things then)

MartinPrince · 12 Dec 2019 at 20:39

CAT-THE-FIFTH said:
@humbug I made a few identical copies of the file and did a quick 5 image batch conversion using the same presets.

It takes 120 seconds for 5 images, which is 24 seconds per image. I tried it with 8 images and it took around 184 seconds, which is 23 seconds per image.

There is far better thread utilisation happening too.

My mate decided to run the same conversion on his system, which is an overclocked Ryzen 7 2700 running at 4GHz with 3400MHz DDR4 and NVMe SSDs.

Their single image time is 25 seconds. With 5 images it took 91 seconds, around 18.2 seconds per image. They then tried 8 images, which took 132 seconds, or 16.5 seconds per image, and with 64 images it came to 16.2 seconds per image.

With DxO it's better to batch a few images together, as it processes images much more efficiently that way. You can see that it actually processes two images at any one time.

Thanks for clarifying it!

Yes, as you are discovering, you can also do batch exports, which will fully load up all cores. There's a setting in preferences for Maximum Number of Simultaneously Processed Images. The default recommended amount is 2 but it can go as high as 8.

If you have a lot of images that all have the same adjustments then doing it this way will max out all cores. In this scenario multicore machines would come into their own. Historically though you'd find the odd one would fail with an error, and I don't know if they've fixed that.

This way doesn't really work for me though, because I generally make slight adjustments to each photo and so have to do them one at a time.

MartinPrince · 12 Dec 2019 at 20:48

Rroff said:
Sounds like the software just doesn't like HT - back in the day with the Pentium 4 HT era a small number of games would stutter with HT enabled. Dunno what OS that was on but you might find it related to core parking as well - on some of my systems on Windows 7 I have to do the core parking tweak on CPUs with HT or it stutters in Battlefield games.

You could well be correct, though I find it happens more on occasions where you can't guarantee that all cores will be maxed out all the time, and the scheduler is then 'choosing' which cores to utilise.

Robert896r1 said:
nm figured it out. first run 9900k 5.2ghz. 15 seconds:

I was going to send you a message asking you to run the test so I'm glad you did it. Yours is the fastest thus far I believe.

Deleted User 456458 · 12 Dec 2019 at 21:04

MartinPrince said:
I was going to send you a message asking you to run the test so I'm glad you did it. Yours is the fastest thus far I believe.

I did a few 53/49x runs and it's still 15 seconds.

This seems to have a mem bottleneck after a certain point.

humbug · 12 Dec 2019 at 21:11

CAT-THE-FIFTH said:
@humbug I made a few identical copies of the file and did a quick 5 image batch conversion using the same presets.

It takes 120 seconds for 5 images, which is 24 seconds per image. I tried it with 8 images and it took around 184 seconds, which is 23 seconds per image.

There is far better thread utilisation happening too.

My mate decided to run the same conversion on his system, which is an overclocked Ryzen 7 2700 running at 4GHz with 3400MHz DDR4 and NVMe SSDs.

Their single image time is 25 seconds. With 5 images it took 91 seconds, around 18.2 seconds per image. They then tried 8 images, which took 132 seconds, or 16.5 seconds per image, and with 64 images it came to 16.2 seconds per image.

With DxO it's better to batch a few images together, as it processes images much more efficiently that way. You can see that it actually processes two images at any one time.

Thanks for clarifying it!

Yeah, makes sense. Exporting more than one image at a time should use more of the CPU's resources, so CPUs with more of them will come into their own.

I'll stick with good old Photoshop CS6

which is also compatible with all the plugins and bridges I need for 3D texturing stuff.....

sandys · 12 Dec 2019 at 22:16

That software is definitely not a fan of SMT

Switching my machine to 8 cores with SMT disabled got it down to 19s. It must be around 19.9 as it wavers between 19 and 20; with SMT on it was 21-22s. Not bad for the pensioned-off runt of the Ryzen Threadripper family

Tried overclocking to 4.2GHz, which kept me in the 19s, but I could not get into the 18s. I have a next gen TR sat next to me though, ready to drop in. I wonder what the increase in cache, cores or clocks will net me?

CAT-THE-FIFTH · 13 Dec 2019 at 01:30

MartinPrince said:
Yes, as you are discovering, you can also do batch exports, which will fully load up all cores. There's a setting in preferences for Maximum Number of Simultaneously Processed Images. The default recommended amount is 2 but it can go as high as 8.

If you have a lot of images that all have the same adjustments then doing it this way will max out all cores. In this scenario multicore machines would come into their own. Historically though you'd find the odd one would fail with an error, and I don't know if they've fixed that.

This way doesn't really work for me though, because I generally make slight adjustments to each photo and so have to do them one at a time.

You can batch with individual adjustments per picture - that is what I have done when processing pictures I took for a friend.

MartinPrince · 13 Dec 2019 at 11:13

CAT-THE-FIFTH said:
You can batch with individual adjustments per picture - that is what I have done when processing pictures I took for a friend.

Actually yes you're right. I remember doing that on my Xeon 24 core system, though to get the cores to load up I had to select at least 6 simultaneously. The problem was that quite a few would error and then it became a chore to go and find which ones and then redo them, which in turn would offset the export into Lightroom which just meant more work for me in the end.

I'll try the batch export again in v3, which has recently come out, to see if it's any better, as then the 3900X will come into its own and that gives me another option.