
Intel Core Ultra 9 285k 'Arrow Lake' Discussion/News ("15th gen") on LGA-1851

While not the best release, I do like that Intel dropped the power draw. It's a much-needed step in the right direction, but they still need to do much better overall with their power usage.

Going to be interesting when we get AMD Halo early next year, and I believe Intel is working on a similar type of platform/chipset.
 
This is really my point: HT uses zero watts of its own, it uses the otherwise unused power budget of the main cores, but it is shown to have a very positive effect on gaming, 6c vs 6c/12t.
Hyper-Threading can slow down your threads as well, as execution ports in the core are contended between the two threads. It's heavily dependent on what type of work the threads are doing. If both threads are making heavy use of floating-point operations, then the floating-point ports in the core are shared between the two threads, causing them to be a bit slower than if they ran on separate cores.

Hyper-Threading aims more for total thread throughput, rather than peak performance per thread.

For example, say you have an 8-core CPU.
Without HT, you have 8 threads running at full speed, at say 100% performance.

With HT, you can have 16 threads running all at once, but they will all be running at somewhere between 60 and 90% performance.

More work is done, at the expense of threads running slower than if they had the whole core to themselves.
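A minimal sketch of how you could see this trade-off on your own machine (my own illustration, not a measured result): run the same floating-point-heavy kernel once with one thread per physical core and once with one per logical core, and compare total work per second. The "logical / 2" physical-core count assumes 2-way SMT and no E-cores, so adjust it for your CPU.

// build with something like: g++ -O2 -pthread
#include <chrono>
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

// FP-heavy busy loop; returns a value so the work isn't optimised away.
static double fpKernel(std::size_t iters)
{
    double acc = 1.0;
    for (std::size_t i = 0; i < iters; ++i)
        acc = std::sqrt(acc + 1.000001);
    return acc;
}

// Run the kernel on 'threads' threads and return total iterations per second.
static double runWithThreads(unsigned threads, std::size_t itersPerThread)
{
    std::vector<std::thread> pool;
    std::vector<double> sink(threads);
    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&, t] { sink[t] = fpKernel(itersPerThread); });
    for (auto& th : pool)
        th.join();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return double(threads) * double(itersPerThread) / elapsed.count();
}

int main()
{
    const std::size_t iters = 50'000'000;
    unsigned logical  = std::thread::hardware_concurrency();
    unsigned physical = logical / 2;   // assumption: 2-way SMT, no E-cores
    std::printf("%u threads: %.0f iters/s\n", physical, runWithThreads(physical, iters));
    std::printf("%u threads: %.0f iters/s\n", logical,  runWithThreads(logical, iters));
}

If per-thread performance really lands in that 60-90% range, the all-logical-cores run should report more total iterations per second, with each individual thread just progressing more slowly.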
 
HT's primary purpose was to accelerate thread switching. From what I remember, they basically duplicated some CPU registers and other state so the next thread could be loaded without stopping the currently working thread; then, once the next one is loaded, execution is switched over without the setup overhead, and the process repeats. This helps keep the cores working instead of waiting for the next thread to load.
 
Finding it amusing the lengths some are going to in trying to find the "missing performance"... even when you manage to improve the averages, the baseline performance still, relatively, sucks. I doubt it is fixable with software, microcode or BIOS updates either.

HT never seemed all that efficient to me, particularly on older CPUs: significantly increased power usage and temps for a relatively small performance improvement. It was more of a thing when core counts were more limited.

The impression I got from Intel is that HT wasn't worth the die space on Arrow Lake.

Personally I've found some decent gains from HT; sure, not everything gains as much, and it does slightly penalise single-thread performance.
 
The early i7s did a lot better over time than the i5s that did not have HT. I think all the security issues played a big part in HT being dropped. At one point, Intel was recommending HT be disabled due to security problems.
 
HT's primary purpose was to accelerate thread switching. From what I remember, they basically duplicated some CPU registers and other things so the next thread could be loaded without stopping the currently working thread, then when the next one is loaded execution is switched over without the setup overhead and the process repeats. This helps keep the cores working instead of waiting for the next thread to load.
No, that's not entirely true.
Yes, there are duplicated registers and a whole duplicate thread context, but both threads execute at the same time. The instruction fetcher alternates between the two threads.
The CPU core has so many resources that it's impossible to fully utilise all of them with a single thread, so Hyper-Threading uses those resources to get two threads progressing at the same time.
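If you want to see that resource sharing directly, a rough Linux-only sketch (my own; the sibling CPU numbering is an assumption, so check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on your machine) is to pin two sqrt-heavy threads either to two separate physical cores or to the two SMT siblings of one core and compare run times:

// build with something like: g++ -O2 -pthread
#include <chrono>
#include <cmath>
#include <cstdio>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Pin the calling thread to one logical CPU (Linux-specific).
static void pinTo(unsigned cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Keep the FP/sqrt hardware busy on a given logical CPU.
static void fpWork(unsigned cpu, double* out)
{
    pinTo(cpu);
    double acc = 1.0;
    for (long i = 0; i < 200'000'000; ++i)
        acc = std::sqrt(acc + 1.000001);
    *out = acc;
}

// Time two such threads running together on the given pair of logical CPUs.
static double timedPair(unsigned cpuA, unsigned cpuB)
{
    double a = 0.0, b = 0.0;
    auto start = std::chrono::steady_clock::now();
    std::thread t1(fpWork, cpuA, &a), t2(fpWork, cpuB, &b);
    t1.join();
    t2.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

int main()
{
    // Assumed topology: CPUs 0 and 2 sit on different physical cores,
    // while CPUs 0 and 1 are SMT siblings of the same core. Adjust to your layout.
    std::printf("separate cores:           %.2f s\n", timedPair(0, 2));
    std::printf("same core (SMT siblings): %.2f s\n", timedPair(0, 1));
}

The shared-core case should take noticeably longer, since both threads are fighting over the same floating-point ports, which is exactly the contention described above.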
 
That explains the performance hit for games then, as most of the load will be tightly packed floating point.
 

Games use a massive amount of square root calculations, but in most games most of that is on 1-2 main threads, aside from well-threaded physics, with a lot of the extra threads handling other stuff.

Even in this day and age, along with some cache changes, a hybrid architecture with 2-4 purely single-thread-focussed cores alongside a bunch of general-purpose cores/threads would improve gaming performance immensely.
 
Games use a massive amount of square root calculations.
The dreaded v = Normalize(v) and l = Length(v).
I am currently using:

#include <smmintrin.h>   // SSE4.1 (_mm_dp_ps)
#include <cmath>

static inline __m128 normalizeSSE4(const __m128& v)
{
    // dot(v, v) over x/y/z (mask 0x7f), splat the scalar, then multiply v by its rsqrt.
    __m128 tmp = _mm_set_ps1(_mm_dp_ps(v, v, 0x7f).m128_f32[0]);   // .m128_f32 is MSVC-specific
    return _mm_mul_ps(_mm_rsqrt_ps(tmp), v);
}

static inline float lengthSSE4(const __m128& a)
{
    return std::sqrtf(_mm_dp_ps(a, a, 0x7f).m128_f32[0]);
}

seems to work ok.
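If anyone wants to drop those two helpers into their own code, a quick sanity check against a scalar reference (my own test scaffold, assuming the two functions above are pasted in) looks something like:

#include <cmath>
#include <cstdio>
#include <smmintrin.h>

int main()
{
    // Hypothetical test vector; the w lane is ignored by the 0x7f dot-product mask.
    __m128 v = _mm_set_ps(0.0f, 3.0f, 2.0f, 1.0f);   // (x, y, z, w) = (1, 2, 3, 0)

    __m128 n   = normalizeSSE4(v);
    float  len = lengthSSE4(v);

    // Scalar reference.
    float refLen = std::sqrt(1.0f * 1.0f + 2.0f * 2.0f + 3.0f * 3.0f);

    std::printf("length:       sse=%f  scalar=%f\n", len, refLen);
    std::printf("normalized x: sse=%f  scalar=%f\n", _mm_cvtss_f32(n), 1.0f / refLen);
}

Worth remembering that _mm_rsqrt_ps is only accurate to about 12 bits (relative error on the order of 1e-4), so the normalised result won't match the scalar path exactly; a single Newton-Raphson refinement step tightens that up if it ever matters.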
 

I've found these days modern compilers will often produce the best, or just as fast, a result if presented with old-school/long-winded approaches to stuff like this :s and sometimes using clever tricks will actually make it harder for the compiler to understand what you are doing and produce potentially worse results, unless you also optimise for the compiler LOL.

I might have to have a play later and see what difference _mm256_dp_ps makes for calculating the dot product compared to doing it old school.
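One detail worth knowing before comparing: _mm256_dp_ps works per 128-bit lane, so a single call gives two independent 4-wide dot products rather than one 8-wide dot product. A rough sketch (my own example, made-up data) of it next to an old-school scalar loop the compiler is free to auto-vectorise:

// build with something like: g++ -O2 -mavx   (or /arch:AVX on MSVC)
#include <cstdio>
#include <immintrin.h>   // AVX

// "Old school": plain loop the compiler can auto-vectorise.
static float dotScalar(const float* a, const float* b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

int main()
{
    alignas(32) float a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    alignas(32) float b[8] = { 8, 7, 6, 5, 4, 3, 2, 1 };

    __m256 va = _mm256_load_ps(a);
    __m256 vb = _mm256_load_ps(b);

    // 0xF1: multiply all four elements of each 128-bit lane, store the result
    // in element 0 of that lane. One dot product per lane.
    __m256 dp = _mm256_dp_ps(va, vb, 0xF1);
    float lowDot  = _mm256_cvtss_f32(dp);                         // dot(a[0..3], b[0..3])
    float highDot = _mm_cvtss_f32(_mm256_extractf128_ps(dp, 1));  // dot(a[4..7], b[4..7])

    std::printf("dp_ps lanes: %f + %f = %f\n", lowDot, highDot, lowDot + highDot);
    std::printf("scalar:      %f\n", dotScalar(a, b, 8));
}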
 
It's for my path tracer; I tried letting the compiler do it and it's slow (MS VS). Using _mm256_dp_ps/_mm512_dp_ps requires batching and I would need to change a lot, and I just cannot be bothered. I did the path tracer for fun but decided to add SSE4 just to test performance, and got a >50% improvement, mostly from the box-ray and ray-triangle intersection functions. SSE4 was easy to add: I had to change some structs to use alignment and add dummy fields to vec3, and that was it.
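For anyone curious what that batching would actually look like, the usual approach is a struct-of-arrays layout so one AVX op handles eight vectors at once; a purely illustrative sketch (names and layout are made up, not from the path tracer in question):

// build with something like: g++ -O2 -mavx
#include <cstdio>
#include <immintrin.h>

// Eight 3D vectors, one component per register (struct-of-arrays).
struct Vec3x8
{
    __m256 x, y, z;
};

// Eight dot products at once, no horizontal operations needed.
static inline __m256 dot8(const Vec3x8& a, const Vec3x8& b)
{
    return _mm256_add_ps(_mm256_mul_ps(a.x, b.x),
           _mm256_add_ps(_mm256_mul_ps(a.y, b.y),
                         _mm256_mul_ps(a.z, b.z)));
}

int main()
{
    // Eight identical vectors just to show the shape of the data.
    Vec3x8 a { _mm256_set1_ps(1.0f), _mm256_set1_ps(2.0f), _mm256_set1_ps(3.0f) };
    Vec3x8 b { _mm256_set1_ps(4.0f), _mm256_set1_ps(5.0f), _mm256_set1_ps(6.0f) };

    alignas(32) float out[8];
    _mm256_store_ps(out, dot8(a, b));   // every lane = 1*4 + 2*5 + 3*6 = 32
    std::printf("%f\n", out[0]);
}

The catch, as said above, is that the ray/hit data has to be laid out eight-wide everywhere for this to pay off, which is exactly the restructuring that isn't worth the bother for a for-fun project.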
 

lol rekt

Bit misleading: they have actually sold single-digit numbers of each Arrow Lake SKU, and as above, Mindfactory don't seem to have a customer base that buys Intel as much as average and also seem to lean towards AMD with their offerings (probably due to the market). It also looks like they priced the Arrow Lake chips high and most of the time didn't have stock, so it's no surprise they aren't selling.
 
The important stat is how that compares to other Intel releases/normal sales.
But we don't have that info.
 

Bit tricky to compare currently with the stock situation, etc., but it looks poor compared to previous launches, especially in terms of pre-orders, etc.
 