
Intel Core Ultra 9 285k 'Arrow Lake' Discussion/News ("15th gen") on LGA-1851

While not the best release, I do like that Intel dropped the power draw. It's a much-needed step in the right direction, but they still need to do much better overall with their power usage.

Going to be interesting when we get AMD Halo early next year, and I believe Intel is working on a similar type of platform/chipset.
 
This is really my point: HT uses zero watts of its own, it uses the otherwise unused power budget of the main cores, but it is shown to have a very positive effect on gaming, 6c vs 6c/12t.
Hyper-Threading can slow down your threads as well, as execution ports in the core are contended between the two threads. It's heavily dependent on what type of work the threads are doing. If both threads are making heavy use of floating-point operations, then the floating-point ports in the core are shared between the two threads, causing them to be a bit slower than if they ran on separate cores.

Hyper-Threading aims more for total thread throughput, rather than peak performance per thread.

For example, say you have an 8-core CPU.
Without HT, you have 8 threads running at full speed, at say 100% performance.

With HT, you can have 16 threads running all at once, but they will all be running at somewhere between 60 and 90% performance.

More work is done, at the expense of threads running slower than if they had the whole core to themselves.
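A minimal sketch of how you could see this trade-off on your own machine (my own illustration, not a measured result): run the same floating-point-heavy kernel once with one thread per physical core and once with one per logical core, and compare total work per second. The "logical / 2" physical-core count assumes 2-way SMT and no E-cores, so adjust it for your CPU.

// build with something like: g++ -O2 -pthread
#include <chrono>
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

// FP-heavy busy loop; returns a value so the work isn't optimised away.
static double fpKernel(std::size_t iters)
{
    double acc = 1.0;
    for (std::size_t i = 0; i < iters; ++i)
        acc = std::sqrt(acc + 1.000001);
    return acc;
}

// Run the kernel on 'threads' threads and return total iterations per second.
static double runWithThreads(unsigned threads, std::size_t itersPerThread)
{
    std::vector<std::thread> pool;
    std::vector<double> sink(threads);
    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&, t] { sink[t] = fpKernel(itersPerThread); });
    for (auto& th : pool)
        th.join();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return double(threads) * double(itersPerThread) / elapsed.count();
}

int main()
{
    const std::size_t iters = 50'000'000;
    unsigned logical  = std::thread::hardware_concurrency();
    unsigned physical = logical / 2;   // assumption: 2-way SMT, no E-cores
    std::printf("%u threads: %.0f iters/s\n", physical, runWithThreads(physical, iters));
    std::printf("%u threads: %.0f iters/s\n", logical,  runWithThreads(logical, iters));
}

If per-thread performance really lands in that 60-90% range, the all-logical-cores run should report more total iterations per second, with each individual thread just progressing more slowly.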
 
HT's primary purpose was to accelerate thread switching. From what I remember, they basically duplicated some CPU registers and other state so the next thread could be loaded without stopping the currently working thread; then, once the next one is loaded, execution is switched over without the setup overhead, and the process repeats. This helps keep the cores working instead of waiting for the next thread to load.
 
Finding it amusing the lengths some are going to in trying to find the "missing performance"... even when you manage to improve the averages, the baseline performance still, relatively, sucks. I doubt it is fixable with software, microcode or BIOS updates either.

HT never seemed all that efficient to me, particularly on older CPUs: significantly increased power usage and temps for a relatively small performance improvement. It was more of a thing when core counts were more limited.

The impression I got from Intel is that HT wasn't worth the die space on Arrow Lake.

Personally I've found some decent gains from HT; sure, not everything gains as much, and it does slightly penalise single-thread performance.
 
The early i7s did a lot better over time than the i5s that did not have HT. I think all the security issues played a big part in HT being dropped. At one point, Intel was recommending HT be disabled due to security problems.
 
HT's primary purpose was to accelerate thread switching. From what I remember, they basically duplicated some CPU registers and other things so the next thread could be loaded without stopping the currently working thread, then when the next one is loaded execution is switched over without the setup overhead and the process repeats. This helps keep the cores working instead of waiting for the next thread to load.
No, that's not entirely true.
Yes, there are duplicated registers and a whole duplicate thread context, but both threads execute at the same time. The instruction fetcher alternates between the two threads.
The CPU core has so many resources that it's impossible to fully utilise all of them with a single thread, so Hyper-Threading uses those resources to get two threads progressing at the same time.
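If you want to see that resource sharing directly, a rough Linux-only sketch (my own; the sibling CPU numbering is an assumption, so check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list on your machine) is to pin two sqrt-heavy threads either to two separate physical cores or to the two SMT siblings of one core and compare run times:

// build with something like: g++ -O2 -pthread
#include <chrono>
#include <cmath>
#include <cstdio>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Pin the calling thread to one logical CPU (Linux-specific).
static void pinTo(unsigned cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Keep the FP/sqrt hardware busy on a given logical CPU.
static void fpWork(unsigned cpu, double* out)
{
    pinTo(cpu);
    double acc = 1.0;
    for (long i = 0; i < 200'000'000; ++i)
        acc = std::sqrt(acc + 1.000001);
    *out = acc;
}

// Time two such threads running together on the given pair of logical CPUs.
static double timedPair(unsigned cpuA, unsigned cpuB)
{
    double a = 0.0, b = 0.0;
    auto start = std::chrono::steady_clock::now();
    std::thread t1(fpWork, cpuA, &a), t2(fpWork, cpuB, &b);
    t1.join();
    t2.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

int main()
{
    // Assumed topology: CPUs 0 and 2 sit on different physical cores,
    // while CPUs 0 and 1 are SMT siblings of the same core. Adjust to your layout.
    std::printf("separate cores:           %.2f s\n", timedPair(0, 2));
    std::printf("same core (SMT siblings): %.2f s\n", timedPair(0, 1));
}

The shared-core case should take noticeably longer, since both threads are fighting over the same floating-point ports, which is exactly the contention described above.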
 
That explains the performance hit for games then, as most of the load will be tightly packed floating point.
 

Games use a massive amount of square root calculations, but in most games most of that is on 1-2 main threads, aside from well-threaded physics, with a lot of the extra threads handling other stuff.

Even in this day and age, along with some cache changes, a hybrid architecture with 2-4 purely single-thread-focussed cores alongside a bunch of general-purpose cores/threads would improve gaming performance immensely.
 
Games use a massive amount of square root calculations.
The dreaded v = Normalize(v) and l = Length(v).
I am currently using:

#include <smmintrin.h>   // SSE4.1 (_mm_dp_ps)
#include <cmath>

static inline __m128 normalizeSSE4(const __m128& v)
{
    // dot(v, v) over x/y/z (mask 0x7f), splat the scalar, then multiply v by its rsqrt.
    __m128 tmp = _mm_set_ps1(_mm_dp_ps(v, v, 0x7f).m128_f32[0]);   // .m128_f32 is MSVC-specific
    return _mm_mul_ps(_mm_rsqrt_ps(tmp), v);
}

static inline float lengthSSE4(const __m128& a)
{
    return std::sqrtf(_mm_dp_ps(a, a, 0x7f).m128_f32[0]);
}

seems to work ok.
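If anyone wants to drop those two helpers into their own code, a quick sanity check against a scalar reference (my own test scaffold, assuming the two functions above are pasted in) looks something like:

#include <cmath>
#include <cstdio>
#include <smmintrin.h>

int main()
{
    // Hypothetical test vector; the w lane is ignored by the 0x7f dot-product mask.
    __m128 v = _mm_set_ps(0.0f, 3.0f, 2.0f, 1.0f);   // (x, y, z, w) = (1, 2, 3, 0)

    __m128 n   = normalizeSSE4(v);
    float  len = lengthSSE4(v);

    // Scalar reference.
    float refLen = std::sqrt(1.0f * 1.0f + 2.0f * 2.0f + 3.0f * 3.0f);

    std::printf("length:       sse=%f  scalar=%f\n", len, refLen);
    std::printf("normalized x: sse=%f  scalar=%f\n", _mm_cvtss_f32(n), 1.0f / refLen);
}

Worth remembering that _mm_rsqrt_ps is only accurate to about 12 bits (relative error on the order of 1e-4), so the normalised result won't match the scalar path exactly; a single Newton-Raphson refinement step tightens that up if it ever matters.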
 

I've found these days modern compilers will often produce the best, or just as fast, a result if presented with old-school/long-winded approaches to stuff like this :s and sometimes using clever tricks will actually make it harder for the compiler to understand what you are doing and produce potentially worse results, unless you also optimise for the compiler LOL.

I might have to have a play later and see what difference _mm256_dp_ps makes for calculating the dot product compared to doing it old school.
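One detail worth knowing before comparing: _mm256_dp_ps works per 128-bit lane, so a single call gives two independent 4-wide dot products rather than one 8-wide dot product. A rough sketch (my own example, made-up data) of it next to an old-school scalar loop the compiler is free to auto-vectorise:

// build with something like: g++ -O2 -mavx   (or /arch:AVX on MSVC)
#include <cstdio>
#include <immintrin.h>   // AVX

// "Old school": plain loop the compiler can auto-vectorise.
static float dotScalar(const float* a, const float* b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

int main()
{
    alignas(32) float a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    alignas(32) float b[8] = { 8, 7, 6, 5, 4, 3, 2, 1 };

    __m256 va = _mm256_load_ps(a);
    __m256 vb = _mm256_load_ps(b);

    // 0xF1: multiply all four elements of each 128-bit lane, store the result
    // in element 0 of that lane. One dot product per lane.
    __m256 dp = _mm256_dp_ps(va, vb, 0xF1);
    float lowDot  = _mm256_cvtss_f32(dp);                         // dot(a[0..3], b[0..3])
    float highDot = _mm_cvtss_f32(_mm256_extractf128_ps(dp, 1));  // dot(a[4..7], b[4..7])

    std::printf("dp_ps lanes: %f + %f = %f\n", lowDot, highDot, lowDot + highDot);
    std::printf("scalar:      %f\n", dotScalar(a, b, 8));
}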
 
It's for my path tracer; I tried letting the compiler do it and it's slow (MS VS). Using _mm256_dp_ps/_mm512_dp_ps requires batching and I would need to change a lot, and I just cannot be bothered. I did the path tracer for fun but decided to add SSE4 just to test performance, and got a >50% improvement, mostly from the box-ray and ray-triangle intersection functions. SSE4 was easy to add: I had to change some structs to use alignment and add dummy fields to vec3, and that was it.
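For anyone curious what that batching would actually look like, the usual approach is a struct-of-arrays layout so one AVX op handles eight vectors at once; a purely illustrative sketch (names and layout are made up, not from the path tracer in question):

// build with something like: g++ -O2 -mavx
#include <cstdio>
#include <immintrin.h>

// Eight 3D vectors, one component per register (struct-of-arrays).
struct Vec3x8
{
    __m256 x, y, z;
};

// Eight dot products at once, no horizontal operations needed.
static inline __m256 dot8(const Vec3x8& a, const Vec3x8& b)
{
    return _mm256_add_ps(_mm256_mul_ps(a.x, b.x),
           _mm256_add_ps(_mm256_mul_ps(a.y, b.y),
                         _mm256_mul_ps(a.z, b.z)));
}

int main()
{
    // Eight identical vectors just to show the shape of the data.
    Vec3x8 a { _mm256_set1_ps(1.0f), _mm256_set1_ps(2.0f), _mm256_set1_ps(3.0f) };
    Vec3x8 b { _mm256_set1_ps(4.0f), _mm256_set1_ps(5.0f), _mm256_set1_ps(6.0f) };

    alignas(32) float out[8];
    _mm256_store_ps(out, dot8(a, b));   // every lane = 1*4 + 2*5 + 3*6 = 32
    std::printf("%f\n", out[0]);
}

The catch, as said above, is that the ray/hit data has to be laid out eight-wide everywhere for this to pay off, which is exactly the restructuring that isn't worth the bother for a for-fun project.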
 

lol rekt

Bit misleading: they have actually sold single-digit numbers of each Arrow Lake SKU, and as above, Mindfactory don't seem to have a customer base that buys Intel as much as average and also seem to lean towards AMD with their offerings (probably due to the market). It also looks like they priced the Arrow Lake chips high and most of the time didn't have stock, so it's no surprise they aren't selling.
 
The important stat is how that compares to other Intel releases/normal sales.
But we don't have that info.
 

Bit tricky to compare currently with the stock situation, etc., but it looks poor compared to previous launches, especially in terms of pre-orders, etc.
 