Not sure what to make of that graph. I would be suspicious of any game where the i3 comfortably beats the FX in a "multi-threaded" title. Strange, and where are the i5s?
The Ashes of the Singularity developer has tried to explain the difference between AMD and Intel performance in the benchmark. I'll paste a detailed post from overclock.net below (credit to the poster "Mahigan" for this analysis of the developer's notes):
Source:
http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/850#post_24335709
---------------------------------------------------------------------------------------
Alright folks,
I have news from Tim Kipp at Oxide on the CPU optimizations in Ashes of the Singularity. Tim also went into detail about the memory bandwidth issues that can arise (which may explain the AMD CPU results in the benchmark), along with other details that help us understand what is happening behind the scenes to produce the results we are seeing.
Ashes Developer said:
Hi xxxxxxxxx,
Thanks for your interest in the Ashes of the Singularity benchmark.
To get an accurate picture of how well a given CPU will perform, it's important to look at the CPU frame rate with Infinite GPU both on and off (a check box exists on the benchmark settings panel). Note that while it is on, you may see some graphical corruption due to the use of async shaders; however, the results will be valid.
With Infinite GPU, you should see a 90%+ workload on your CPU. In this mode, we do not "wait" when the GPU is still busy. You should see excellent scaling between 4- and 16-thread machines. This can only be tracked on DX12.
Without Infinite GPU, the CPU will "wait" on a signal from the GPU that it is ready to process another frame. During this wait, the CPU tends to power down when there isn't any additional work to do, which effectively serializes a portion of the frame. This serialization is what causes the CPU frame rate discrepancy between Infinite GPU on and off.
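To make the on/off difference concrete, here is a toy model (my own illustration, not Oxide's numbers or code). It treats the CPU side of a frame with Amdahl's law and models the GPU wait as a fixed chunk of serialized time added to every frame. The parallel fraction `p`, the single-thread frame cost and `wait_ms` are all hypothetical inputs:

```c
#include <assert.h>

/* Amdahl's law: speedup on n threads when a fraction p of the per-frame
 * CPU work parallelizes perfectly and the remaining (1 - p) is serial. */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Toy CPU frame rate. single_thread_ms: hypothetical frame cost on one thread.
 * wait_ms: hypothetical serialized time per frame spent waiting on the GPU's
 * "ready for the next frame" signal (0 models Infinite GPU on). */
double cpu_fps(double single_thread_ms, double p, int n, double wait_ms)
{
    assert(single_thread_ms > 0.0 && wait_ms >= 0.0);
    double frame_ms = single_thread_ms / amdahl_speedup(p, n) + wait_ms;
    return 1000.0 / frame_ms;
}
```

With p = 0.95 the speedup goes from about 3.5x on 4 threads to about 9.1x on 16. Adding a fixed `wait_ms` to every frame eats into that gain, which matches the direction (not the magnitude) of the discrepancy the developer describes.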
In addition, due to this "wait", one interesting stat to track is your power draw. On DX11 the power draw tends to be much higher than on DX12, because the additional serial threads the driver needs to process the GPU commands effectively force the CPU to stay active even if it is only using a fraction of its cores. This tends to be an overlooked benefit of DX12, since the API is designed so that engines can evenly distribute work.
Regarding specific CPU workloads and the differences between AMD and Intel, it is important to note a few things:
1. We have invested heavily in SSE (mostly SSE2, for compatibility reasons), and a significant portion of the engine is executing that code during the benchmark. It could very well be 40% of the frame, possibly more.
2. While we do have large contiguous blocks of SSE code (mainly in our simulations), it is also heavily woven into the entire game via our math libraries. Our AI and gameplay code tends to be very math-heavy.
3. The Nitrous engine is designed to be data-oriented (basically, we know what memory we need and when). Because of this, we can effectively use the SSE streaming memory instructions in conjunction with prefetch (both temporal and non-temporal). In addition, because our memory accesses are more predictable, the hardware prefetcher tends to be better utilized.
4. Memory bandwidth is definitely something to consider. The larger the scope of the application, combined with going highly parallel, the more pressure it puts on the memory system. On my i7 3770s I'm hitting close to peak bandwidth for 40% of the frame.
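Points 1 and 2 talk about SSE2 woven through the math libraries. Purely as an illustration (not Oxide's code; `dot_sse2` is a hypothetical name), this is roughly what a small SSE2 math routine looks like: a double-precision dot product that processes two elements per instruction. SSE2 is part of the x86-64 baseline, so it builds without extra compiler flags on any 64-bit x86 CPU, FX and Core alike:

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Dot product of two double arrays, two lanes at a time with SSE2. */
double dot_sse2(const double *a, const double *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned loads: no alignment requirement */
        __m128d vb = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    /* Horizontal add without SSE3's haddpd: add the high lane into the low lane. */
    acc = _mm_add_sd(acc, _mm_unpackhi_pd(acc, acc));
    double sum = _mm_cvtsd_f64(acc);
    for (; i < n; ++i)                      /* scalar tail for an odd element count */
        sum += a[i] * b[i];
    return sum;
}
```

Using only SSE2 (no SSE3 `haddpd`, no AVX) is what keeps a routine like this portable across every 64-bit x86 chip, at the cost of leaving wider vector units idle.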
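Point 3 mentions SSE streaming (non-temporal) stores used together with prefetch. A minimal sketch of that general pattern follows, assuming a 64-byte cache line and an arbitrary prefetch distance; `scale_stream` and `PREFETCH_AHEAD` are hypothetical names, and Oxide's actual loops are not public:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_prefetch, _mm_stream_ps, _mm_sfence */

/* Hypothetical tuning value: prefetch 256 floats (1 KiB) ahead of the loads. */
#define PREFETCH_AHEAD 256

/* dst[i] = src[i] * k, writing dst with non-temporal (streaming) stores
 * so the output does not evict useful data from the caches. */
void scale_stream(float *dst, const float *src, size_t n, float k)
{
    assert(((uintptr_t)dst % 16) == 0);  /* _mm_stream_ps needs 16-byte alignment */
    assert(n % 4 == 0);                  /* keep the sketch free of a scalar tail */
    const __m128 vk = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 4) {
        /* One prefetch per 64-byte line (16 floats). _MM_HINT_NTA requests a
         * non-temporal fetch; _MM_HINT_T0 would be the temporal variant. */
        if (i % 16 == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(src + i + PREFETCH_AHEAD), _MM_HINT_NTA);
        _mm_stream_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vk));
    }
    _mm_sfence();  /* order the streaming stores before any later stores */
}
```

Streaming stores only pay off when the output won't be read again soon. For data that will be reused, plain stores and a temporal prefetch are usually the better fit, which is presumably why the developer mentions both hint types.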
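Point 4 is about hitting peak memory bandwidth. If you want a rough idea of how close your own machine gets, a crude copy microbenchmark helps. This is a rough stand-in for a proper tool such as STREAM: it times repeated `memcpy` calls, and the function name, buffer size and repetition count are my own arbitrary choices:

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Crude copy-bandwidth estimate in GB/s. Counts one read plus one write per
 * byte copied; write-allocate (read-for-ownership) traffic is not counted,
 * so the true DRAM traffic is somewhat higher than the reported figure. */
double copy_bandwidth_gbs(size_t bytes, int reps)
{
    assert(bytes > 0 && reps > 0);
    char *src = malloc(bytes);
    char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 1, bytes);  /* touch every page before timing */
    memset(dst, 0, bytes);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; ++r)
        memcpy(dst, src, bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile char sink = dst[bytes - 1];  /* read back one byte so the copies have an observable effect */
    (void)sink;
    double secs = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    return secs > 0.0 ? 2.0 * (double)bytes * reps / secs / 1e9 : -1.0;
}
```

Use a buffer far larger than the last-level cache, e.g. `copy_bandwidth_gbs((size_t)256 << 20, 10)`, otherwise you end up measuring cache bandwidth rather than DRAM bandwidth.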
I hope this information helps point you in the right direction for your investigation into the performance differences between AMD and Intel. We haven't done exhaustive comparative tests, but generally speaking we have found AMD chips to compare more favorably to Intel than synthetic benchmarks suggest. I'm looking forward to your results.
# # #
Notes (added as time permits):
- The good news is there are no AVX optimisations. Oxide have used SSE2 instead, for compatibility reasons as mentioned. This should give Intel processors only a slight edge, nothing incredible.
- The better utilization of the hardware prefetcher would point to far better performance on Vishera than on Bulldozer. One of Vishera's selling points over Bulldozer was the improvement to its hardware prefetcher. Steamroller did not improve prefetching further, so the better performance of the A10-7870K cannot be attributed to this factor. We will have to look elsewhere.
- Steamroller increased the size of the integer and floating point register files, while load operations (two operands) were compressed to fit a single entry in the physical register file, which increases the effective size of each register file over both Bulldozer and Vishera. This would give Steamroller an edge in integer execution, which could account for some of the performance variance between Steamroller and Vishera/Bulldozer. Steamroller also has larger scheduling windows, which allow greater utilization of execution resources (better for draw call execution, for example). Together, these improvements could account for the performance increase we see with Steamroller. Steamroller also delivers around 30% more ops per cycle overall than Vishera as a result of its improved FPU. The following slide, provided by AMD, gives us a glimpse of some of the improvements that arrived with Steamroller:
[Image: AMD slide on Steamroller architecture improvements]
- Memory bandwidth is also an important part of the equation. The Core i7-3770K is an Ivy Bridge part, most likely paired with a socket 1155 motherboard using the Z77 chipset. The usual memory configuration for an Ivy Bridge part is dual-channel 1600MHz DDR3, which usually allows for around 20GB/s of read and write bandwidth. If the Intel system is hitting peak bandwidth for 40% of the frame, the same should be considered for an AMD FX-8350 paired with a 990FX-chipset motherboard; that platform's peak bandwidth is around 19GB/s. It is no secret that the FX-8350 benefits more than Intel parts from faster memory (1866MHz is usually recommended), because the architecture is memory-bandwidth starved. We can therefore conclude that memory bandwidth could be at least partially to blame for the performance difference between AMD's FX series and Intel's Core i series in AotS.
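The ~20GB/s and ~19GB/s figures above are practical numbers. For comparison, the theoretical ceiling of a DDR3 setup is simple arithmetic: transfer rate x 8 bytes per 64-bit channel x number of channels. A minimal sketch (the function name is my own; note that the "1600MHz" in DDR3-1600 really means 1600 MT/s):

```c
#include <assert.h>

/* Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes):
 * mega-transfers per second x 8 bytes per 64-bit channel x channels.
 * DDR3-1600 runs at 1600 MT/s, which is what "1600MHz" usually means. */
double peak_bandwidth_gbs(double mega_transfers_per_s, int channels)
{
    assert(mega_transfers_per_s > 0.0 && channels > 0);
    const double bytes_per_transfer = 8.0;  /* one 64-bit channel */
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9;
}
```

`peak_bandwidth_gbs(1600, 2)` gives 25.6GB/s, so the ~20GB/s quoted for Ivy Bridge is roughly 78% of the theoretical figure. `peak_bandwidth_gbs(1866, 2)` gives about 29.9GB/s, around 17% more headroom, which is the kind of gain the faster-memory recommendation for the FX-8350 is after.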