AMD Trinity CPU to be up to 20% faster than Llano

However, this was under Linux.

This is my main concern: how prevalent are its architectural shortcomings under the current version of Windows OS? Still, looking forward to seeing this in action. IGP looks solid, and Piledriver spec is set to negate the power draw and IPC issues of the current BD effort. Fingers crossed - once more for luck! :D
 

The problem is that the current generation of Bulldozer was supposed to have very similar instructions per clock (IPC) to K10.5; it was meant to either match or improve in that aspect, and that was one of the fundamental design parameters. Interestingly, it is the only real aspect that Bulldozer has failed on. Power consumption, again, will be fixed with revisions rather than the new design (Piledriver), which is supposed to bring a ~10-15% IPC increase over the original Bulldozer design (i.e. equal to or better than Phenom II IPC-wise!) ;)

Edit: another problem is that Windows 7 doesn't use Bulldozer efficiently. It keeps scheduling it in a 1 module, 2 core scenario, which actually slows things down, sometimes dramatically, in applications that don't need the CMT feature. A page I was reading this morning analyses exactly that. The conclusion was that in some situations (applications that aren't single threaded but aren't heavily multi-threaded, some games for example) it is losing a substantial amount of performance (15-20% in some cases) because Windows keeps sending two threads to one module when there are other modules available and idle. The optimal setup is to send two threads to two modules, so each thread can make best use of the resources available, like the full FPU. So with regards to Bulldozer there are improvements that will come, I'm pretty sure of that.
 
I find this quite difficult to believe. These guys were working on this architecture for quite some time. Why didn't they contact Microsoft so that a Windows 7 patch could be released before Bulldozer came out? In addition, I haven't heard anything from AMD or Microsoft about releasing a patch that addresses this issue.
 
Well, since Microsoft and Intel have been practically in bed with each other in recent times, is it really a surprise? Not like I would ever suggest foul play, because that is so un-Intel-like... :rolleyes:

That explains why it's so much faster in Linux............... Oh wait.
Admittedly, I know BD needs a scheduling patch in W7, BUT WHERE THE HELL IS IT?
 

Got lost in the post, I think :D To be fair, it's something AMD should have got sorted before release! But a patch would help, and so would a new stepping from AMD to get rid of the shocking power consumption. But an ideal world it is not! ;)
 
I find this quite difficult to believe. These guys were working on this architecture for quite some time. Why didn't they contact Microsoft so that a Windows 7 patch could be released before Bulldozer came out? In addition, I haven't heard anything from AMD or Microsoft about releasing a patch that addresses this issue.

God knows why not, but if they had, it would have brought a nice IPC improvement.
 
It's clearly a flaw in the processor design. They could have designated the order in which cores are used. Currently it's 1-2, 3-4, 5-6, 7-8; it should have been 1,3,5,7,2,4,6,8.

The benefit that the current system gives is that unused modules are shut down when not in use, so power consumption is reduced. This in turn allows the turbo mode to clock the chip higher. Regardless of these points the performance is reduced.

Whether the OS is patched or the CPU is fixed, the end result should be slightly higher multithreaded performance, at the cost of higher power consumption and lower CPU clocks.
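To make the two orderings concrete, here is a rough Python sketch. It is only an illustration, not anything AMD or Microsoft ship. It assumes 4 modules with 2 cores each, numbered 1-8 as in the post above, and `fill_order` and `modules_in_use` are made-up names.

```python
def fill_order(num_modules=4, cores_per_module=2, spread=True):
    """Return 1-based core numbers in the order threads would be placed.

    spread=False fills module by module (1,2,3,4,...).
    spread=True uses the first core of every module before any second core.
    """
    if not spread:
        return list(range(1, num_modules * cores_per_module + 1))
    order = []
    for slot in range(cores_per_module):
        for module in range(num_modules):
            order.append(module * cores_per_module + slot + 1)
    return order


def modules_in_use(order, num_threads, cores_per_module=2):
    """Count how many distinct modules the first num_threads cores touch."""
    used = {(core - 1) // cores_per_module for core in order[:num_threads]}
    return len(used)


packed = fill_order(spread=False)
spread = fill_order(spread=True)

print("packed order:", packed)
print("spread order:", spread)
print("modules awake with 4 threads (packed):", modules_in_use(packed, 4))
print("modules awake with 4 threads (spread):", modules_in_use(spread, 4))
```

With four threads, the packed order keeps two modules awake and the spread order wakes all four, which is the power versus performance trade-off described above.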
 
Bulldozer's design is less than optimal (too little L1 and too much high latency L3 cache, scheduling, not enough floating point units, leaky 32nm process). The floating point capability in particular is why the "8 core" BD is slower than a Thuban in FP heavy workloads.
 
The problem is that the current generation of Bulldozer was supposed to have very similar instructions per clock (IPC) to K10.5; it was meant to either match or improve in that aspect, and that was one of the fundamental design parameters.


No, it wasn't. This was a SERVER chip first and foremost, designed for throughput, not IPC, and also designed as an APU, not a CPU on its own. The first gen is CPU only and the 2nd gen is an APU. We probably won't see only APUs till 28/20nm, when the GPU power can be better utilised by Windows/Linux, which is still a little way off. Beyond Quick Sync/video encoding and some basic stuff there isn't a whole lot of GPU acceleration.

As said so long ago, this is a core two integer pipelines wide, vs three in the old gen. That shows a HUGE efficiency improvement per pipeline in each core: it has 33% less resources and is never that much slower, and normally just about on par. Throw in 20% more performance from better scheduling and you mostly have 2 integer pipes outperforming 3.

It's clearly a flaw in the processor design. They could have designated the order in which cores are used. Currently it's 1-2, 3-4, 5-6, 7-8; it should have been 1,3,5,7,2,4,6,8.

The benefit that the current system gives is that unused modules are shut down when not in use, so power consumption is reduced. This in turn allows the turbo mode to clock the chip higher. Regardless of these points the performance is reduced.

Whether the OS is patched or the CPU is fixed, the end result should be slightly higher multithreaded performance, at the cost of higher power consumption and lower CPU clocks.

It is and it isn't a design flaw. Firstly, a scheduler is incredibly fundamental to an OS; patching it is NOT something done lightly, and that is why you can already see Windows 8 builds with a new scheduler in them, and no sign of one on Windows 7. There are some things you often don't patch, as doing so can create so many problems.

Secondly, you do NOT want threads to go onto different modules every single time. Bulldozer's power gating is VERY effective; even overclocked, the clock and power gating is immense. If you only had Windows background processes, which push through DOZENS of threads all the time but use almost no CPU power, then with your method, where every new thread gets pushed to a new module, Bulldozer would never power gate any modules down and its idle power would be horrific, rather than matching that of a core with less than half the transistors.

Thirdly, there are situations in which sharing L2 and the same data can improve performance when two threads are within one module. On top of that, if you start off with 2 threads and put them in different modules, but then get another 3-4 threads from other programs so that there are 2 threads in each module, you would be even more likely to see the shared L2 do better with two threads from the same program together.

In other words, a scheduler is NOT basic. It's in no way cut and dried: it cannot simply use a new module for every new thread up to 4, as that would make the chip FAR worse than it is now. It's not even slightly feasible as an idea, and this is why a patch for Win 7 isn't certain, and maybe not even that likely (and may not be great if it is done).


There are hundreds of different types of data processing and thousands of scenarios, and getting a chip to work best in all of them is very difficult. Previous chips, however, were far less complex than Bulldozer.

It's not only the different situations I highlighted: power saving, thread combining, and the fact that usage can be constantly swapping between 2 and 8 threads, where constantly moving a thread from one module to another won't help. What if you've got 2 heavy integer threads, then another program starts with 6 more threads, of which 4 are incredibly heavy on the FPU? You'd probably be best off with one FPU-heavy thread in each module and the rest spread out as best as possible. There are, as I said, THOUSANDS of possibilities. The scheduler for i7 has basically been worked on and improved since Yonah IIRC, or Merom, 1-2 gens before Conroe. The Athlon 64 scheduler has been worked on and tweaked for years.
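As a toy illustration of why this isn't cut and dried, here is a minimal Python sketch of one possible placement rule: heavy threads get spread across modules, while light background threads get packed onto a module that is already awake so the idle ones can stay power gated. Everything here is an assumption for illustration (the 0.5 load threshold, the 4-module layout, the names `place_threads` and `awake_modules`); it is not how the Windows or Linux schedulers actually work.

```python
def place_threads(loads, num_modules=4, heavy_threshold=0.5):
    """Assign threads to modules; each thread is given as a load fraction (0..1).

    Heavy threads go to the module holding the fewest heavy threads (spread).
    Light threads go to the first module that is already awake (pack).
    """
    modules = [[] for _ in range(num_modules)]

    def heavy_count(module):
        return sum(1 for load in module if load >= heavy_threshold)

    # Place the heaviest threads first so they get the emptiest modules.
    for load in sorted(loads, reverse=True):
        if load >= heavy_threshold:
            target = min(modules, key=heavy_count)
        else:
            awake = [m for m in modules if m]
            target = awake[0] if awake else modules[0]
        target.append(load)
    return modules


def awake_modules(modules):
    """Number of modules that have at least one thread and cannot be gated."""
    return sum(1 for m in modules if m)


background = [0.01] * 12   # lots of near-idle background threads
game = [0.9, 0.9]          # two heavy threads

print(awake_modules(place_threads(background)))  # only one module stays awake
print(place_threads(game))                       # one heavy thread per module
```

Even this simple rule has to guess which threads are "heavy", and it ignores L2 sharing and threads whose load changes over time, which is exactly the kind of thing that makes a real scheduler hard to get right.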

It will take time; that would be true whether Bulldozer came out yesterday or in 3 years.
 
I find this quite difficult to believe. These guys were working on this architecture for quite some time. Why didn't they contact Microsoft so that a Windows 7 patch could be released before Bulldozer came out? In addition, I haven't heard anything from AMD or Microsoft about releasing a patch that addresses this issue.

Windows 7 is mature, I guess MS ain't so keen to mess around with its kernel at this stage in the game.
 
Interesting thing is that Bulldozer is no slower than K10.5, but it is at the same time. When both integer clusters are in use, a module gives ~80% of the performance of two traditional cores, which is pretty impressive considering it uses only ~10% more space than that traditional design. However, when some people have manually disabled one integer cluster in each module, it acts more like a quad-core K10.5, more or less equal in clock-for-clock performance. All mighty impressive considering each integer core has 33% less physical resources than a K10.5 core, yet offers similar or superior performance when it has access to the whole floating point unit, given the fact Bulldozer has an absolute ton of headroom in the frequency department. Fact of the matter is everyone is jumping on the 'Bulldozer sucks' bandwagon far too prematurely. I still stand by the belief that anyone who says it's a flawed architecture is short-sighted at best. ;)
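For what it's worth, the ~80% figure lines up with the 15-20% scheduling loss mentioned earlier in the thread. Here is a back-of-the-envelope Python sketch, assuming a lone thread on a module counts as 1.0 "traditional core" and two threads sharing a module give 0.8 each. Both numbers are rough forum figures, not measurements, and `throughput` is a made-up helper.

```python
SHARED_MODULE_FACTOR = 0.80  # per-thread speed when two threads share a module (assumed)


def throughput(threads_per_module):
    """Total throughput in 'traditional core' units for a given placement."""
    total = 0.0
    for count in threads_per_module:
        if count == 1:
            total += 1.0                       # thread has the module to itself
        elif count == 2:
            total += 2 * SHARED_MODULE_FACTOR  # both integer clusters busy
        elif count != 0:
            raise ValueError("a module holds at most 2 threads in this model")
    return total


packed = throughput([2, 0, 0, 0])  # two threads sent to one module
spread = throughput([1, 1, 0, 0])  # one thread on each of two modules
loss = 1 - packed / spread

print(f"packed: {packed:.1f}, spread: {spread:.1f}, loss: {loss:.0%}")
```

Under those assumptions, sending two threads to one module costs about 20% compared with spreading them, which is the top end of the range quoted above.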
 

Cool. You go ahead and buy one and until things improve we'll continue to suggest otherwise to others. :)
 
It seems Trinity is around 240 mm²:

http://semiaccurate.com/2012/01/05/exclusive-anyone-curious-about-trinity/

Here is a picture of an actual Trinity die and a demo of Trinity:

http://www.techpowerup.com/158480/AMD-Demonstrates-Trinity-APU-Its-Own-Thunderbolt-Alternative.html

It would have been better if AMD had said what settings the game (DiRT 3?) was running at, though. Perhaps a picture of what speed the CPU was running at would have been a good idea too.

Edit!!

Here is the original article with a few more details:

http://hothardware.com/News/AMD-Fus...00M-Mobile-GPUs-and-Lightning-Bolt-In-Action/

According to the article in post 1144, a 17W TDP Llano CPU was used in the demo?? Really??
 