Copied from AVforums, but I think it's a fairly good article if you're technically minded.
http://dpad.gotfrag.com/portal/story/35372/?cpage=1
I like the article mainly because it's not biased and it's well written.
And for people who have trouble clicking links, here it is copied and pasted.
By: Michael Perry - Published November 13, 2006 at 12:38 AM EST
Pull up a chair, some beverages, even a pillow as we look at DPAD's most comprehensive article EVER. Michael "Optimus" Perry goes beyond the range of an average review and breaks the PS3 and 360 down piece by piece, giving us a never-before-seen analysis of arguably one of the most heated debates in recent console memory. PS3 or 360?
PS3 vs 360… 360 vs PS3… is a very touchy subject, as everyone is aware I’m sure. Before this article gets underway, it’s important that everyone keep in mind that neither of these 2 consoles is weak by any stretch of the imagination. Both are very powerful and significant upgrades over their current generation counterparts such as the PS2 or Xbox, and as always it’s about the games. What it TRULY boils down to for the vast majority of people out there is that they don’t care about the specs of a console. All most people care about is “How good are the games available for it?” So I want it to be understood immediately that by no means is this article meant to be some sort of indication of which of the two consoles will end up coming out on top. Nor is this article meant to, in any shape or form, change anyone’s mind about which console they decide to purchase. This is simply an article intended to inform people about both machines, and in doing so there will of course be comparisons between the 2 machines, but they will be facts. With all of that said, let’s get down to business, shall we?
PS3 CPU & 360 CPU
Let us start off by showing what Microsoft and Sony released to the public regarding the CPUs in both their machines. There were many press releases in many different formats and styles, but this is the gist of it.
360 Central processing unit (aka Xenon)
90 nm process, 165 million transistors (65 nm process SOI revision in 2007)
Three symmetrical cores, each one SMT-capable and clocked at 3.2 GHz
One VMX-128 SIMD unit per core, dual threaded.
128×128 register file for each hardware thread, 2 sets per VMX unit
1 MB L2 cache (lockable by the GPU)
Dot product performance: 9.6 billion per second (33.6 billion combined with GPU)
115 GFLOPS theoretical peak performance
ROM storing Microsoft private encrypted keys
360 CPU information provided by Microsoft
PlayStation 3 Central-processing unit (aka Cell Broadband Engine)
PowerPC-based core @3.2GHz
1 VMX vector unit per core
512KB L2 cache
7 x SPE @3.2GHz
7 x 128b 128 SIMD GPRs
7 x 256KB SRAM for SPE
Dot product performance 22.4 billion (51 billion combined with GPU)
1 of 8 SPEs reserved for redundancy
Total floating point performance: 218 GFLOPS
PS3 CPU information provided by Sony
Now before I get into it, I’d like to point out that while both consoles have powerful CPUs, both Sony and Microsoft have played a dirty little numbers game with everyone… numbers that can easily be misinterpreted by most people to mean “The one with the highest numbers must be the better of the 2,” and that isn’t how it works at all (at least not all the time, and here is the kicker: both Sony and Microsoft want you to misinterpret the numbers).
Why isn’t a “higher is better” mentality always a safe bet? Simple really: one has to take into consideration important things like the architecture. Concentrating only on the raw numbers without understanding the specifics of how the hardware operates can lead to mistakes like this one: “Midway has a car that can reach a top speed of 180MPH and Australia has a car that can reach a top speed of 90MPH.” Someone only looking at the raw numbers may assume, “This is far too easy; clearly Midway is going to win because his car goes up to 180MPH.” Now did anyone stop to consider that maybe Australia is not only the better driver of the 2, but his car has quicker acceleration plus better braking, and the road they’ll be racing on is dripping wet and packed full of sharp turns, which may prevent the more inexperienced driver from banking on all that raw speed?
Not the best of analogies, but it should teach everyone to be cautious when they see either side throwing around their megahertz, dot products and GFLOPS. I’m not saying the numbers are 100% meaningless, as there are numbers that are actually trustworthy, but it’s getting you all ready for what I’m about to tell you.
Dispelling Some of the Hype
Now there are people that look at the 360 having a triple core processor and the PS3 with the much publicized Cell Processor and start to wonder…
#1 How in God’s name can the 360 ship with a 3-core processor in November 2005 when no 3- or 4-core CPUs are available to purchase for desktop computers?
#2 Why didn’t Intel or AMD manufacture and start selling such a processor at the same time or before the Xbox 360 shipped?
#3 How can the cell have 1 Power PC core and, in addition to that, have 7 SPE, which are basically seven extra processors?
#4 Everyone knows processors aren’t cheap, and when you factor in everything else you need, it’s even more expensive. How can Microsoft get away with charging as low as $299 for the Xbox 360? How can Sony get away with charging as low as $500 for the PS3, when the processors themselves would seem to cost 90% of the PS3’s price, or even more than $500?
Marketing talk from Microsoft and Sony: The processors inside these machines are extremely powerful and cutting edge; you literally have a supercomputer in your home, as the Xbox 360 has 1 Teraflop worth of computing power and the PS3 has 2 Teraflops worth of computing power.
TRUTH: Both the 360 and PS3’s CPUs are heavily stripped down compared to what most of us are probably using on our desktop computers to view this article. Both consoles are labeled as 3.2GHz, but they don’t offer performance comparable to that of a typical Athlon 64 3200+, or better than even an Athlon XP 2800+ CPU. The CPUs inside the Xbox 360 and PS3 are “In-Order Execution” CPUs with narrow execution cores, whereas what we use in our computers are classified as “Out-of-Order Execution” CPUs with wider execution cores.
The reason they can sell for so cheap is that they are not as robust or complex as what we have inside our computers. The execution scheme in both the 360 and PS3’s CPUs is similar to what you would see in the original Intel Pentium processor (not the Pentium II, III or 4, but the original). This is because they’ve stripped out the hardware designed to optimize the scheduling of instructions at runtime. As a result, neither the 360’s nor the PS3’s CPU contains an instruction window. Instead, instructions pass through the processor in the order in which they were fetched; hence both are “In-Order Execution” CPUs.
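To make the in-order versus out-of-order point a little more concrete, here is a toy sketch in Python. It is only an illustration under made-up assumptions: the instruction names, the 5-cycle load latency, the one-instruction-per-cycle issue rate and the 4-entry window are all hypothetical, and it does not model the real Xenon or Cell pipelines. It just shows how an instruction window lets independent work slip past a stalled instruction.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    name: str
    deps: tuple      # names of earlier instructions whose results are needed
    latency: int     # cycles until this instruction's result is available

def run_in_order(program):
    """Issue at most one instruction per cycle, strictly in program order."""
    done_at = {}     # name -> cycle at which the result becomes available
    cycle = 0
    for ins in program:
        # Stall until every input is ready; nothing behind may overtake.
        start = max([cycle] + [done_at[d] for d in ins.deps])
        done_at[ins.name] = start + ins.latency
        cycle = start + 1
    return max(done_at.values())

def run_out_of_order(program, window=4):
    """Issue at most one instruction per cycle, choosing the oldest ready
    instruction among the first `window` that have not issued yet."""
    done_at = {}
    pending = list(program)
    cycle = 0
    while pending:
        for ins in pending[:window]:
            if all(d in done_at and done_at[d] <= cycle for d in ins.deps):
                done_at[ins.name] = cycle + ins.latency
                pending.remove(ins)
                break
        cycle += 1
    return max(done_at.values())

# Hypothetical program: two independent load/use pairs.
PROGRAM = [
    Instr("load_a", (), 5),
    Instr("use_a", ("load_a",), 1),
    Instr("load_b", (), 5),
    Instr("use_b", ("load_b",), 1),
]

print(run_in_order(PROGRAM))       # 12 cycles: load_b waits behind use_a
print(run_out_of_order(PROGRAM))   # 7 cycles: load_b issues while use_a stalls
```

With a window of 1 the second function degenerates to the in-order result, which is one way to read "no instruction window" in this simplified model.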
Marketing talk from Microsoft and Sony: Thanks to these multi-core processors, developers will be able to multi-thread their games, get significant performance improvements and achieve Artificial Intelligence in games that people previously thought impossible for a videogame. It’ll be as if you’re playing with another living, breathing human being.
TRUTH: What is the big deal? How exactly does the fact that both processors are “In-Order Execution” CPUs hurt them? Well, see the 3.2GHz clock speed for both CPUs? The type of nasty game code that is full of branches, loops etc. would have been greatly sped up by out-of-order execution and a wider execution core, but that hardware is not there to help, so that 3.2GHz actually performs slower than the out-of-order execution CPUs available to desktop computer users.
This brings us to the very reason both the PS3 and Xbox 360 are using multiple processors: it is an effort to combat the lack of an instruction window and the fact that they have a narrow execution core. It gets even better, because this very same code that they hope to speed up using parallelism on multiple cores isn’t by any means parallel programming friendly.
On the other hand, graphics-related code is great on both these processors, as graphics code is nice and parallelism friendly. There is a reason people consider graphics accelerators to be the poster child for parallelism. As a matter of fact, it’s the most successful form of parallelism the field of computer science has ever witnessed. GPUs are able to get all their transistors firing in a way that actually produces a significant real world benefit to the people using the product.
For the CPU to become more like the GPU is the ultimate goal for many, and AMD together with ATI seem to be going for it. The Cell processor is actually one such attempt, but it’s not yet at the level everyone had hoped. (Perhaps a bit early, as a Cell-like CPU isn’t on Intel’s to-do list until about 2015.) Long story short, both Microsoft and Sony have given developers more than enough on the graphics side of things, but at the same time are asking developers to do more with less on the aspects of the game unrelated to graphics.
A bit of review
#1 Both consoles are using in-order execution CPUs that are half the speed of out-of-order execution processors when it comes to running most game code, especially the more troublesome type which contains branches, loops and pointers.
#2 The very code they’re hoping to get improved performance out of isn't the type to lend itself so easily to multi-threading… to say it's hard would be the understatement of the century.
Here is a bit of what John Carmack, technical director of id Software, has to say about this.
“I do somewhat question whether we might have been better off this generation having an out-of-order main processor, rather than splitting it all up into these multi-processor systems.”
“It’s probably a good thing for us to be getting with the program now, the first generation of titles coming out for both platforms will not be anywhere close to taking full advantage of all this extra capability, but maybe by the time the next generation of consoles roll around, the developers will be a little bit more comfortable with all of this and be able to get more benefit out of it.”
“But it’s not a problem that I actually think is going to have a solution. I think it’s going to stay hard, I don’t think there’s going to be a silver bullet for parallel programming. There have been a lot of very smart people, researchers and so on, that have been working this problem for 20 years, and it doesn’t really look any more promising than it was before.”
Everyone should be aware that these processors, while powerful and a leap over what the current generation consoles had, aren’t the second coming they were marketed to be. What drives this point home even further is the fact that multi-threaded programming on these CPUs will definitely not be achieved at the snap of a finger; the developers have their work cut out for them.
How is one CPU better than another?
GFLOPS is something that gets thrown around a lot, so it should be clear what the peak theoretical GFLOPS numbers for both these CPUs are:
115GFLOPS Theoretical Peak Performance for 360 CPU
218GFLOPS Theoretical Peak Performance for PS3 CPU.
These theoretical peaks will not be achieved in real world performance. What IBM did when testing for theoretical peaks on both CPUs can't really be considered representative of how the processors would actually perform in real world situations, because the type of testing done is too controlled. It’s much too perfect an environment, and game development is going to involve an unforgiving environment that bears little resemblance to the perfect one the CPUs were tested under.
The GFLOPS numbers for the PS3 were calculated based on 8 running SPE, so the fact that the PS3 uses only 6 SPE for game applications lowers the peak theoretical even further, as the majority of the floating point work on the PS3’s CPU is done via the SPE. Each SPE has a peak theoretical of 25.6GFLOPS. So the total peak theoretical performance for all 6 SPE would be 153.6GFLOPS, but why is that number also not achievable?
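The arithmetic in the paragraph above can be written out as a small Python sketch. The 3.2GHz clock and the 25.6GFLOPS per-SPE figure come from the article; the breakdown of that figure into 4 single-precision SIMD lanes, with a multiply-add counted as 2 operations, is an assumption added here to show where a number like 25.6 could come from, and the function names are hypothetical.

```python
CLOCK_GHZ = 3.2        # SPE clock from the spec list
SIMD_LANES = 4         # assumption: four 32-bit lanes in a 128-bit register
FLOPS_PER_LANE = 2     # assumption: a multiply-add counts as 2 operations

def spe_peak_gflops():
    """Theoretical peak of one SPE under the assumptions above."""
    return CLOCK_GHZ * SIMD_LANES * FLOPS_PER_LANE

def total_peak_gflops(spe_count):
    """Theoretical peak of `spe_count` SPEs, rounded to one decimal place."""
    return round(spe_count * spe_peak_gflops(), 1)

print(total_peak_gflops(6))   # 153.6, the figure for the 6 SPE games can use
print(total_peak_gflops(8))   # 204.8, all 8 SPE on the die
```

Note that this only covers the SPE; the article does not break down how the rest of the 218GFLOPS headline figure is made up.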
In IBM’s controlled testing environment, their optimized code on 8 SPE only yielded a performance number of 155.5GFLOPS. If it took 8 SPE to achieve that, there is no way 6 will be able to, and that testing was done in a fashion that didn’t model all the complexities of DMA and the memory system. Using a 1Kx1K matrix and 8 SPE they were able to achieve 73.4GFLOPS, but the PS3 uses 6 SPE for games and these tests were done in controlled environments. So going on this information, even 73.4GFLOPS is seemingly out of reach. Sony didn’t necessarily lie about the Cell’s performance, as they made clear the 218GFLOPS was “theoretical,” but just like Microsoft they definitely wanted you to misinterpret these numbers into believing they were achievable.
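As a rough sanity check on those measured numbers, the Python sketch below compares them with the 8-SPE theoretical peak and scales them down to 6 SPE. The measured values and the 25.6GFLOPS per-SPE figure are the article's; the linear scaling from 8 to 6 SPE is purely an assumption for illustration (real code may scale better or worse), and the function names are hypothetical.

```python
SPE_PEAK_GFLOPS = 25.6   # per-SPE theoretical peak quoted in the article

# Results the article attributes to IBM's tests on 8 SPE.
MEASURED_8_SPE = {"optimized code": 155.5, "1Kx1K matrix": 73.4}

def fraction_of_peak(measured_gflops, spe_count=8):
    """Share of the theoretical peak that a measured result reached."""
    return measured_gflops / (spe_count * SPE_PEAK_GFLOPS)

def scaled_to(measured_gflops, spe_count, measured_on=8):
    """Assumption: throughput scales linearly with the number of SPE."""
    return measured_gflops * spe_count / measured_on

for label, gflops in MEASURED_8_SPE.items():
    share = fraction_of_peak(gflops)
    on_six = scaled_to(gflops, 6)
    print(f"{label}: {share:.0%} of 8-SPE peak, "
          f"roughly {on_six:.0f} GFLOPS if scaled linearly to 6 SPE")
```

Under that linear assumption, both results land well below the 153.6GFLOPS theoretical figure for 6 SPE, which is the point the article is making.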
Even taking all of this into consideration, and granting that neither CPU can reach those crazy performance numbers, the PS3’s Cell still comfortably comes out on top in terms of overall floating point capability. It should be known, though, that the available power on the PS3’s Cell will be significantly more difficult to harness than the available power on the 360’s CPU.
It’s also worth mentioning that even the PS2 CPU had more than twice the GFLOPS of the original Xbox’s CPU, but that didn’t necessarily make it the performance winner. This time around, while the Cell has the GFLOPS advantage, its advantage isn’t quite as big as the one the PS2 CPU had over the Xbox. This teaches us that there is more than one measure of real world performance.
The PS3’s Cell processor has 1 Power PC core (the PPE), similar to the 3 Power PC cores that make up the 360’s 3-core design (without the VMX-128 enhancements available on each of the 360’s cores), and 7 SPE (Synergistic Processing Elements). The 8th is disabled to improve yields. One of the SPE is used to run the PS3’s operating system while the other 6 are available for games. The reason the PS3’s CPU will be significantly more difficult to program for is that the CPU is asymmetric, unlike the 360’s CPU. Because the PS3 CPU has only 1 PPE compared to the 360’s 3, all game control, scripting, AI and other branch-intensive code will need to be crammed into two threads which share a very narrow execution core with no instruction window. The Cell’s SPE will be unable to help out here as they are not as robust and hence not fit for accelerating things such as AI, which is fairly branch intensive, while the SPE lack branch prediction capability entirely.
I’m sure people remember the section detailing how the 360 and PS3’s processors are less robust than the processors we use on our desktop computers, and the consequences of in-order execution. Well, the PS3’s SPE are stripped down even further than the Power PC cores and, as a result, aren’t capable of handling as many different types of code as the 1 Power PC core available on the PS3’s Cell or the 3 Power PC cores available on the 360’s CPU. The problem with being asymmetric is that the method of programming you used to get the most out of the Power PC core on the PS3 CPU is no longer effective when breaking off tasks for the SPE to work on. Going from the PPE to the SPE on the PS3 requires a different compiler and a different set of tools.
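To give a feel for why SPE code ends up looking different from PPE code, here is a loose Python sketch of the "copy a chunk into local memory, work on it, copy it back" pattern that a small private memory pushes you towards. It is not Cell SDK code: the function name, the byte budget and the use of Python lists to stand in for DMA transfers are all assumptions for illustration. The only figure taken from the article is the 256KB of SRAM per SPE.

```python
LOCAL_STORE_BYTES = 256 * 1024   # per-SPE SRAM from the spec list
FLOAT_BYTES = 4                  # assumption: 32-bit floats

def scale_on_spe(main_memory, factor, budget_bytes=LOCAL_STORE_BYTES // 4):
    """Scale every value, working on one local-store-sized chunk at a time.

    `budget_bytes` is a hypothetical share of the local store set aside for
    data; the program's own code would need room in there as well.
    """
    chunk_len = budget_bytes // FLOAT_BYTES
    if chunk_len < 1:
        raise ValueError("budget too small to hold a single value")
    result = [0.0] * len(main_memory)
    for start in range(0, len(main_memory), chunk_len):
        local = main_memory[start:start + chunk_len]   # stands in for "DMA in"
        local = [value * factor for value in local]    # compute on local data
        result[start:start + len(local)] = local       # stands in for "DMA out"
    return result

# Tiny budget so the chunking is visible: 8 bytes holds 2 floats per chunk.
print(scale_on_spe([1.0, 2.0, 3.0], 2.0, budget_bytes=8))   # [2.0, 4.0, 6.0]
```

On a symmetric core the same job would just be a loop over the shared array, which is the contrast the article is drawing.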
When you come to the realization that the key to making up for the CPU’s in-order execution is rather complicated parallel programming, you realize that the CPU being asymmetric and having just a single PPE makes something that was already extremely difficult even more difficult. A developer’s job gets harder still when you factor in that the PS3 has a 512KB L2 cache, which is half the size of the 360 CPU’s 1MB L2 cache… that single PPE the PS3 CPU has isn’t receiving much help with branches in the cache department.
Microsoft made a better decision from the perspective of the developer; it's still difficult, but much easier compared to working with the Cell architecture. The 360’s CPU isn’t asymmetric like the PS3’s Cell and has 3 PPE as opposed to 1, and all 3 are robust enough to help handle the type of code only the PS3’s single PPE is capable of handling. When Microsoft says they have three times the general purpose processing power, this is what they mean. Based on the simple fact that the 360 has 3 Power PC cores to the PS3’s 1, more processing power can be dedicated to helping with things such as game control, AI, scripting and other types of branch-intensive code.
From the perspective of a developer, the 360 CPU’s biggest advantage is that all 3 of its cores are identical, all run from the same memory pool and they’re synchronized, in addition to being cache coherent. You can just create an extra thread right in your program and have it do some work. This allows the developer to create very nice structures, so if you know how to get the best possible performance out of one core, you know how to get the best possible performance out of all 3, because they operate in perfect sync.
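The "just create an extra thread" model looks roughly like the Python sketch below. The function names and the entity fields are hypothetical, and this is not Xbox 360 code; it only illustrates the shared-memory style the paragraph describes, where a second thread reads the same data as the main thread with no copying. (Python's own threads will not necessarily run in parallel, so treat it as a picture of the programming model rather than of the speedup.)

```python
import threading

def update_ai(entities, results):
    # Runs on the extra thread, reading the same list the main thread uses.
    results["ai"] = sum(entity["threat"] for entity in entities)

def update_physics(entities, results):
    results["physics"] = sum(entity["mass"] for entity in entities)

def run_frame(entities):
    """Hand the AI work to a second thread while this thread does physics."""
    results = {}
    worker = threading.Thread(target=update_ai, args=(entities, results))
    worker.start()
    update_physics(entities, results)   # main thread keeps working meanwhile
    worker.join()                       # wait for the AI result before returning
    return results

entities = [{"threat": 1, "mass": 2}, {"threat": 3, "mass": 4}]
print(run_frame(entities))
```

Because both functions see the same `entities` list, the code for the extra thread is written exactly like the code for the main one, which is the convenience being described.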
Each core on the 360’s processor is capable of running 2 threads (think of it as similar to Hyper-Threading), so the 360’s CPU is capable of handling 6 simultaneously running threads. This brings me to a very important advantage of the PS3’s Cell CPU: its concurrency. While the 360 CPU may be able to handle 6 processor threads simultaneously, it still only has 3 physical CPU cores, so every 2 threads must share processing power on a single core. The PS3, on the other hand, has 1 PPE and 6 SPE for games (which are like extra physical processors). If each of the PS3’s 6 SPE used for games is working on a specific task such as collision, cloth physics, animation, water surface simulation or particles, it wouldn’t need to worry about processing power being taken away by another part of the game, because the SPE don’t share processing power.
The only cause for concern would be the 512KB L2 cache being shared by 7 simultaneously running SPE and a PPE, but that’s what developers are for; they work around things like this. In practice, this should allow PS3 games to potentially have more things going on at once than 360 games. Ignoring the difficulties of programming for the PS3 CPU, it should be known that the PS3’s CPU is very good when it comes to vertex-related operations because it handles graphics code better than the 360’s CPU. It is also possible that, through good parallelism on the SPE, physics code could also run better on the PS3 CPU due to the concurrency advantage.
The 360 CPU, however, due to its 3 symmetric general purpose cores, is not only much easier to program for than the Cell, but having 3 PPE capable of handling things such as AI also means the 360’s CPU will be the better of the 2 CPUs when it comes to AI code. Either way we can look forward to great things from both CPUs in the future.
Before I end off, I’d like to point out a game that in my opinion, from a technical standpoint, is one of the most brilliant uses of the PS3’s CPU. All things considered, such as in-order execution and the other complications of the architecture, Heavenly Sword is quite the standout in nearly every regard: incredible combat animations, awesome group enemy AI, and great physics. At the very least this is what I gathered from seeing videos of the E3 demo; it’s a reminder that regardless of the challenges, there are developers that are up to the task, and it’s only going to get better with time.
http://dpad.gotfrag.com/portal/story/35372/?cpage=1
I like the article mainly as its not biased and it's well written.
And for people who have trouble clicking links heres it copy and pasted.
By: Michael Perry - Published November 13, 2006 at 12:38 AM EST
- Writer ArchivePull up a chair, some beverages, even a pillow as we look at DPAD's most comphrehensive article EVER. Michael "Optimus" Perry goes beyond the range of an average review and breaks the PS3 and 360 down piece by piece, giving us a never before seen analysis of arguably the most heated debates in our recent console memory. PS3 or 360?
PS3 vs 360… 360 vs PS3… is a very touchy subject as everyone is aware I’m sure. Before this article gets underway its important that everyone keep in mind that neither of these 2 consoles is weak by any stroke of the imagination. Both are very powerful and significant upgrades over their current generation counterparts such as the PS2 or Xbox and as always it’s about the games. What it TRULY boils down to for the vast majority of people out there is that they don’t care about the specs of a console. All most people care about is “How good are the games available for it?” So I want it to be understood immediately that by no means is this article meant to be some sort of indication of which of the two consoles will end up coming out on top. Nor is this article meant to, in any shape or form, change anyone’s mind about which console they decide to purchase. This is simply an article intended to inform people about both machines and in doing so there will of course be comparisons between the 2 machines, but they will be facts. With all of that said lets get down to business shall we?
PS3 CPU & 360 CPU
Let us start off by just showing what Microsoft and Sony released to the public in regards to the cpus in both their machines. Many press releases in many different formats and or styles, but this is the gist of it.
360 Central processing unit (aka Xenon)
90 nm process, 165 million transistors (65 nm process SOI revision in 2007)
Three symmetrical cores, each one SMT-capable and clocked at 3.2 GHz
One VMX-128 SIMD unit per core, dual threaded.
128×128 register file for each hardware thread, 2 sets per VMX unit
1 MB L2 cache (lockable by the GPU)
Dot product performance: 9.6 billion per second (33.6 billion combined with GPU)
115 GFLOPS theoretical peak performance
ROM storing Microsoft private encrypted keys
360 CPU information provided by Microsoft
PlayStation 3 Central-processing unit (aka Cell Broadband Engine)
PowerPC-base Core @3.2GHz
1 VMX vector unit per core
512KB L2 cache
7 x SPE @3.2GHz
7 x 128b 128 SIMD GPRs
7 x 256KB SRAM for SPE
Dot product performance 22.4 billion (51 billion combined with GPU)
1 of 8 SPEs reserved for redundancy
Total floating point performance: 218 GFLOPS
PS3 CPU information provided by Sony
Now before I get into it I’d like to point out that while both consoles have powerful CPUs both Sony and Microsoft have played a dirty little numbers game with everyone… numbers that can easily be misinterpreted by most people to mean “The one with the highest numbers must be the better of the 2” and that isn’t how it works at all (atleast not all the time and here is the kicker both Sony and Microsoft want you to misinterpret the numbers).
Why isn’t a “higher is better” mentality always a safe bet? Simple really, one has to take into consideration important things like the architecture. To only concentrate on the raw numbers without understanding the specifics of how it operates can lead to mistakes like this example here “Midway has a car that can reach a top speed of 180MPH and Australia has a car that can reach a top speed of 90MPH.” Someone only looking at the raw numbers may assume “This is far too easy clearly Midway is going to win because his car goes up to 180MPH” Now did anyone stop to consider the fact that maybe Australia is not only the better driver of the 2, but his car has quicker acceleration plus better braking and the road they’ll be racing on is dripping wet and packed full of sharp turns which may prevent the more inexperienced driver from banking on all that raw speed?
Not the best of analogies, but this will teach everyone to be cautious when they see either side throwing around their Megahertz, dot products and GFLOPS. I’m not saying the numbers are 100% meaningless as there are numbers that are actually trustworthy, but its getting you all ready for what I’m about to tell you
Dispelling Some of the Hype
Now there are people that look at the 360 having a triple core processor and the PS3 with the much publicized Cell Processor and start to wonder…
#1 How in God’s name can the 360 ship with a 3 core processor in November 2005 while there isn’t an available purchase for 3 or 4 core CPUs for desktop computers?
#2 Why didn’t Intel or AMD manufacture and start selling such a processor at the same time or before the Xbox 360 shipped?
#3 How can the cell have 1 Power PC core and, in addition to that, have 7 SPE, which are basically seven extra processors?
#4 Everyone knows processors aren’t cheap and when you factor in everything else you need, it’s even more expensive. How can Microsoft get away with charging as low as $299 for the Xbox 360? How can Sony get away with charging as low as $500 for the PS3, when the processors themselves cost 90% of the PS3’s price or cost more than $500?
Marketing talk from Microsoft and Sony: The processors inside these machines are extremely powerful and cutting edge you literally have a supercomputer in your home as the Xbox 360 has 1 Teraflop worth of computing power and the PS3 has 2 Teraflops worth of computing power.
TRUTH: Both the 360 and PS3’s CPUs are heavily stripped down compared to what most of us are probably using on our desktop computers to view this article. Both consoles are labeled as 3.2GHZ, but they don’t offer performance comparable to that of a typical Athlon 64 3200+ or better than even an Athlon XP 2800+ CPU. The CPUs inside the Xbox 360 and PS3 are “In-Order Execution” CPUs with narrow execution cores, whereas what we use on our computers are classified as “Out-of-Order Execution” CPUs with wider execution cores.
The reason they can sell for so cheap is because they are not as robust or complex as what we have inside our computers. The execution theme in both the 360 and PS3’s CPUs is similar to that of what you would see in the original Intel Pentium Processor. (Not referring to the Pentium 2 3 or 4, but the original) This is because they’ve stripped out hardware designed to optimize the scheduling of instructions at runtime. As a result, neither the 360 nor PS3’s CPU contain an instruction window. Instead, instructions pass through the processor in the order in which they were fetched; hence both are “In-Order Execution” CPUs.
Marketing talk from Microsoft and Sony: Thanks to these multi-core processors developers will be able to multi-thread their games and get significant performance improvements and achieve Artificial Intelligence in games that people previously thought impossible for a videogame. It’ll be as if you’re playing with another living breathing human being.
TRUTH: “What is the big deal? How exactly does the fact that both processors being “In-Order Execution” CPUs hurt them? Well, see the 3.2GHZ clock speed for both CPUs? The type of nasty game code, full of branches, loops etc… that would’ve been greatly improved speedwise, thanks to out-of-order execution and a wider execution core is not there to help, so that 3.2GHZ actually performs slower than out-of-order execution CPUs available to desktop computer users.
This brings us to the very reason why both the PS3 and Xbox 360 are using multiple processors in an effort to combat the lack of an instruction window and the fact that they have a narrow execution core. It gets even better, because this very same code that they hope to speed up using parallelism on multiple cores isn’t by any means parallel programming friendly.
On the other hand, Graphics-related code is great on both these processors, as graphics code is nice and parallelism friendly. There is a reason people consider graphics accelerators to be the poster child for parallelism. As a matter of fact, it’s the most successful form of parallelism the field of computer science has ever witnessed. GPUs are able to get all transistors firing that actually produce a significant real world benefit to the people using the product.
For the CPU to become more like the GPU is the ultimate goal for many and AMD together with ATI seem to be going for it. The cell processor is actually one such attempt to do so, but it’s not yet at the level everyone had hoped. (Perhaps a bit early as a cell like CPU isn’t on Intel’s to do list until about 2015) Long story short, both Microsoft and Sony have given developers more than enough on the graphics side of things, but at the same time, are asking developers to do more with less on the aspects of the game unrelated to graphics.
bit of review
#1 Both consoles are using in-order execution CPUs that are half the speed of out-of-order execution processors when it comes to running most game code, especially the more troublesome type which contains branches, loops and pointers.
#2 The very code they’re hoping to get improved performance out of isn't the type to lend itself so easily to multi-threading… to say it's hard would be the understatement of the century.
Here is a bit of what John Carmack, technical director of id Software, has to say about this.
“I do somewhat question whether we might have been better off this generation having an out-of-order main processor, rather than splitting it all up into these multi-processor systems.”
“It’s probably a good thing for us to be getting with the program now, the first generation of titles coming out for both platforms will not be anywhere close to taking full advantage of all this extra capability, but maybe by the time the next generation of consoles roll around, the developers will be a little bit more comfortable with all of this and be able to get more benefit out of it.”
But it’s not a problem that I actually think is going to have a solution. I think it’s going to stay hard, I don’t think there’s going to be a silver bullet for parallel programming. There have been a lot of very smart people, researchers and so on, that have been working this problem for 20 years, and it doesn’t really look any more promising than it was before.”
Everyone should be aware that these processors while powerful and a leap over what the current generation consoles had, they aren’t the second coming they were marketed to be and what drives this point home even further is the fact that Multi-threaded programming on these CPUs will definitely not be achieved at the snap of a finger; the developers have their work cut out for them.
How is one CPU better than another?
GFLOPS is something that gets thrown around a lot, but it should be clear that the peak theoretical GFLOP numbers for both these CPUs are:
115GFLOPS Theoretical Peak Performance for 360 CPU
218GFLOPS Theoretical Peak Performance for PS3 CPU.
These CPU theories will not be achieved in real world performance. What IBM did when testing for theoretical peaks on both CPUs can't really be considered as representative of how the processors would actually perform in real world situations, because of the type of testing done is too controlled. It’s a much too perfect of an environment and game development is going to involve an unforgiving environment that doesn’t cater so well to the perfect environment the CPUs were tested under.
The GFLOP numbers for the PS3 were calculated based on 8 running SPE, so the fact that the PS3 uses only 6 SPE for game applications lowers the peak theoretical even further, as majority of the floating point work on the PS3’s CPU is done via the SPE. Each SPE has a peak theoretical of 25.6GFLOPS. So the total peak theoretical performance for all 6 SPE would be 153.6GFLOPS, but why is that number also not achievable?
In IBM’s controlled testing environment, their optimized code on 8 SPE only yielded a performance number of 155.5GFLOPS. If it took 8 SPE to achieve that, no way 6 will be able to and that testing was done in a fashion that didn’t model all the complexities of DMA and the memory system. Using a 1Kx1K matrix and 8 SPE they were able to achieve 73.4GFLOPS, but the PS3 uses 6 SPE for games and these tests were done in controlled environments. So going on this information, even 73.4GFLOPS is seemingly out of reach, showing us that Sony didn’t necessarily lie about the cell’s performance as they made clear the 218GFLOPS was “theoretical.” But just like Microsoft they definitely wanted you to misinterpret these numbers into believing they were achievable.
Even taking all of this into consideration, and accepting that neither CPU can reach those crazy performance numbers, the PS3’s Cell still comfortably comes out on top in terms of overall floating point capability. It should be known, though, that the available power on the PS3’s Cell will be significantly more difficult to harness than the available power on the 360’s CPU.
It’s also worth mentioning that even the PS2 CPU had more than twice the GFLOPS of the original Xbox’s CPU, but that didn’t necessarily make it the performance winner. This time around, while the Cell has the GFLOPS advantage, its advantage isn’t quite as big as the one the PS2 CPU had over the Xbox’s. This teaches us that there is more than one measure of real world performance.
The PS3’s Cell processor has 1 Power PC core, similar to each of the 3 Power PC cores that make up the 360’s CPU (but without the vmx-128 enhancements available on each of the 360’s cores), and 7 active SPE (synergistic processing elements); an 8th is disabled to improve yields. One of the SPE is used to run the PS3’s operating system while the other 6 are available for games. The reason the PS3’s CPU will be significantly more difficult to program for is that it is asymmetric, unlike the 360’s CPU. Because the PS3 CPU has only 1 PPE compared to the 360’s 3, all game control, scripting, AI and other branch intensive code will need to be crammed into two threads which share a very narrow execution core with no instruction window. The Cell’s SPE will be unable to help out here as they are not as robust; they are not fit for accelerating things such as AI, which is fairly branch intensive, because the SPE lack branch prediction capability entirely.
I’m sure people remember the section detailing how the 360 and PS3’s processors are less robust than the processors we use in our desktop computers, and the consequences of in-order execution. Well, the PS3’s SPE are stripped down even further than the Power PC cores and, as a result, aren’t capable of handling as many different types of code as the 1 Power PC core available on the PS3’s Cell or the 3 Power PC cores available on the 360’s CPU. The problem with being asymmetric is that when you program for the Power PC core on the PS3 CPU, the method of programming you used to get the most out of that Power PC core is no longer effective when breaking off tasks for the SPE to work on. Going from the PPE to the SPE on the PS3 requires a different compiler and a different set of tools.
When you come to the realization that the key to making up for the CPU’s in-order execution is rather complicated parallel programming, you realize that the CPU being asymmetric and having just a single PPE makes something that was already extremely difficult even more difficult. A developer’s job gets harder still when you factor in that the PS3 has a 512KB L2 cache, which is half the size of the 360 CPU’s 1MB L2 cache; that single PPE on the PS3 CPU isn’t receiving much help with branches in the cache department.
Microsoft made a better decision from the perspective of the developer; it's still difficult, but much easier compared to working with the Cell architecture. The 360’s CPU isn’t asymmetric like the PS3’s Cell: it has 3 PPE as opposed to 1, and all 3 are robust enough to handle the type of code that only the PS3’s single PPE is capable of handling. When Microsoft says they have three times the general purpose processing power, this is what they mean. Based on the simple fact that the 360 has 3 Power PC cores to the PS3’s 1, more processing power can be dedicated to helping with things such as game control, AI, scripting and other types of branch intensive code.
From the perspective of a developer, the 360 CPU’s biggest advantage is that all 3 of its cores are identical, all run from the same memory pool, and they’re synchronized in addition to being cache coherent. You can just create an extra thread right in your program and have it do some work. This allows the developer to create very nice structures, so if you know how to get the best possible performance out of one core, you know how to get the best possible performance out of all 3, because they operate in perfect sync.
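As a rough picture of what "just create an extra thread" looks like when every core sees the same memory, here is a minimal sketch in Python. It is not 360 code: the data, the `integrate` function and the split of work are all hypothetical, and standard Python threads only model the shared memory programming style rather than delivering true multi-core speedups.

```python
import threading

# Hypothetical game data living in one shared memory pool.
positions = [float(i) for i in range(8)]
velocities = [0.5] * 8


def integrate(start: int, end: int, dt: float) -> None:
    """Advance a slice of the shared position list in place."""
    for i in range(start, end):
        positions[i] += velocities[i] * dt


# Hand the second half of the work to an extra thread...
worker = threading.Thread(target=integrate, args=(4, 8, 2.0))
worker.start()

# ...while the main thread runs the very same function on the first half.
integrate(0, 4, 2.0)

# Wait for the worker before reading the results.
worker.join()

print(positions)
```

Both threads run identical code against the same data with no copying in between, and the two slices do not overlap so no locking is needed. On the Cell, by contrast, the article notes that moving work to an SPE means a different compiler and a different set of tools.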
Each core on the 360’s processor is capable of running 2 threads (think of it as similar to hyper-threading), so the 360’s CPU is capable of handling 6 threads simultaneously. This brings me to a very important advantage for the PS3’s Cell CPU: its concurrency. While the 360 CPU may be able to handle 6 threads simultaneously, it still only has 3 physical CPU cores, so every 2 threads must share processing power on a single core. The PS3, by contrast, has 1 PPE and 6 SPE for games, and the SPE are like extra physical processors. If each of the PS3’s 6 SPE used for games is working on a specific task such as collision, cloth physics, animation, water surface simulation or particles, it wouldn’t need to worry about processing power being taken away by another part of the game, because the SPE don’t share processing power.
The only cause for concern would be the 512KB L2 cache being shared by 7 simultaneously running SPE and a PPE, but that’s what developers are for; they work around things like this. In practice, this should allow PS3 games to potentially have more things going on at once than 360 games. Ignoring the difficulties of programming for the PS3 CPU, it should be known that the PS3’s CPU is very good when it comes to vertex-related operations, because it handles graphics code better than the 360’s CPU. It is also possible that, through good parallelism on the SPE, physics code could run better on the PS3 CPU due to the concurrency advantage.
The 360 CPU, however, due to its 3 symmetric general purpose cores, is not only much easier to program for than the Cell; having 3 PPE capable of handling things such as AI also means the 360’s CPU will be the better of the 2 CPUs when it comes to AI code. Either way, we can look forward to great things from both CPUs in the future.
Before I end off, I’d like to point out a game that in my opinion, from a technical standpoint, is one of the most brilliant uses of the PS3’s CPU. All things considered, such as in-order execution and the other complications of the architecture, Heavenly Sword is quite the standout in nearly every regard: incredible combat animations, awesome group enemy AI, and great physics. At the very least this is what I gathered from seeing videos of the E3 demo; it’s a reminder that regardless of the difficulties, there are developers who are up to the challenge, and it’s only going to get better with time.