Need 8 cores in your phone? Samsung has you covered.

You don't really get 8 cores though, do you? The user still only sees 4; it's just a case of switching between power-hungry mode and battery-life mode.
 
Am I the only one who doesn't get big.LITTLE? I mean, if the A7s are only for low power stuff, why do we need four of them? Unless I'm missing something, what Nvidia's doing with 4+1 seems to make more sense.
 

http://www.arm.com/products/processors/technologies/biglittleprocessing.php

Basically on light loads the smaller core is active while the bigger core is fully power gated and disabled (greatly saving power). When there's significant load it swaps to the bigger core. Hence they act as one core.

I... don't think I like the idea of quad A15s, seeing how hot my Nexus 10 gets under stress with its dual A15s :p. And I don't really see the point of having 4 A7s either; surely two would be fine?
 
http://www.arm.com/products/processors/technologies/biglittleprocessing.php

Basically on light loads the smaller core is active while the bigger core is fully power gated and disabled (greatly saving power). When there's significant load it swaps to the bigger core. Hence they act as one core.

I... don't think I like the idea of quad A15s, seeing how hot my Nexus 10 gets under stress with its dual A15s :p. And I don't really see the point of having 4 A7s either; surely two would be fine?

I know; the :confused: was to signify that I wasn't sure why the other poster was reiterating what I had already said.
 
Am I the only one who doesn't get big.LITTLE? I mean, if the A7s are only for low power stuff, why do we need four of them? Unless I'm missing something, what Nvidia's doing with 4+1 seems to make more sense.

big.LITTLE effectively allows 2 different modes of power saving: one is similar to nvidia's solution (but better, imo), the other is ultimately the 'true' implementation of big.LITTLE...

So with nvidia, the first thing to note is that with the Tegra 3 and, it seems, the Tegra 4, they've done a more hardware-based solution. The 'companion core' is the same architecture as the quad-core block, but built with a more power-oriented (rather than speed/performance-oriented) silicon process. They then, mostly through hardware, either have all 4 cores running or just the +1 core. This is basically a bodge because they haven't implemented per-core clock and power gating (where they could just disable 3 cores and have a single one running).

The first implementation of big.LITTLE is similar to the nvidia method but done mostly in software. Because of the software bias it's made easier by having the same number of cores on the power-hungry side as on the efficient side, so the threads/states can be easily transitioned. Note that because Samsung/Qualcomm can actually design decent SoCs, it's likely both sets of cores will be power/clock gated per core, so performance should be able to scale from a single A7 at a low-ish clock speed to quad A15s at a high clock speed. But this is only the start of big.LITTLE: by far the easiest to implement, but potentially not the most useful, as it's one or the other.

So, finally, onto the proper implementation of big.LITTLE: the ability to run different core architectures at the same time. Take a scenario where the system is running 1 or 2 threads of a game that needs high performance, plus many of the usual threads that are basically idle/unimportant. With the nvidia solution we'd have 4 full-power cores running at max clock speed, sapping battery life. With current processors and the other big.LITTLE implementation we'd have 3 or 4 high-power cores: 2 running maxed out and the other 1 or 2 at whatever speed was necessary. Not as bad as nvidia, but we can do better. With this implementation we could have 2 A15s running maxed out and then 1 or 2 (or even 3 or 4) A7s running much more efficiently to handle all the unimportant stuff. Less heat, less power, better.

The first/easiest implementation of big.LITTLE is managed by software but ultimately can be done with a generic device driver (check power draw/temp/load, swap), so it's relatively easy to plumb into current systems. The hard bit is the better/true implementation: to move threads around like that, the scheduler needs much more knowledge about the system it's running on. That's the bit that was unimplemented in the Linux kernel last time I checked, and even when it does arrive it will take some tweaking to make full use of the benefits.
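For illustration, the first-style swap logic could be sketched like this. This is purely a toy in Python; the real thing is a kernel driver, and all the names and thresholds here are made up:

```python
# Toy model of the simple ("cluster migration") big.LITTLE policy:
# sample the load and hand the whole workload to the big or LITTLE
# cluster depending on a threshold. Illustrative only -- the real
# implementation lives in the kernel, not user-space Python.

LITTLE, BIG = "A7", "A15"

def pick_cluster(load, current, up=0.8, down=0.3):
    """Swap clusters with hysteresis so we don't ping-pong."""
    if current == LITTLE and load > up:
        return BIG      # power-gate the A7s, wake the A15s
    if current == BIG and load < down:
        return LITTLE   # drop back to the efficient cores
    return current

cluster = LITTLE
for load in [0.1, 0.5, 0.9, 0.95, 0.6, 0.2]:
    cluster = pick_cluster(load, cluster)
    print(f"load={load:.2f} -> {cluster}")
```

The hysteresis (separate up/down thresholds) is the interesting bit: without it the system would thrash between clusters around a single threshold.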

For reference, and according to ARM, in some of the theoretical benchmarks the A15 gives ~2x the performance of the A7 clock-for-clock, but the A7 is ~3.5x more power-efficient. This is one of the reasons why it's quite comfortably superior to the nvidia setup.
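Quick sanity check on those two numbers (treating them as exact, which they aren't): if efficiency means performance per watt, they imply the A15 draws roughly 7x the power of an A7 for its 2x throughput:

```python
# ARM's rough figures: clock-for-clock the A15 gives ~2x the A7's
# performance, while the A7 is ~3.5x more power-efficient (perf/W).
perf_a15_vs_a7 = 2.0
efficiency_a7_vs_a15 = 3.5

# efficiency = perf / power, so the A15's relative power draw is:
power_a15_vs_a7 = perf_a15_vs_a7 * efficiency_a7_vs_a15
print(power_a15_vs_a7)  # 7.0: ~7x the power for ~2x the throughput
```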
 
Ah, thanks for clearing that up. I didn't realise there were plans for more advanced per-core power saving in the future.
 
Yeah, it's really quite flexible and neat, I also did a bit more reading and ARM do have a patchset for the kernel that does the 'proper' implementation (they call it big.LITTLE MP)

Also saw one 'benchmark' that baselined performance and power at 1.0 using the A7 alone: the A15s alone scored 2.5 performance but ~4 power usage, while the big.LITTLE MP implementation was (iirc) 2.5 performance and 1.97 power. They also mentioned this was without a GPU, so with that offloading it could be even better :)
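Putting those quoted figures side by side as performance-per-watt (the ~4 power number is approximate, so treat the results as rough):

```python
# Relative figures from the benchmark quoted above (A7 alone = 1.0).
configs = {
    "A7 alone":      (1.0, 1.0),   # (performance, power)
    "A15s alone":    (2.5, 4.0),
    "big.LITTLE MP": (2.5, 1.97),
}
for name, (perf, power) in configs.items():
    print(f"{name:14s} perf/W = {perf / power:.2f}")
```

On those numbers, big.LITTLE MP delivers A15-class performance at a better performance-per-watt than even the lone A7, which is the whole point of mixing the clusters.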
 
I'm not sure we'll see much real-world improvement to battery life. The lowest operating power point of an A15 is not far off the A7, and idle power consumption is dominated by the fabrication process.

Last time I checked, idle (deep sleep or core off) and high load (maximum clocks) accounted for ~95% of CPU time.
That means the A7s would only be used ~5% of the time, a tiny amount of total device usage.

Not that I'm complaining :cool:

Also saw one 'benchmark' that baselined performance and power at 1.0 using the A7 alone: the A15s alone scored 2.5 performance but ~4 power usage, while the big.LITTLE MP implementation was (iirc) 2.5 performance and 1.97 power. They also mentioned this was without a GPU, so with that offloading it could be even better :)

They said 70% of that improvement came from background music decoding; wouldn't a full SoC have other IP blocks for that?
 
I read it as the improvement for background tasks specifically being 70%, rather than as an overall figure; not entirely sure now though.

Yeah, an SoC would have blocks that are even more efficient for music/video decoding, so there is a slant in the tests there. But their test system also had no per-core power/clock gating, just gating on the blocks of cores, so there's an improvement to be gained over their figures if that's implemented.

It is early days, but I think it's definitely the way to go, along with stuff like panel self refresh for powering down the GPU.

Have you got any source/more info for the idle/max-clocks utilisation? I can see that being the case for some usages, but I would've thought chips that can alter power/clock per core would quite often have one or more cores that aren't running completely maxed out (except when benchmarking how many tasks can max out 4 Krait cores, for example)?
 
big.LITTLE effectively allows 2 different modes of power saving: one is similar to nvidia's solution (but better, imo), the other is ultimately the 'true' implementation of big.LITTLE...

<etc>

Most informative post of 2013 so far goes to SKILL. Thanks :)
 
Have you got any source/more info for the idle/max-clocks utilisation? I can see that being the case for some usages, but I would've thought chips that can alter power/clock per core would quite often have one or more cores that aren't running completely maxed out (except when benchmarking how many tasks can max out 4 Krait cores, for example)?

Install CPU Spy and check out your usage stats :) (it might not be completely accurate but it's good enough)

Mine are usually:
~75% at deep sleep (A15 is very frugal at sleep/idle :p)
~15% at the lowest operating power point (I presume this is big.LITTLE's main target. A15 seems to be doing a good job in Exynos 5 though, this is where underclocking pays off too.)
~5% for mid range speeds (A7 should be powerful enough in this state, there should be decent gains without switching to a bigger core.)
~5% at higher clocks (A7 won't have enough performance in this state, ahh the good ol days of overclocking those 1GHz A8/A9 cores! :cool: Two A15s can glug over 3W :eek:)
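Out of curiosity, a rough back-of-envelope on what the A7s might save given that residency pattern, using ARM's ~3.5x efficiency figure from earlier. The per-state relative power numbers here are my own guesses, not measurements:

```python
# Back-of-envelope estimate: how much could the A7s save for the
# usage pattern above? Assumptions (mine, not measured): deep sleep
# costs the same either way; where the A7 suffices it draws ~1/3.5
# of the A15's power (ARM's efficiency figure); the relative power
# per state is a crude guess.

states = {
    # name: (share of time, relative A15 power, can an A7 handle it?)
    "deep sleep": (0.75, 0.05, False),  # same cost either way
    "lowest OPP": (0.15, 1.00, True),
    "mid clocks": (0.05, 2.00, True),
    "max clocks": (0.05, 4.00, False),  # needs the A15 anyway
}

def total_power(use_little):
    total = 0.0
    for share, power, a7_ok in states.values():
        if use_little and a7_ok:
            power /= 3.5
        total += share * power
    return total

saving = 1 - total_power(True) / total_power(False)
print(f"estimated saving: {saving:.0%}")
```

Even with deep sleep dominating the time, the low/mid states dominate the energy, so the estimate still comes out as a meaningful saving; the exact percentage depends entirely on the guessed per-state power figures.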

Hopefully the interconnects do a good job at switching cores under load and balance speed/power. It'll be interesting to see a direct product comparison between Tegra 4 and Exynos 5 octo.

It is early days, but I think it's definitely the way to go, along with stuff like panel self refresh for powering down the GPU.

Yup, the Mali T604+ certainly looks great for idle power consumption and the rise of CGPUs should drive consumption down even further (Tegra 3 certainly needs one :D)
This is a good piece by Vivante about the CGPU used in OMAP 5 > http://www.vivantecorp.com/TICW2.htm
 
Cheers Grrrrr :)

Installed CPU Spy and my results differ from yours quite a lot (SGS2 running the leaked JB rom which may have some battery life/cpu freq issues to be fair):

36% deep sleep
53% 200MHz
4% 500MHz, 3% 800MHz, 0% 1000MHz and 1% 1200MHz

For a total of 97% ¬_¬ :p

Also had a look at the source code for the app (https://github.com/bvalosek/cpuspy)

Not been bothered to set up a dev environment at home but might tinker at work tomorrow. One thing it doesn't seem to do is account for multiple CPUs: it uses sysfs to get the 'time_in_state' values, but only for cpu0. Tempted to tweak/update it for multiple cores, assuming the sysfs values are reported per-core...
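For what it's worth, here's a rough Python sketch of reading time_in_state for every core via sysfs, the same interface CPU Spy reads but not limited to cpu0. Linux-only, and on some SoCs the cores share a single cpufreq policy, so not all per-core files will exist:

```python
# Read per-core frequency residency from cpufreq's sysfs stats.
import glob

def parse_time_in_state(text):
    """Each line of time_in_state is '<freq_khz> <time in 10ms units>'."""
    return {int(freq): int(t)
            for freq, t in (line.split()
                            for line in text.splitlines() if line.strip())}

def all_cores():
    stats = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/stats/time_in_state"
    for path in sorted(glob.glob(pattern)):
        core = path.split("/")[5]  # e.g. "cpu0"
        with open(path) as f:
            stats[core] = parse_time_in_state(f.read())
    return stats

for core, freqs in all_cores().items():
    total = sum(freqs.values()) or 1
    for freq, t in sorted(freqs.items()):
        print(f"{core}: {freq // 1000} MHz  {100 * t / total:5.1f}%")
```

Note this only covers time while the CPU is up; deep-sleep time (what CPU Spy shows as "deep sleep") has to be inferred separately, e.g. from uptime minus the summed residencies.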

But, ignoring that for now: for my usage the A7s would seem to help a huge amount. As you say, at deep sleep there's going to be negligible difference between an A7 and an A15, middle-ish speeds could give a nice benefit, and higher clocks not much at all.

Would be nice to see it per-core, and maybe even a 'live' view (logged, or as a notification?) to see if one core is high and the other(s) low/medium on a regular basis.

With PSR I've read of some chips that are even lower power than that CGPU (although it's nice to see relatively open documentation on graphics hardware :)) purely for refreshing a static image, which should give good power benefits, although probably more on tablet and laptop workloads than phones.
 
My CPU Spy stats agree with sarge:

over the last 31.5hrs...with 4.5hrs screen on time.

[screenshot: CPU Spy stats]


I was on charge for the last 2 hrs (hence the 200MHz state?), so that would have raised the deep sleep % even more.
 
Yeah, that's what I get (S4 Snapdragon). I wonder what the switch-off point would be for the A15s? Anything under 1GHz?

Cheers Grrrrr :)

Installed CPU Spy and my results differ from yours quite a lot (SGS2 running the leaked JB rom which may have some battery life/cpu freq issues to be fair):

36% deep sleep
53% 200MHz
4% 500MHz, 3% 800MHz, 0% 1000MHz and 1% 1200MHz

Deep sleep/200MHz could be the same state?

Not been bothered to set up a dev environment at home but might tinker at work tomorrow. One thing it doesn't seem to do is account for multiple CPUs: it uses sysfs to get the 'time_in_state' values, but only for cpu0. Tempted to tweak/update it for multiple cores, assuming the sysfs values are reported per-core...

Awesome! My second core never seems to wake up, it'll be interesting to see the whole picture.

Here's that SunSpider run looking at core load/power from AnandTech. I guess the second, rather inactive, core would benefit from a lower-power architecture here too (much less time at full load).
[image: SunSpider run, Krait per-core load at multiple voltages]

http://www.anandtech.com/show/6536/arm-vs-x86-the-real-showdown/2
 