Who cares about stability in programs no one actually uses?
Run Super PI at stock and then again at max overclock. If it doesn't get faster, the overclock isn't working. Super PI or any other CPU-bound benchmark, whatever you want really. Gaming benchmarks aren't necessarily going to show a real difference.
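If you want to put a number on that comparison, here's a rough Python sketch. It assumes you've timed the same run at both settings yourself; the function names, the 0.5 cut-off and the example clocks and times are all made up for illustration, not measurements from any real chip.

```python
def overclock_scaling(stock_seconds, oc_seconds, stock_mhz, oc_mhz):
    """Compare the measured speedup of a benchmark run against the clock increase."""
    if oc_mhz <= stock_mhz:
        raise ValueError("overclocked frequency must be higher than stock")
    measured = stock_seconds / oc_seconds   # > 1.0 means the run got faster
    expected = oc_mhz / stock_mhz           # ideal speedup if time scaled perfectly with clock
    # Fraction of the clock gain that actually showed up in the benchmark time.
    efficiency = (measured - 1.0) / (expected - 1.0)
    return measured, expected, efficiency


def overclock_is_working(stock_seconds, oc_seconds, stock_mhz, oc_mhz, min_efficiency=0.5):
    """Hypothetical rule of thumb: at least half of the clock gain should be visible."""
    _, _, efficiency = overclock_scaling(stock_seconds, oc_seconds, stock_mhz, oc_mhz)
    return efficiency >= min_efficiency


if __name__ == "__main__":
    # Made-up example: 2900 MHz stock vs 3600 MHz overclocked, run times in seconds.
    measured, expected, efficiency = overclock_scaling(25.0, 20.5, 2900, 3600)
    print(f"measured speedup x{measured:.2f}, clock increase x{expected:.2f}, "
          f"scaling {efficiency:.0%}")
```

If the scaling comes out near zero, the higher clock probably isn't being applied, or isn't being held under load.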
Also, you'll want to watch CPU-Z if possible while you're running something, to see if it peaks at the max overclock and then quickly drops to a lower clock to keep TDP in check. I know Bulldozer does this, but Llano isn't quite as up to date on power features, so it might not.
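The peak-then-drop pattern is easy to check for if you jot down the clock readings while the benchmark runs. A minimal Python sketch, assuming you've copied the readings (in MHz) into a list by hand; it doesn't talk to CPU-Z, and the function name, tolerance and sample values are hypothetical.

```python
def find_throttle(samples_mhz, target_mhz, tolerance_mhz=50, sustain=3):
    """Return the index where the clock first drops below the target and stays
    there for `sustain` consecutive samples, after having reached the target
    at least once. Returns None if no such drop is found."""
    reached = False
    low_run = 0
    for i, mhz in enumerate(samples_mhz):
        if mhz >= target_mhz - tolerance_mhz:
            reached = True      # at (or near) the overclocked frequency
            low_run = 0         # any earlier dip was only brief
        elif reached:
            low_run += 1
            if low_run >= sustain:
                return i - sustain + 1  # index of the first low sample in the run
    return None


if __name__ == "__main__":
    # Made-up readings: idle, peak at 3600 MHz, then a sustained drop to 3000 MHz.
    readings = [800, 3600, 3600, 3000, 3000, 3000, 3000]
    drop_at = find_throttle(readings, target_mhz=3600)
    if drop_at is None:
        print("clock held at the overclock")
    else:
        print(f"clock dropped and stayed low from sample {drop_at}")
```

The `sustain` count is there so a single low reading between two full-speed ones isn't counted as throttling.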
But yeah, TDPs are a joke. I'm not really sure why we're limited to circa 100-125W on desktop. Hell, desktop barely needs a limit, as long as idle power is in check and you can buy low-power versions if you really want them. It's daft. Performance over TDP, thanks.
Sure, a Bulldozer uses silly power when overclocked, but so does a GTX 480, and a 2600K uses a good 100-150W over its TDP limit when fully overclocked as well. I'm sure a Llano's power draw at a decent overclock is pretty mental.
Anyway, the only reason Llano is clocked so low to start with is TDP, and having to stay within some arbitrary limit that matters to no one.