Dear lord, power efficiency ALONE, with no performance improvement, is meaningless. The entire process and silicon manufacturing business is built around performance gained from improved power efficiency.
I know it's confusing for some people that the same words can be used with different meanings in different situations, but it's really pretty simple.
If you have a card that gets a 15000 score in a benchmark and uses 250W, then another card comes along and gets 15000 score in the same benchmark but uses 150W... you get power efficiency but it doesn't improve performance.
If you have a new card that gets a 28000 score and uses 250W, you get increased performance. The only thing that allows this to work is power efficiency. Without it, that score would take 500-600W of power to achieve, and neither AMD nor Nvidia is willing to make a single high-end core that uses that much power.
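To put rough numbers on the two examples above, here's a quick Python sketch. The scores and wattages are the ones from this post; the "points per watt" metric and the linear scaling used for the last figure are my own simplifying assumptions. Linear scaling is optimistic, since pushing a chip harder usually costs more power than that, which is consistent with the 500-600W estimate being higher than what this prints.

```python
# Benchmark scores and board power taken from the examples in the post.
cards = {
    "old": (15000, 250),
    "same_score_less_power": (15000, 150),
    "new": (28000, 250),
}

def points_per_watt(score, watts):
    """Simple efficiency metric: benchmark points per watt."""
    return score / watts

efficiency = {name: points_per_watt(s, w) for name, (s, w) in cards.items()}

# Power the new score would need at the old card's efficiency,
# assuming (optimistically) that score scales linearly with power.
watts_at_old_efficiency = cards["new"][0] / efficiency["old"]

for name, eff in efficiency.items():
    print(f"{name}: {eff:.0f} points/W")
print(f"28000 at old efficiency: about {watts_at_old_efficiency:.0f} W")
```

The second card is more efficient than the first but no faster; the third is both, and it only fits in 250W because of the efficiency gain.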
Every single process, be it 150nm, 65nm or 14nm, brings with it a roughly 50% power reduction per transistor and twice the density... without BOTH of these things together you don't get significantly faster chips. Twice the transistor density with the same power per transistor would double the power in a given area, so with a 250 or 300W limit you couldn't actually get more performance. You'd have the same chip at half the size: useful, but ultimately we already have that level of performance.
The 980 was boring because it brought an existing level of performance with only a minimal saving in money from lower power usage.
On the other end of the scale, with the same transistor density but half the power usage, you're limited by reticle size (largely the area they can image in one exposure, which comes down to the light source). That means about 600mm^2, with very poor yields at that size; 500mm^2 is doable but not great and expensive. So halving power could theoretically mean a 250W chip with twice the performance, but it would need to be 1000mm^2, which is literally not possible.
Double the transistor density AND half the power allows you to make a new chip at roughly the same size and roughly the same power with roughly double the transistor count. This is the fundamental basis of the entire chip fabrication industry... but people in this thread think performance per watt is 'new' or a change of direction.
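The three cases (density only, power only, both) can be sketched as a toy model where the transistor budget is capped by whichever runs out first, die area or power. Everything here is an illustrative assumption rather than real process data: a 500mm^2, 250W starting chip, a 600mm^2 reticle-ish ceiling, a 250W power ceiling, and the function name `max_relative_transistors` is made up for this example.

```python
def max_relative_transistors(density_gain, power_per_transistor,
                             base_area=500.0, area_limit=600.0,
                             base_power=250.0, power_limit=250.0):
    """Transistor budget relative to the old chip (1.0 = same count)."""
    # Limit from die area: a bigger die and a denser process both add transistors.
    by_area = (area_limit / base_area) * density_gain
    # Limit from power: each transistor costs `power_per_transistor` of what it did.
    by_power = (power_limit / base_power) / power_per_transistor
    # Whichever limit is hit first decides the budget.
    return min(by_area, by_power)

scenarios = {
    "2x density, same power": (2.0, 1.0),
    "same density, half power": (1.0, 0.5),
    "2x density and half power": (2.0, 0.5),
}
for name, (density, power) in scenarios.items():
    budget = max_relative_transistors(density, power)
    print(f"{name}: {budget:.1f}x transistors")
```

With these made-up limits, density alone gets you nothing extra (power-capped), half power alone gets you only what the bigger die allows (area-capped), and only the combination gets you about double.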
Incidentally, this is precisely why 20nm sucked so hard: a massive cost increase (due to double patterning, a longer manufacturing process, worse yields, and a much more expensive and difficult tape-out), which is all theoretically fine, except that it roughly doubled transistor density but massively missed the 50% power reduction target; it was closer to 20%. FinFETs barely change density but drastically reduce power. 28nm to 14/16nm FinFET is essentially one 'normal' node with double the transistor density and a little over 50% power reduction (best case, as always).
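As a back-of-the-envelope check on how the two steps stack up, here's a small Python sketch. It uses only the rough figures from this post (about 20% from planar 20nm, about 50% for a full node) and assumes the reductions multiply together; the implied FinFET figure is derived from those two numbers, not a measured value.

```python
# Power-per-transistor multipliers; 1.0 means no change from the previous step.
target_full_node = 0.50   # the usual ~50% reduction per node
planar_20nm = 0.80        # 20nm only managed roughly a 20% reduction

# FinFET step needed on top of 20nm to land exactly on the full-node target,
# assuming the two reductions simply multiply.
implied_finfet_step = target_full_node / planar_20nm

print(f"20nm alone: {1 - planar_20nm:.0%} reduction")
print(f"implied FinFET step: {1 - implied_finfet_step:.1%} reduction")
print(f"combined: {1 - planar_20nm * implied_finfet_step:.0%} reduction")
```

Since the combined figure is "a little over 50%", the real FinFET contribution would be somewhat better than this implied number.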