It's baffling why a card with huge bandwidth, lots of stream processors and 25% more GFLOPs than a Titan X is slower. If AMD's drivers are not optimized then they should really be rethinking their whole code base. They need a total rewrite for DX11 and the upcoming DX12.
The only thing that could explain the slower performance from a hardware standpoint is the lower number of ROPs.
There are lots of potential reasons.
Bandwidth is not a bottleneck with current cards, so throwing more bandwidth at the problem doesn't help much except at 4K, and even there it is not the biggest issue. When you overclock the memory on a GPU you don't see big gains, for this reason. So having HBM doesn't really make a big difference, yet.
More stream processors only help in pixel-shader-limited games, and most games simply aren't limited by pixel shaders alone. Then there is the issue of efficiently feeding more compute units and keeping them all busy, which gets harder the more units there are. There are still only 4 shader engines, and they might simply not be capable of feeding all the stream processors at the same rate as on Hawaii.
Most of the rest of Fiji is also pretty similar to Hawaii, e.g. ROP counts and geometry engines.
Fiji does have the Tonga improvements for compression and tweaked tessellation, but these are smaller gains than Maxwell v1 vs v2.
GFLOPs simply don't predict graphics performance accurately; there are too many other factors involved.
Fiji is performing about where I would expect it to, based on its differences from Hawaii.
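To put rough numbers on the GFLOPs point, here is a minimal sketch comparing theoretical shader throughput with theoretical ROP fill rate. Everything in it is an assumption for illustration: the `Gpu` class is made up, the shader counts, ROP counts and clocks are approximate published specs quoted from memory (the Titan X figure uses its boost clock), and the formulas are the usual simplifications of 2 FLOPs per shader per clock and 1 pixel per ROP per clock.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """Hypothetical container for a few headline specs (illustration only)."""
    name: str
    shaders: int      # stream processors / CUDA cores
    rops: int         # render output units
    clock_mhz: float  # core clock assumed for the estimate

    def peak_gflops(self) -> float:
        # Theoretical peak: 2 FLOPs (one fused multiply-add) per shader per clock.
        return 2 * self.shaders * self.clock_mhz / 1000

    def pixel_fill_gpix(self) -> float:
        # Theoretical peak: 1 pixel per ROP per clock.
        return self.rops * self.clock_mhz / 1000


# Approximate specs, quoted from memory; treat them as assumptions.
hawaii = Gpu("Hawaii (290X)", shaders=2816, rops=64, clock_mhz=1000)
fiji = Gpu("Fiji (Fury X)", shaders=4096, rops=64, clock_mhz=1050)
titan_x = Gpu("Titan X", shaders=3072, rops=96, clock_mhz=1075)

for gpu in (hawaii, fiji, titan_x):
    print(f"{gpu.name:15s} {gpu.peak_gflops():8.0f} GFLOPs "
          f"{gpu.pixel_fill_gpix():6.1f} Gpix/s")

# How much each figure grew going from Hawaii to Fiji.
shader_gain = fiji.peak_gflops() / hawaii.peak_gflops()
fill_gain = fiji.pixel_fill_gpix() / hawaii.pixel_fill_gpix()
print(f"Fiji vs Hawaii: {shader_gain:.2f}x GFLOPs, {fill_gain:.2f}x pixel fill")
```

Under those assumptions Fiji gains roughly half again as much shader throughput over Hawaii while its pixel fill rate barely moves, and it leads the Titan X on GFLOPs while trailing it on fill rate. These are paper figures only and say nothing about how well the front end keeps the shaders fed.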