Gotta love Nvidia's marketing headlines. The 1 TFLOP number only applies to a rather specific use case, and it isn't really comparable with most other quoted figures.
Most GPU 'FLOPS' values are given for FP32, where the X1 gets 512 GFLOPS. However, under certain circumstances it can pack two FP16 ops into a single vector operation in the FP32 pipeline (it has no dedicated FP16 pipelines, so in the worst case it runs just one FP16 operation per FP32 slot), and that's how it gets to claim the magic teraflop...
It also bases that on a 1 GHz GPU clock, which seems high for a phone/tablet device, even on 20 nm...
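For what it's worth, here's a rough back-of-the-envelope sketch of how those figures fall out. It assumes 256 shader cores and 2 FLOPs per core per clock (one fused multiply-add); neither number is stated above, they're just the values that reproduce the 512 GFLOPS FP32 figure at 1 GHz. The function name and the 0.7 GHz clock are made up purely for illustration.

```python
def peak_gflops(cores, clock_ghz, flops_per_clock=2, pack_factor=1):
    """Theoretical peak throughput in GFLOPS.

    flops_per_clock=2 assumes one fused multiply-add per core per cycle.
    pack_factor=2 models two FP16 ops packed into one FP32-width vector op.
    """
    return cores * clock_ghz * flops_per_clock * pack_factor


CORES = 256  # assumption: chosen to match the 512 GFLOPS FP32 figure at 1 GHz

fp32 = peak_gflops(CORES, 1.0)                       # plain FP32
fp16_best = peak_gflops(CORES, 1.0, pack_factor=2)   # every op pairs up
fp16_worst = peak_gflops(CORES, 1.0, pack_factor=1)  # nothing pairs up

print(f"FP32:            {fp32:.0f} GFLOPS")
print(f"FP16 best case:  {fp16_best:.0f} GFLOPS")
print(f"FP16 worst case: {fp16_worst:.0f} GFLOPS")

# Hypothetical lower clock, just to show how much the headline leans on 1 GHz
fp16_lower_clock = peak_gflops(CORES, 0.7, pack_factor=2)
print(f"FP16 best case at 0.7 GHz: {fp16_lower_clock:.0f} GFLOPS")
```

So the teraflop only shows up when every FP16 op can be paired and the GPU actually holds 1 GHz; drop either condition and you're back well under it.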
It is an improvement on K1 (obviously), but by the time K1 devices were out in the wild, their performance wasn't exactly a standout: high-end and decent, but not staggeringly better than the competition at the time. I see the same happening here...