The R&D itself isn't insignificant, but the biggest part is in the architecture: you make a new architecture and one core, and that's like 80-90% of the work; scaling it to different die size configurations is the smaller part. E.g. GCN is one architecture, designed to have DX11 functionality and ROPs/shaders/memory bus in some ratio x:y:z; you build a core at 350mm^2, a smaller version with a narrower bus, fewer shaders and fewer ROPs at 200mm^2, or a bigger one at 450mm^2.
That isn't particularly hard; the reason they don't just do a big one is yields. The 28nm process has further increased the cost of production, mostly because each wafer takes longer to make, so the same amount of equipment can now make fewer wafers in the same amount of time, and costs go up.
Yields also drop with die size. Tiny cores, 10-100mm^2, have stupidly high yields; yields fall, but not by much, as dies get bigger, then you hit a tipping point where instead of getting 400 good dies off a wafer you get 100, then 50, then 5.
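A quick way to see why: the classic Poisson defect model puts yield at exp(-defect density x die area), so yield falls off exponentially with area. A minimal sketch, with a made-up defect density of 0.4 defects/cm^2 purely for illustration (not a real 28nm figure):

```python
import math

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of candidate dies that come out defect-free (Poisson model)."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)

# Defect density is an illustrative assumption, not a real 28nm number.
for area in (100, 200, 350, 500):
    print(f"{area:>3} mm^2 die: {poisson_yield(area, 0.4):.0%} yield")
```

With that assumed defect density, a 100mm^2 die yields ~67% while a 500mm^2 die yields only ~14%, which is the tipping-point behaviour described above.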
Imagine the wafer/production process costs $8000 per wafer. With small dies you get 500 cores off a perfect wafer, so the cost is $16 a core; now say 50 defects result in 50 dies getting canned, that's 8000/450 = $17.80 a core, not much difference.
Now with a bigger die you get 250 off a wafer, but you still get 50 dies that don't make it: perfect yield is $32 a core, actual yield gets you $40 a core.
Now with a big die, 100 dies off a perfect wafer, potentially 40 or 50 dies still don't make it: perfect yield is now $80 a core, $160 with actual yields. You go from yields increasing prices ~10% at a small die size, to 25% or so with a medium die, to a 100% increase with a huge die.
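To make that arithmetic concrete, here's the same calculation as a minimal sketch, spreading the assumed $8000 wafer cost over only the dies that work:

```python
WAFER_COST = 8000  # assumed wafer cost from the example above

def cost_per_good_die(candidate_dies, dies_lost):
    """Spread the whole wafer cost over just the dies that actually work."""
    return WAFER_COST / (candidate_dies - dies_lost)

# (candidate dies per wafer, dies lost to defects) for the three cases above
for total, lost in [(500, 50), (250, 50), (100, 50)]:
    perfect = WAFER_COST / total
    actual = cost_per_good_die(total, lost)
    print(f"{total:>3} dies/wafer: ${perfect:6.2f} perfect vs "
          f"${actual:6.2f} actual (+{actual / perfect - 1:.0%})")
```

That prints roughly +11%, +25% and +100% for the small, medium and big die respectively.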
The problem is that's just the cost of manufacturing; you're talking about roughly doubling those costs when selling the cores, to fund the R&D spend and make a little profit.
The reality is 500mm^2 cores end up needing to be sold for at least $300 to make a profit, and that's with decent yields for a die that size. If you have poor yields (remember the 480 GTX was somewhere below 20% yields, i.e. around 20 good dies per wafer, which would be 8000/20 = $400 a core BEFORE marking it up), and if that yield gets close to 10%, it's $800 a core.
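The same arithmetic covers the poor-yield case; assuming ~100 candidate dies per wafer as in the big-die example above (the exact count is an assumption for illustration):

```python
# Same assumed $8000 wafer, ~100 candidate big dies; only yield changes.
for yield_fraction in (0.20, 0.10):
    good_dies = 100 * yield_fraction
    print(f"{yield_fraction:.0%} yield: ${8000 / good_dies:,.0f} per good core")
```

That gives $400 a core at 20% yield and $800 at 10%, before any markup.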
Huge dies for the professional market are well worth it, and harvested failed dies sold off on the side are essentially pure profit, but the numbers available should tell you how low yields are with 500mm^2+ cores on 28nm.
If AMD were a bigger player in the professional graphics/compute market (which they should be) then they might have done the same thing. Look at what 10k cards at $1000 a pop brings in: $10mil... for companies with $4bil+ turnover a year, Titan isn't going to do anything for Nvidia in revenue/profits, and might even make a loss, but it's the kind of marketing win AMD has never gone after, and it has helped Nvidia for years.
Few wanted an 8800 Ultra, but everyone knew about it; few will want a Titan, but people will remember it was the fastest single core on 28nm. The difference is AMD have been fighting Intel and its immense corruption, plus their own vast debts, for years; AMD can't afford to just throw out $50mil on marketing and R&D to bring in $10mil revenue, while Nvidia can easily afford it.
Ultimately it's true, Titan is a nice idea, but when 2x 7950s/2x 670 GTXs will trounce it, cost half as much and have been available for a year... it's a marketing win, but ultimately a boring card.
Ultimately this gen doesn't quite hit any good price/die points. The 680 GTX is as big as you can possibly go with a 256-bit bus; more shaders would be wasted, and frankly its current shader count is already too high, as it's so bandwidth-limited above 1080p. The 7970 has loads of bandwidth, but the 384-bit bus uses loads more power and die size, and is vastly underutilised on the 7970... you could probably pack 20-30% more shaders/ROPs/TMUs on there and not be bandwidth-limited, but then die size becomes a problem.
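For the bandwidth side of that comparison, GDDR5 peak bandwidth is just bus width in bytes times the per-pin data rate. A quick sketch using roughly launch-spec data rates (about 6 Gbps on the 680 GTX, 5.5 Gbps on the 7970):

```python
def gddr5_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: (bus width / 8) bytes per transfer x transfer rate."""
    return bus_width_bits / 8 * data_rate_gbps

print(f"680 GTX, 256-bit @ 6.0 Gbps: {gddr5_bandwidth_gb_s(256, 6.0):.0f} GB/s")
print(f"7970,    384-bit @ 5.5 Gbps: {gddr5_bandwidth_gb_s(384, 5.5):.0f} GB/s")
```

That's ~192 GB/s against 264 GB/s, which is why the 680 GTX chokes at high resolutions while the 7970 has bandwidth to spare.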
Realistically this gen, 280-300mm^2 is right for a 256-bit bus, but you want a 400-450mm^2 core for a 384-bit bus, and a core that size makes the price painful, so it doesn't quite work. The balance is just off: the 7970, good as it is, burns too much power because of the bus and has way too much bandwidth. Neither Nvidia nor AMD could find a great 300-400mm^2 core this gen because the numbers just didn't work out.
I would bet that at 20nm a 384-bit bus, ~350mm^2 and a buttload of ROPs/shaders will work out pretty damn well though; you've just got to hope the process isn't complete crap.