Just so everyone remembers how good I am, I'm the one that's been pointing out the texture compression issue. helmutcheese, you are completely and utterly wrong.
As I've been pointing out to everyone, this has been the case for several years. Their X1900XT 256MB beating an 8800GTS 320MB in high-res games, or the 512MB version beating the 8800GTS 640MB at high res, clearly and obviously shows this to be true, even in plenty of TWIMTBP games. Nvidia clearly don't come close to ATi's edge in texture compression and memory management.
I even remember multiple threads a long while back about 8800GTX users constantly having to alt-tab out to Windows to clear the memory cache as games ground to a halt, then being fine after alt-tabbing back in; it was a very common issue in lots of games. Early on in LotRO I had the issue with my 8800GTX. That said, the X1900XTs, whatever their memory size, simply couldn't touch the bus width of the 8800GTX or its sheer shader power. The reduced shader count, bus width and memory of the GTS models made this gulf in memory handling obvious, and I've been trying, in vain apparently until the last week or two, to let everyone know so people don't waste money on a 1GB card to become "future proof".
Also worth pointing out: 1920x1200 is a SET RESOLUTION. Texture memory use won't change massively at it, except in games that keep too many unused textures in memory. The biggest reason we've needed more memory on cards ISN'T increasing game quality/complexity and memory requirements; it's that with every generation we've slowly raised the most common resolution and AA/AF settings, which increase the base amount of memory you need. The reason cards were fine with 128MB five years ago is that most people gamed at 1024x768, or maybe 1280x1024, with a very few of us at 1600x1200. Very few gamed at the higher resolutions or with the high AA/AF settings we have now. A rough illustration of the scaling is below.
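To put numbers on that resolution point, here's a toy estimate of just the framebuffer/render-target cost at different resolutions and MSAA levels. It's a simplified sketch: real drivers add texture pools, overhead and compression, and the exact buffer layout is an assumption on my part, but it shows how resolution and AA scale the baseline memory use.

```python
# Rough framebuffer cost estimate: multisampled colour + depth/stencil,
# plus a resolved front/back buffer. The layout here is illustrative,
# not what any particular driver actually allocates.
def framebuffer_mb(width, height, msaa=1, bytes_per_pixel=4, depth_bytes=4):
    pixels = width * height
    colour = pixels * bytes_per_pixel * msaa    # multisampled colour buffer
    depth = pixels * depth_bytes * msaa         # multisampled depth/stencil buffer
    resolved = pixels * bytes_per_pixel * 2     # resolved front + back buffer
    return (colour + depth + resolved) / (1024 ** 2)

for (w, h), aa in [((1024, 768), 1), ((1280, 1024), 2), ((1600, 1200), 4), ((1920, 1200), 4)]:
    print(f"{w}x{h} {aa}xAA: ~{framebuffer_mb(w, h, msaa=aa):.0f} MB")
```

Even on those toy numbers, 1920x1200 with 4xAA needs several times the baseline memory of 1024x768 with no AA, before a single texture is loaded.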
Just to lay claim to the fame: search my posts and see when I started saying it, and when other people started mentioning it as a big issue.
As to why the 4850/70 with a 256-bit bus can still cope: largely, when data is compressed better it takes up less bandwidth, and that bandwidth is used very efficiently, thanks to the ring bus (told you it would be useful soon enough) and probably some incredibly efficient hardware compression/decompression logic on the core somewhere. Think of those demo games compressed down to a ridiculous number of kilobytes; compression can be everything, as long as you have code/hardware that can basically encode/decode it on the fly. With the 4870 it's simply a case of pure and unadulterated memory speed, so the smaller bus gives the same bandwidth anyway.
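As a toy calculation of that trade-off (the ratios and bandwidth figures below are made up for illustration, not measured numbers for either card), what matters is the effective bandwidth after compression:

```python
# Illustrative only: shows why a narrower but faster bus with better
# compression can match or beat a wider one. Numbers are assumptions.
def effective_bandwidth_gbs(raw_gbs, compression_ratio):
    # If traffic moves compressed, each GB on the bus carries
    # compression_ratio GB worth of usable texture/buffer data.
    return raw_gbs * compression_ratio

wide_weak = effective_bandwidth_gbs(raw_gbs=110.0, compression_ratio=1.0)    # wide bus, weak compression
narrow_good = effective_bandwidth_gbs(raw_gbs=115.0, compression_ratio=1.3)  # 256-bit GDDR5, better compression
print(f"wide/weak: {wide_weak:.0f} GB/s effective, narrow/good: {narrow_good:.0f} GB/s effective")
```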
Then back to the ring bus. Essentially the idea is that rather than a single memory path where everything passes through one route and gets held up behind everything else, which is the crossbar-style memory controller (I think it's called crossbar, I forget), you have a ring bus, so data needed by shader 800 doesn't have to pass every other shader on the way; it can go directly to the cluster that shader is in, then to the shader.
In the previous generation there wasn't quite enough demand for this to make a difference; the new Nvidia cards seem to be suffering from it quite badly. You can look at raw memory bandwidth, but then you also have to look at how efficiently that bandwidth is being used.
Trying to think of a good way to explain it. Hmm, think of it like an American city with a block-type grid system, very uniform and easy. You have the external bandwidth, 125GB/s or something; imagine that coming over one of those huge bridges into New York. There's only one route in, but it runs at full speed. Once you hit the city, rather than everything following the same path, the ring bus essentially lets the information flow along the quickest route to the individual shader that needs it, so very little of that information is ever waiting on the next piece.
The old-style crossbar method has that same bridge, but instead of a block/grid pattern there's a single motorway: one shader after another down a massively long road, and all information has to travel along it, so data for the shader at the end must pass the entire queue to get where it's going. If any of that makes sense. In other words, the external bandwidth from memory to the core is only one part of the information pathway; internal bandwidth and efficiency is a completely different situation, and one which the ring bus largely solves.
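To make the motorway analogy a bit more concrete, here's a toy hop-count model. It's purely illustrative and bears no relation to how either memory controller is actually laid out; it just shows how average travel distance to a target cluster differs between a single shared path and a ring you can traverse in either direction:

```python
# Toy topology comparison: average "hops" data makes to reach a target
# cluster. On a single shared path every cluster sits behind the ones
# before it; on a ring you take the shorter direction. Illustrative only.
def avg_hops_shared_path(clusters):
    # data passes cluster 1, 2, ... until it reaches its target
    return sum(range(1, clusters + 1)) / clusters

def avg_hops_ring(clusters):
    # on a ring, go whichever way round is shorter
    return sum(min(i, clusters - i) for i in range(clusters)) / clusters

for n in (4, 10, 16):
    print(f"{n} clusters: shared path {avg_hops_shared_path(n):.1f} hops vs ring {avg_hops_ring(n):.1f} hops")
```

The gap between the two grows with the number of clusters, which is the point being made: the more shaders you have, the more the single-path layout hurts.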
It's kind of a situation where the pure speed and simplicity of the single pathway works up to a point. The 2900/3800 series was like really high-latency DDR2 when it first came out: much higher clocks, but with latencies so high that DDR1 was still better. Then DDR2 dropped its latency, increased its speed and became much better than DDR1, and this is essentially the same situation. With 320 shaders in groups of 5 that couldn't all be utilised efficiently, the improvement wasn't apparent; it clearly is with 800 shaders, more of which are kept busy.
I've said for a year that it won't be long before Nvidia adopt a ring-bus-type system as well. It's not far off the Prescott issue of a massive pipeline; in fact it might be incredibly similar. Imagine the last shader on that motorway is just about to get its info when a frame is dropped and it suddenly needs a different piece of data: it's basically got an extremely long pipeline to work through before it gets anything, whereas the ATi ring-bus setup acts more like a much wider set of pipes, so an incredible amount of information can move along other paths rather than all being stuck in the same one. At some point Nvidia will have to bring out their "Core 2 Duo", a much lower-latency, shorter-pipeline internal memory setup. With the old-school 16 pipes we had in the X800 cards and the 6800 it took no time at all to get information anywhere; at 24 it wasn't a problem, at 128 it started to become one but wasn't too apparent, and with 240 and 800 shaders it's so obviously a problem that it's a joke. There is zero question Nvidia will have to address this in their next major core design.
As for the past 5 years, ATi tend to jump the gun on new tech by a generation, which tends to kick them in the ass, repeatedly. I mean, can anyone remember the complaints about the X1800 vs the 7800: no real increase in pipeline count, but a comparatively massive increase in shader power, just too early, as always. Now we look at what every dev wants, at the game industry and at Nvidia's new cores, and see that yet again ATi were on the right path, doing exactly what the devs want, and getting screwed by Nvidia paying the devs to do something they don't want.

But for every generation where ATi bring something a little too early, the generation after is an incredible powerhouse that brings a pretty excellent-value card with it: the X1900 Pro, a low pipeline count but cheap, massive shader power just when games needed it, and it worked great. The 2900XT/HD 3800 brought the ring bus and insane shader power, a touch early; the HD 4800 is the perfect combination of all the technologies needed.