An easy way to think about it is like a hard drive: seek time versus continuous reads. If you need to access lots of different data, you have to keep telling the memory where to read. Memory has basically no seek time of its own, so the comparison here is that the CPU having to keep asking for a new address plays the part of the seek time. With a CAS latency of 2 on DDR400 you have to wait 2 clocks before the data starts coming back. On DDR2-800 you'd have to wait, say, 4 clocks, but its clock runs twice as fast, so it's actually waiting the same amount of time (the real clocks are 200MHz and 400MHz, so that's 10ns either way). The latency in seconds is the same, just the latency in clocks is different.
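To put numbers on the clocks-versus-seconds point, here's a rough sketch in Python. The function name `cas_latency_ns` is just made up for this post, and it only looks at CAS, ignoring every other timing.

```python
def cas_latency_ns(cas_clocks, data_rate_mts):
    """Convert a CAS latency in clocks to nanoseconds.

    data_rate_mts is the number in the name (400 for DDR400, 800 for DDR2-800).
    DDR moves data twice per clock, so the real clock is half of that.
    """
    clock_mhz = data_rate_mts / 2
    return cas_clocks * 1000 / clock_mhz


if __name__ == "__main__":
    examples = [
        ("DDR400 CL2", 2, 400),
        ("DDR400 CL2.5", 2.5, 400),
        ("DDR2-800 CL4", 4, 800),
    ]
    for name, cas, rate in examples:
        print(f"{name}: {cas_latency_ns(cas, rate):.1f} ns")
```

DDR400 at CL2 and DDR2-800 at CL4 both come out to 10 ns, which is the whole point: twice the clocks at twice the clock speed is the same wait.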
However, just like with hard drives, if you know where all the data you want is, you can tell the memory to send a bigger block of data with less "seeking" time, the same way a continuous read on a hard drive is much faster than multiple smaller reads. With DDR2 the higher speed gives higher bandwidth, but because the other latencies are higher in clocks it's not simply double the bandwidth in practice. It has to wait a certain number of clocks here and there, but it's no doubt faster.
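Here's a toy model of why double the clock isn't double the real-world speed. It assumes a standard 64-bit (8 byte) memory channel, and the 50 ns fixed wait per request (`OVERHEAD_NS`) is a made-up number picked only for illustration, kept the same for both memory types since the latency in seconds is about the same.

```python
BUS_BYTES = 8      # standard 64-bit memory channel
OVERHEAD_NS = 50   # hypothetical fixed wait per request, same for both types


def peak_mb_per_s(data_rate_mts):
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return data_rate_mts * BUS_BYTES


def read_time_ns(num_bytes, data_rate_mts, overhead_ns=OVERHEAD_NS):
    """Time for one request: the fixed wait plus the time to stream the data."""
    transfer_ns = num_bytes * 1000 / peak_mb_per_s(data_rate_mts)
    return overhead_ns + transfer_ns


if __name__ == "__main__":
    small, big = 64, 64 * 1024
    for size in (small, big):
        ddr = read_time_ns(size, 400)
        ddr2 = read_time_ns(size, 800)
        print(f"{size} bytes: DDR400 {ddr:.0f} ns, DDR2-800 {ddr2:.0f} ns, "
              f"speedup {ddr / ddr2:.2f}x")

    # Lots of small reads versus one continuous read of the same total size.
    many_small = (big // small) * read_time_ns(small, 400)
    one_big = read_time_ns(big, 400)
    print(f"1024 x 64 byte reads: {many_small:.0f} ns, one 64 KB read: {one_big:.0f} ns")
```

In this model the small read barely speeds up on DDR2 because the fixed wait dominates, while the big continuous read gets close to 2x. Real memory controllers are a lot more complicated than this, so treat the numbers as a feel for the shape, not a benchmark.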
AM2 brings with it a difference of somewhere between 1% and 10% in most areas, probably averaging out at maybe 3%. This is the same kind of difference a system will see if you run CAS 2-2-2-5 DDR400 instead of CAS 2.5-3-3-8 memory. A lot of people would make that upgrade, but DDR2 isn't something people jump to upgrade to. As for speed, AM2 is faster. Within a couple of days of the big benchmark guys getting their engineering sample (ES) or retail AM2 systems, they all had new higher scores to post. Maybe not hugely higher, but they are still higher.
People seem to constantly say that one platform benefits more or less than another from low latency memory. 2 clocks is 2 clocks on any platform. Every platform has to access memory constantly, so upgrading to slightly lower latency memory gives a very similar performance increase on any system. Faster is always better. There are differences, though: with better branch prediction and prefetching, the cache can hopefully stay ahead of the CPU, hiding a lot of the memory latency if it can load the data before the CPU needs it. It doesn't work all the time, and the cache still needs to access the memory. Conroe's cache and branch prediction are much improved, but its pipeline is a little longer than the Athlon 64's, so when it needs to flush it, it takes a few extra clocks to refill compared to an Athlon 64. All that tends to even out, though.
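One way to see why slightly lower memory latency helps everyone a bit, but never by a huge amount, is the textbook average memory access time formula: hit time plus miss rate times miss penalty. A quick sketch, where every number is made up for illustration and not measured on any real CPU:

```python
def average_access_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Textbook average memory access time: hits are cheap, misses go to RAM."""
    return hit_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    hit_ns = 1.0       # hypothetical cache hit time
    miss_rate = 0.05   # hypothetical: 5% of accesses miss and go to memory

    slow = average_access_ns(hit_ns, miss_rate, 60.0)  # looser memory timings
    fast = average_access_ns(hit_ns, miss_rate, 50.0)  # tighter memory timings
    print(f"looser timings: {slow:.2f} ns average")
    print(f"tighter timings: {fast:.2f} ns average")

    # Better prediction/prefetching shows up as a lower miss rate.
    better_cache = average_access_ns(hit_ns, 0.02, 60.0)
    print(f"same slow memory, fewer misses: {better_cache:.2f} ns average")
```

The better the cache does at staying ahead, the smaller the miss rate and the less the memory timings matter, but as long as the miss rate isn't zero the memory latency still shows up in the average.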