I was a firm believer in 2GB over 4GB, as I couldn't see any great benefit, but I read a very good Tom's Hardware article which explains that 4GB on a 64-bit operating system can be slower than 2GB on a 32-bit OS because of the way 64-bit operating systems store numbers (it takes twice as much space to store a 64-bit number as a 32-bit number). This situation is completely reversed if you go to 8GB with a 64-bit OS.
My advice would be to do it.
Actually, that's not necessarily correct. The Intel x86 processors have 8, 16, 32 and 64-bit registers, and they can store numbers in many different formats depending on how the program is written. Just because you run the processor in 64-bit mode, using a 64-bit OS, doesn't suddenly mean programs are forced to use 64-bit registers.
But yes, if a programmer needs 64-bit math and uses 64-bit variables, then it will take 64 bits to store the number one. A 64-bit OS can still work with any of the previous data sizes natively, thanks to the design of the x86 processor.
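To put some numbers on that, here is a minimal sketch using Python's standard struct module. The names FIXED_WIDTH_BYTES, POINTER_BYTES and bytes_for_values are made up for this illustration. It only shows that fixed-width integers keep their size whatever the OS, while the native pointer size follows the build you run it on.

```python
import struct

# Standard-size format codes (the "<" prefix) have fixed widths, so these
# values are the same on a 32-bit and a 64-bit build.
FIXED_WIDTH_BYTES = {
    "8-bit": struct.calcsize("<b"),
    "16-bit": struct.calcsize("<h"),
    "32-bit": struct.calcsize("<i"),
    "64-bit": struct.calcsize("<q"),
}

# The native pointer size does depend on the build:
# 4 bytes on a 32-bit build, 8 bytes on a 64-bit build.
POINTER_BYTES = struct.calcsize("P")


def bytes_for_values(count, fmt):
    """Return the bytes needed to store `count` values of struct format `fmt`."""
    return count * struct.calcsize(fmt)


if __name__ == "__main__":
    print(FIXED_WIDTH_BYTES)
    print("pointer size on this build:", POINTER_BYTES)
    print("1M values as 32-bit:", bytes_for_values(1_000_000, "<i"), "bytes")
    print("1M values as 64-bit:", bytes_for_values(1_000_000, "<q"), "bytes")
```

So the doubling only bites where the program actually chooses 64-bit variables (or stores lots of pointers); data declared as 32-bit stays 32-bit.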
BTW, another reason 64-bit is theoretically slower is that the larger storage requirements put more strain on the available memory bandwidth. Just doubling RAM to 8GB doesn't help, as the memory interface on dual-channel DDR remains 128 bits wide. (Just think of the lack of performance gains on the 1024MB NVIDIA 8800 GTs: the 256-bit bus is just too much of a limitation for the extra RAM on a graphics card.)
Nehalem will help a ton, as it combines triple-channel 192-bit memory with the higher bandwidth of DDR3 (although apparently DDR2 should work if the motherboard makers wish). Conroe is limited by its FSB speed, but if the integrated memory controller on Nehalem supports, say, DDR2 at 1066 in triple channel, that's a big bandwidth increase over a Conroe with a 1333 FSB.
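For a rough feel of the numbers, here is a back-of-the-envelope sketch of theoretical peak bandwidth (transfers per second times bus width). The function name peak_bandwidth_gb_s is made up for this post, and I'm assuming a 64-bit wide FSB data path and 64 bits per memory channel. Real-world throughput will be lower than these peaks.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


# Assumption: the Conroe front-side bus moves 64 bits per transfer.
fsb_1333 = peak_bandwidth_gb_s(1333, 64)

# Assumption: each memory channel is 64 bits wide, so
# dual channel = 128 bits and triple channel = 192 bits.
dual_ddr2_1066 = peak_bandwidth_gb_s(1066, 128)
triple_ddr2_1066 = peak_bandwidth_gb_s(1066, 192)

if __name__ == "__main__":
    print(f"1333 FSB:                 {fsb_1333:.1f} GB/s")
    print(f"DDR2-1066 dual channel:   {dual_ddr2_1066:.1f} GB/s")
    print(f"DDR2-1066 triple channel: {triple_ddr2_1066:.1f} GB/s")
```

Under those assumptions the 1333 FSB tops out around 10.7 GB/s, while triple-channel DDR2-1066 would be around 25.6 GB/s on paper, roughly 2.4 times as much.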
Windows uses a system called WOW64, which basically creates little isolated pockets for 32-bit applications to safely run without the risk of a 64-bit application modifying their reserved memory, etc. But the 32-bit code still runs natively on the Core 2 processors; it's not software emulation, just a kind of protective buffer.
But even when running pure 64-bit applications, it's still possible for data to be stored as 8, 16, 32 or 64 bits, depending on the program's requirements. If you look at some of the early MMORPGs, they used 8-bit data structures for characters (leading to skill caps of 255, the largest value an unsigned 8-bit number can hold), yet they were running on 32-bit Windows. By using smaller data structures they could run with far smaller memory requirements.
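Here is a small sketch of that idea using Python's standard array module with the "B" (unsigned 8-bit) type code. The names SKILL_CAP, make_skill_table and raise_skill are hypothetical, not taken from any actual game.

```python
from array import array

# Largest value an unsigned 8-bit field can hold.
SKILL_CAP = 2**8 - 1


def make_skill_table(num_skills):
    """One unsigned byte per skill, all starting at zero."""
    return array("B", [0] * num_skills)


def raise_skill(table, index, gain):
    """Add `gain` to a skill, clamping at the 8-bit cap instead of overflowing."""
    table[index] = min(table[index] + gain, SKILL_CAP)
    return table[index]


if __name__ == "__main__":
    skills = make_skill_table(50)
    raise_skill(skills, 0, 200)
    raise_skill(skills, 0, 100)  # would be 300, so it clamps at the cap
    print("skill 0:", skills[0])
    print("bytes used for 50 skills:", skills.itemsize * len(skills))
```

Fifty skills fit in fifty bytes here. The same table with 32-bit fields would take four times that, whichever flavor of OS it runs on.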
If most of your applications are 32-bit, 4GB on 64-bit Windows will generally outperform 2GB on 32-bit Windows. If you run a lot of computationally heavy 64-bit applications, then it could be slower, due to the amount of data being dealt with. But then again, 32-bit Windows couldn't even run those applications without processor-intensive schemes to do 64-bit math in 32-bit chunks, and in that situation the 64-bit OS will be a huge amount faster.
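To show what doing 64-bit math in 32-bit chunks means, here is a minimal sketch of a 64-bit addition built only from 32-bit pieces plus a carry. The helper names (split64, join64, add64_with_32bit_ops) are made up for this post. Python integers are arbitrary precision, so the masks are there only to imitate 32-bit registers.

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split a 64-bit value into (low 32 bits, high 32 bits)."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    """Recombine two 32-bit halves into one 64-bit value."""
    return (hi << 32) | lo


def add64_with_32bit_ops(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit numbers given as 32-bit halves, wrapping at 64 bits."""
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32
    carry = lo_sum >> 32              # 1 if the low halves overflowed 32 bits
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


if __name__ == "__main__":
    a = 0x00000001FFFFFFFF
    b = 0x0000000000000001
    lo, hi = add64_with_32bit_ops(*split64(a), *split64(b))
    print(hex(join64(lo, hi)))
```

A single add becomes two adds plus carry handling, and multiplication or division needs noticeably more steps. A 64-bit processor does each of these in one instruction on a 64-bit register.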