Memory Basics
This article provides an overview of the physical components of a memory module and explains the principles underlying the operation of system memory.
Form Factors: From Chips to Modules
Memory used to be available in the form of discrete memory chips that were inserted directly into sockets on the system board. That was about 20 years ago. With the migration towards higher system memory densities, this practice was abandoned in favor of modular memory components, starting with the single inline memory module (SIMM), which evolved into the dual inline memory module (DIMM) used in all systems today that utilize SDRAM. The name DIMM originates from the fact that with the introduction of the Pentium processor, the processor bus width increased from 32 bits to 64 bits, which required two of the original SIMMs. A module that combined two SIMMs into a single format was therefore called a DIMM.
The first DIMMs were built according to a variety of different specifications; there were modules with two clock inputs or four clock inputs, as well as a slew of other variations from one module to the next. As a result, compatibility problems were very common in those days. To end this situation, Intel introduced the PC-100 specification, which, among other things, mandated an electronic data sheet stored in an Electrically Erasable Programmable Read-Only Memory (EEPROM) chip: the Serial Presence Detect or SPD. The name originates from the fact that the SPD uses a serial interface to the bus, which allows the BIOS to detect the module and apply the proper timings according to the data stored in it.
Each DIMM is composed of three primary components: the PCB (Printed Circuit Board), the SPD (Serial Presence Detect), and the ICs (Integrated Circuits), that is, the memory chips.
Front view of a DIMM (simplified schematic drawing). The main ingredients are the green PCB, the memory chips (also called "components" or "discretes"), the EEPROM containing the SPD, and the edge contacts called pins. A standard DDR DIMM has 8 chips per physical bank, and each chip has a data width of 8 bits, for a combined module width of 64 bits. Usually a suffix on the components designates the speed rating of the chips; in this case, a -4 would indicate a clock cycle time (tCK) of 4 ns, which is equivalent to a 250 MHz clock frequency or DDR500. Note the asymmetric position of the key in the "pinout" to ensure that the module can be inserted in only one orientation.
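As a rough sketch of the arithmetic in the caption above (the suffix convention varies between vendors, so treat the mapping as illustrative rather than definitive):

```python
# Illustrative only: convert a chip's clock cycle time (tCK, in ns) into its
# clock frequency and the corresponding DDR data rate. The "-4" style suffix
# convention differs between vendors; the numbers here simply mirror the caption.

def speed_from_tck(tck_ns: float) -> tuple[float, float]:
    clock_mhz = 1000.0 / tck_ns   # f[MHz] = 1000 / tCK[ns]
    ddr_rating = 2 * clock_mhz    # DDR transfers data on both clock edges
    return clock_mhz, ddr_rating

if __name__ == "__main__":
    for suffix in (5.0, 4.0, 3.0):          # e.g. -5, -4, -3 speed grades
        f, ddr = speed_from_tck(suffix)
        print(f"tCK = {suffix} ns  ->  {f:.0f} MHz clock, DDR{ddr:.0f}")
```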
The PCB
The PCB functions like any other circuit board. It is composed of multiple sheets of fiber resin with the metal layers that make up the traces sandwiched in between. Its function is to provide a mechanical scaffold for the components, as well as to provide power and data connectivity to them. As a rule of thumb, all PCBs are built using multiple layers of metal traces separated by individual sheets of resin. Each layer or plane usually has its own dedicated set of functions; for example, there are input/output planes along with power and ground planes. Most of the time, the ground planes run closer to the surface to shield the data lines from electromagnetic interference (EMI) originating from other computer components.
The memory chips
The memory chips are semiconductor ICs that consist of a data storage part, the so-called memory array, and the logic for addressing and input/output. The silicon is packaged in either TSOP or BGA format; the difference is that a TSOP has little legs sticking out on the sides, whereas a BGA has little solder balls on the bottom surface of the chip that are no longer visible after the chips are mounted on the PCB.
A memory chip consists of the actual silicon die, which contains the array, the interfacing logic and the bonding pads around the periphery. Typically, the bonding pads are on the order of 70 x 70 µm, and so-called bond wires are attached to them to connect the die to the leadframe, which is where the pins are anchored. Bond wires are usually 30 µm in diameter, which means that they can barely be seen with the naked eye. To protect the entire assembly, it is packaged in non-conductive plastic (drawn here in transparent blue). Current DDR chips have 64 legs or pins, as opposed to the simplified drawing shown above.
All system memory currently in use is Synchronous Dynamic Random Access Memory (SDRAM), a form of volatile memory. The term volatile means that the memory needs power in order to retain data; if power is lost or the system is turned off, all data within this memory will be lost. The term Random Access describes the fact that data can be written to any location within the memory, rather than having to start at the lowest address and sequentially fill up the array. The advantage is that coherent data can be written to any area within the memory space that has enough room to contain the entire set of instructions or data, rather than a few bytes here and a few others there when an update is done and the originally allocated space no longer suffices.
The SPD
The SPD is a small EEPROM chip (Electrically Erasable Programmable Read-Only Memory) that contains the data sheet listing memory timings, memory size, and memory speed, and is read by the computer's chipset. Most retail motherboard manufacturers allow settings such as memory timings and voltage to be set manually in the computer's CMOS Setup Utility. When manual timings are not used, the SPD information is used by the chipset. Most original equipment manufacturers (OEMs) such as Dell, Gateway and others hide the manual settings from the user and, often, do not even read the SPD data; rather, they default to safe settings for maximum compatibility.
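A minimal conceptual sketch of what the firmware does with the SPD: read the EEPROM over its serial interface and derive timings from the stored bytes. The byte offsets and encodings below are hypothetical placeholders, not the actual JEDEC SPD layout.

```python
# Conceptual sketch only: how a BIOS might turn SPD contents into timings.
# The byte offsets and encodings used here are invented placeholders; the
# real layout is defined in the JEDEC SPD specifications.

def decode_spd(spd: bytes) -> dict:
    """Decode a (hypothetical) SPD image into human-readable timings."""
    return {
        "module_type":   spd[0],       # placeholder offset for memory type
        "tCK_ns":        spd[1] / 10,  # placeholder: cycle time in 0.1 ns units
        "cas_latencies": [cl for cl in range(1, 8) if spd[2] & (1 << cl)],
    }

if __name__ == "__main__":
    fake_spd = bytes([0x08, 50, 0b0001100])   # invented example image
    print(decode_spd(fake_spd))  # -> tCK 5.0 ns, CL 2 and 3 supported
```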
Memory Basics
Memory by itself is just another component. Inside a computer system, however, it is integral to the functionality of the system; in fact, every single transaction at the system level has to use the system memory as an intermediate station. There are subtle differences between platforms, but regardless of whether it is an Intel or an AMD system, an e-Machine or a high-end IBM or Itanium server, the basic principles are always the same.
CPU - Memory - Cache
The heart of any PC is the central processing unit or CPU, also referred to as the processor in common parlance. The CPU needs data in order to do work. In other words, data are first loaded from the hard disk drive into memory, and from there they are retrieved by the CPU. Since the system memory is outside the CPU, a certain amount of time is required to access the data; for this reason, all modern CPUs use a small amount of ultra-fast memory, the so-called cache. All data that the CPU anticipates it will need again are written to this cache. In order to optimize data flow, the cache itself is hierarchically organized into a first-level (L1) cache, which is extremely small but operates at very low latencies, and a second-level (L2) cache that is usually much larger but also needs a bit longer to make the data available. In some cases, a third-level (L3) cache is present as well, but this is the exception rather than the rule (P4EE, HP PA-8800).
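As a toy illustration of this hierarchy, the sketch below tries each level in turn and accumulates the access cost. The latency figures and addresses are invented for the example, not measured values.

```python
# Toy model of a hierarchical memory lookup: try L1, then L2, then system
# memory, accumulating the access latency. The cycle counts are invented
# purely for illustration; real latencies depend on the CPU and DRAM used.

L1  = {"latency": 3,   "data": {0x10: "a"}}                         # small, very fast
L2  = {"latency": 15,  "data": {0x10: "a", 0x20: "b"}}              # larger, slower
RAM = {"latency": 200, "data": {0x10: "a", 0x20: "b", 0x30: "c"}}   # system memory

def load(address: int) -> tuple[str, int]:
    cycles = 0
    for level in (L1, L2, RAM):
        cycles += level["latency"]
        if address in level["data"]:
            return level["data"][address], cycles
    raise KeyError("address not mapped")

for addr in (0x10, 0x20, 0x30):
    value, cost = load(addr)
    print(f"0x{addr:02x}: '{value}' after {cost} cycles")
```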
General schematic of the memory subsystem and how it is implemented in any modern computer. The CPU sends data requests to the memory controller, which in turn generates the time-muxed (see below) memory addresses to retrieve data from the system memory. The data are analyzed at the level of the chipset (memory controller) and the CPU itself, and data determined to be valuable for future use are stored in the on-die, integrated high-speed SRAM memory called the CPU cache. The cache runs at CPU clock speed, whereas the system memory runs at bus speed, typically 10-20x slower than the cache. Caches are hierarchically organized into Level 1, Level 2 and higher, with the lower levels offering higher speed but smaller data capacity, and vice versa. Keep in mind that the CPU cannot access any system component directly; whether it is the HDD or the sound hardware, everything has to be written to the memory first.
Non-Multiplexed SRAM Addressing of Caches
One fundamental difference between all caches and the main memory is the method through which the addresses within the array are generated. Caches generally use an SRAM interface, which means that a complete address can be specified in a single operation. One example would be an Excel spreadsheet where the cell "F34" is needed. "F34" is a composite address that consists of row #34 and column #F. When the data are needed, the address F34 is simply sent as a single instruction and the data are retrieved from the corresponding cell.
In the case of system memory, the situation is quite different because a so-called multiplexed addressing protocol is used. That means that, in the case of a spreadsheet, first row #34 would need to be opened, and only then can column #F be specified. Suffice it to say that this is substantially slower than the non-multiplexed SRAM addressing scheme. Moreover, the SRAM cache runs at the same speed as the CPU, whereas the system memory runs at only a fraction of that, typically 1/10 to 1/20 of the cache speed. Therefore, it is clear that whatever data are needed over and over again should fit into one of the cache levels for best system performance.
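The spreadsheet analogy can be put into a small sketch: an SRAM-style lookup takes the full address in one operation, while a DRAM-style lookup first opens the row and only then selects the column. The step counts and timing values are invented for illustration.

```python
# Sketch of the addressing difference described above (invented timings).

TABLE = {("F", 34): 42}   # our "spreadsheet" cell F34

def sram_read(col: str, row: int) -> tuple[int, int]:
    """Non-multiplexed: row and column are presented in a single operation."""
    return TABLE[(col, row)], 1                      # one access step

def dram_read(col: str, row: int, trcd: int = 3, cl: int = 2) -> tuple[int, int]:
    """Time-multiplexed: activate the row first, then strobe the column."""
    open_row = {c: v for (c, r), v in TABLE.items() if r == row}   # row activate
    return open_row[col], trcd + cl                  # pay tRCD, then CAS latency

print(sram_read("F", 34))   # (42, 1)
print(dram_read("F", 34))   # (42, 5)
```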
However, caches are very low density and very expensive to manufacture. Therefore, only a very small amount of data can be held within the different cache levels; everything else has to go into the system memory. It should be clear, therefore, that the speed of the system memory also has a major impact on overall system performance, and likewise, the access times will be critical.
Accessing System Memory
Accessing system memory involves a rather complicated sequence of events. First, the CPU requests data from where it thinks those data are, that is, from a logical or virtual address space that is created for every application and program running. This virtual address space needs to be translated into the real or physical address space, and this is done mostly by the memory controller, an integral part of the chipset. After the correct address has been determined using the translation cues stored in the CPU's translation lookaside buffers (TLBs), the signals for the addresses have to be generated. The first selection narrows the location of the data down to one side of a memory module by means of the chip-select signal. Afterwards, since we are dealing with DRAM, the first address sent from the memory controller to the memory is the row address, issued by means of a Row Activate command.
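As an illustrative sketch of that last step, a physical address can be thought of as being split into chip-select, row and column fields before the commands are issued. The field widths below are arbitrary example values, not any real controller's address map.

```python
# Illustrative decomposition of a physical address into chip-select, row and
# column fields, roughly as a memory controller might do before issuing commands.
# The field widths are arbitrary example values, not a real controller's map.

ROW_BITS, COL_BITS = 13, 10          # assumed example geometry

def decode_address(phys_addr: int) -> dict:
    col  = phys_addr & ((1 << COL_BITS) - 1)
    row  = (phys_addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    chip = phys_addr >> (COL_BITS + ROW_BITS)        # selects a side / rank
    return {"chip_select": chip, "row": row, "column": col}

print(decode_address(0x0123_4567))
```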
Time-Muxed Row and Column Address Generation and the Three Key Latencies
tRCD and CAS Delay
Instead of using a handshake protocol to acknowledge that the row is ready, synchronous DRAM (SDRAM) specifies a time after which it is safe to assume that the row is open: the so-called Row-to-Column Delay (tRCD). That means that once a statistically sufficient time, the tRCD, has elapsed, the row decoders are turned off and the column decoders are turned on by signaling a logical true on the Column Address Strobe (CAS) command line. This allows the same address lines that were used to specify the row address to now specify the column address by issuing a Read command. This sequence of events, and the use of the same channels to perform two different tasks, is called time-multiplexing or "time-muxed DRAM addressing". After finding the correct column address and retrieving (prefetching) the data from the memory cells into the output buffers, the data are ready to be released to the bus. This time interval is called the CAS delay or CAS latency.
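The arithmetic for a first access to a closed row is simple: the controller waits tRCD after the Row Activate, then the CAS latency after the Read command. The sketch below uses an assumed 200 MHz (DDR400) clock and example timing values.

```python
# Example arithmetic only: time (in ns) before the first data word appears for
# an access that must first open a row. Assumes a 200 MHz (DDR400) clock.

CLOCK_MHZ = 200
TCK_NS = 1000 / CLOCK_MHZ            # 5 ns per clock

def first_access_ns(trcd_clocks: int, cl_clocks: int) -> float:
    return (trcd_clocks + cl_clocks) * TCK_NS

print(first_access_ns(trcd_clocks=3, cl_clocks=2))   # 25 ns with tRCD 3, CL 2
print(first_access_ns(trcd_clocks=3, cl_clocks=3))   # 30 ns with tRCD 3, CL 3
```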
tRP
As long as the requested data are found within the same row (or page) of memory, consecutive accesses will be "in page", so-called "page hits". Any requests for data stored outside the currently open row will miss that page and are therefore called page misses. In that case, the open page has to be closed and the new page has to be opened. The sequence of events includes disconnecting the wordlines, writing back all data from the sense amplifiers to the memory cells, and finally shorting the bitlines and bitlines "bar" to put everything back into a virgin state. This process is generally referred to as RAS precharge, and the time required to execute all the steps involved is called the Precharge latency or tRP.
In order to retrieve the next set of data, the appropriate memory row has to be opened with a bank activate command, and the cycle is complete.
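In the same spirit, a page hit only pays the CAS latency, whereas a page miss to a bank with the wrong row open pays precharge, row activate, and CAS latency. The clock counts below are example settings, not values from any particular module.

```python
# Toy comparison of page-hit vs page-miss cost, in clock cycles, following the
# sequence described above. Timing values are example settings only.

CL, TRCD, TRP = 2, 3, 2

def access_cost(page_hit: bool, row_open: bool = True) -> int:
    if page_hit:
        return CL                        # column access only
    cost = TRCD + CL                     # open the new row, then read
    if row_open:
        cost += TRP                      # must precharge the old row first
    return cost

print("page hit :", access_cost(True))                  # 2 clocks
print("page miss:", access_cost(False, row_open=True))  # 7 clocks
```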
Latency Listings
There is no general consensus on how to list the latency parameters; some vendors start with the precharge, others use tRCD as the first. However, the JEDEC Solid State Technology Association (formerly known as the Joint Electron Device Engineering Council) has set forth guidelines for the nomenclature and for the code used on the modules to specify the parameters. According to these specifications, the sequence used is CAS Latency - tRCD - tRP - tRAS, where tRAS is the minimum bank cycle time, that is, the time a row needs to be kept open before another request can force it to be closed. Therefore, a module specified as 2-3-2-7 will use a CAS latency of 2, a tRCD of 3, a Precharge delay of 2 and a tRAS of 7.
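A short sketch of reading such a timing string, following the CL-tRCD-tRP-tRAS ordering mentioned above; the 200 MHz clock used for the nanosecond conversion is an assumed example value.

```python
# Parse a "CL-tRCD-tRP-tRAS" timing string into named values and convert them
# to nanoseconds at an assumed example clock of 200 MHz (DDR400).

TCK_NS = 1000 / 200   # 5 ns per clock at 200 MHz

def parse_timings(spec: str) -> dict:
    cl, trcd, trp, tras = (int(x) for x in spec.split("-"))
    clocks = {"CL": cl, "tRCD": trcd, "tRP": trp, "tRAS": tras}
    return {name: {"clocks": c, "ns": c * TCK_NS} for name, c in clocks.items()}

for name, t in parse_timings("2-3-2-7").items():
    print(f"{name:5s} = {t['clocks']} clocks = {t['ns']:.0f} ns")
```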
In general, lower latencies will yield better performance, but there are a number of exceptions. Most memory devices only support latency settings of 2 and higher; however, there have been memory chips capable of running at 1-1-1-2, notable examples being the EMS HSDRAM / ESDRAM series. One important distinction between the CAS delay and the other latencies is that the CL options have to be supported in hardware in the form of pipeline stages, whereas the other latencies are simply "time-to-complete" values.
Programmable CAS latency means that there are a number of switches such as the one shown open in this drawing. The data are released from the memory cell via a pair of bitlines to the sense amplifier (SA); from there, they either go into a pipeline stage (PS) or else bypass the latter if the switch is closed. In that case, the CAS latency will be lower and the data will reach the I/O buffers earlier. However, this may incur errors at higher frequencies, and for that reason, additional buffer or pipeline stages are inserted into the output path to capture the data and propagate them on the following clock edge.
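A crude way to model the trade-off described in the caption: bypassing the pipeline stage delivers data a clock earlier, but is only reliable up to some maximum frequency. The 166 MHz threshold used below is a made-up figure, chosen purely to illustrate the idea.

```python
# Crude model of programmable CAS latency: the bypass switch (lower CL) saves
# a clock but is only safe up to some maximum frequency. The 166 MHz threshold
# is a made-up figure used purely to illustrate the trade-off.

BYPASS_MAX_MHZ = 166

def choose_cl(clock_mhz: float, base_cl: int = 2) -> int:
    """Return the CAS latency to program: bypass only if the clock is slow enough."""
    return base_cl if clock_mhz <= BYPASS_MAX_MHZ else base_cl + 1

for f in (133, 166, 200):
    print(f"{f} MHz -> CL {choose_cl(f)}")
```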
Refresh
Memory data are stored in the form of electrical charges within the memory cells, that is, extremely small capacitors. The charges are protected from leaking out by tiny switches, the so-called pass-gate transistors. There will still be some leakage of the charge over time, and that is why all memory cells have to be refreshed periodically. The easiest way to accomplish this is by reading the data out to the sense amplifiers and then writing them back internally, a process called CAS before RAS (CBR) refresh. If this refresh does not happen, the data will simply fade and eventually be lost. In order to maintain the data, SDRAM therefore needs to execute periodic refreshes, which can be triggered even in low-power standby mode by means of an integrated refresh counter on the memory chip itself. This feature is the reason why, for example, Suspend To RAM (STR, a power-down mode where the CPU and chipset go into a complete power-off state) works without a need to supply power to the memory controller or the CPU.
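A quick sketch of the usual refresh arithmetic: if every row must be refreshed within the retention period, the controller issues one refresh command roughly every retention_time / number_of_rows. The 64 ms and 8192-row figures below are typical example values, not a specification for any particular device.

```python
# Refresh-interval arithmetic using typical example values: a 64 ms retention
# window and 8192 rows to refresh means one refresh command roughly every 7.8 µs.

RETENTION_MS = 64        # example retention period for the whole array
ROWS = 8192              # example number of rows refreshed per period

interval_us = RETENTION_MS * 1000 / ROWS
print(f"one refresh every {interval_us:.2f} µs "
      f"({ROWS} refreshes per {RETENTION_MS} ms period)")
```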
Direct Memory Access
Aside from the CPU accessing data from the system memory, any so-called busmastering device can set up its own direct memory access (DMA) channel to load and store data directly from and to the system memory. In most cases, this involves a direct connection between the device (e.g. the hard disk drive), the South Bridge and the memory controller integrated into the North Bridge.
Once the data are resident in memory, the CPU can access them as well. Keep in mind, though, that if the CPU is the heart of the system, the memory is the soul and no data can bypass it.
Different data paths include the DMA channels; in this case, an HDD DMA channel to the memory is shown (red arrows). Essentially, this is how different system components interact with each other, and the one link that holds everything together is the system memory.
This concludes our discussion of memory basics and how system memory functions. In the next section of memory basics, we will cover the most common types of memory currently used in mainstream computers and give an outlook on the future of memory technology.
Source: Memory Basics. OCZ Technology. Accessed 30 September 2007. http://www.ocztechnology.com/displaypage.php?name=memory_basics&psupp=1