Suggest some real-world RAID benches

Jimbo Mahoney · 1 May 2006 at 23:33

Hey guys,

I'm buying another 36GB Raptor to have a play with RAID-0. I'll be using the drives as my OS and programs (not games though) drive and testing various stripe and cluster sizes.

My concern with other people's use of RAID 0 is that they set the stripe size too big to really benefit their system (ie 64kB when >90% of their files are <64kB) because they don't understand striping and/or go on what HDTach tells them (we all know HDTach likes big stripes).

Can anyone think of any real-world benchmarks I could run? Windows boot time is one obviously...

My theory is that my stripe size should be ~ (median file size on the drive/the stripe width) (ie 2 for two drives in RAID 0) and that the cluster size should be ~ (stripe size/2).
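As a rough sketch of that rule of thumb: the function names, the sample sizes and the rounding down to a power of two are all assumptions made for illustration, on the basis that controllers and file systems generally only offer power-of-two sizes.

```python
from statistics import median

def round_down_pow2(value_kb):
    """Largest power of two (in kB) not exceeding value_kb, minimum 1."""
    size = 1
    while size * 2 <= value_kb:
        size *= 2
    return size

def suggest_sizes(file_sizes_kb, stripe_width=2):
    """Rule of thumb: stripe ~ median / stripe width, cluster ~ stripe / 2."""
    med = median(file_sizes_kb)
    stripe_kb = round_down_pow2(med / stripe_width)
    cluster_kb = max(1, stripe_kb // 2)
    return stripe_kb, cluster_kb

# Made-up file sizes in kB, purely for illustration
sample = [2, 4, 9, 16, 33, 70, 120, 512]
print(suggest_sizes(sample))  # (8, 4)
```

With these invented sizes the median is 24.5kB, so two drives give a stripe of about 12kB, which rounds down to 8kB with a 4kB cluster.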

I expect when I set the cluster size too small, CPU usage and / or access time will become an issue, offsetting the increased throughput.

And before anyone jumps on the old 'RAID 0 doubles your chances of data loss' - it's cool, because it will be on my gaming rig and I use Ghost images for my OS. If the RAID volume gets screwed for whatever reason, it's no biggie.

Soul Rider · 2 May 2006 at 15:14

Looking to do exactly the same myself, and already have a server running Raid0 on SCSI disks, so I can see the benefit of RAID0.

AFAIK there are not many real world benchies you can do, as most have some sort of other overhead - File transfers limited by source Hard Drive/CD. One thing you do see a real benefit in is HL2, but as you are not using raid on your games drive, there isn't a lot you can test with.

A disk defrag is definitely faster, but again, you can't really measure it unless you make a drive all messy, then image it, but that's too much hassle.

Dutch Guy · 3 May 2006 at 12:39

I don't have any information but also wanted to know.

My understanding regarding stripe size is that if you use it for Windows a smaller stripe size (<64k) is better but if you also have games and storage a larger one might be better (64k/128k)

Soul Rider · 3 May 2006 at 18:24

Dutch Guy said:
My understanding regarding stripe size is that if you use it for Windows a smaller stripe size (<64k) is better but if you also have games and storage a larger one might be better (64k/128k)

Though I am a user of RAID0, my knowledge of its inner workings and benefits is not exceptionally great.

Setting cluster sizes should be as normal for the operation/file system you are using.

Stripe size would be ideal to be set at Median file size/Number of drives in Raid0, but altering the cluster size would just cause you to waste more disk space. You might as well use FAT if you are going to start doing that!!

Cluster size should be kept as small as possible to avoid wastage; your cluster will just be split over 2 drives. E.g. a 32kB cluster would be 16kB on each drive. The results of this would not change, but setting your stripe size to 16kB would not be very beneficial because of the increased overhead of writing so many different stripes.

****EDIT*****

Hope this article clears up what I was trying to say.....

We suspect that many of you out there are interested in RAID for its performance advantage. Stripe sizes play a very important role in the performance of RAID arrays and thus it is critical to understand the concept of striping before we delve any further into RAID discussion.

As we mentioned before, stripes are blocks of a single file that are broken into smaller pieces. The stripe size, or the size that the data is broken into, is user definable and can range from 1KB to 1024KB or more. The way it works is when data is passed to the RAID controller, it is divided by the stripe size to create 1 or more blocks. These blocks are then distributed among drives in the array, leaving different pieces on different drives.
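A minimal sketch of that distribution, assuming simple round-robin placement and data that starts on a stripe boundary. The function names are illustrative and not taken from any controller's firmware.

```python
def locate(offset_kb, stripe_kb, n_drives):
    """Map a logical offset (kB) to (drive index, offset on that drive in kB)."""
    block = offset_kb // stripe_kb   # which stripe block holds this offset
    drive = block % n_drives         # blocks are dealt out round-robin
    row = block // n_drives          # full rows of blocks that precede it
    return drive, row * stripe_kb + offset_kb % stripe_kb

def split(size_kb, stripe_kb, n_drives):
    """Return, per drive, the list of block sizes (kB) that drive receives."""
    drives = [[] for _ in range(n_drives)]
    offset = 0
    while offset < size_kb:
        chunk = min(stripe_kb, size_kb - offset)
        drive, _ = locate(offset, stripe_kb, n_drives)
        drives[drive].append(chunk)
        offset += chunk
    return drives

# 10 kB of data, 4 kB stripe, two drives
print(split(10, 4, 2))  # [[4, 2], [4]]
```

The first drive receives the first and third blocks, the second drive receives the block in between, and the last block is only partly filled.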

Like we discussed before, the information can be written faster because it is as if the hard drive is writing a smaller file, although it is really only writing pieces of a large file. At the same time, reading the data is faster because the blocks of data can be read off of all the drives in the array at the same time, so reading back a large file may only require the reading of two smaller files on two different hard drives at the same time.

There is quite a bit of debate surrounding what stripe size is best. Some claim that the smaller the stripe the better, because this ensures that no matter how small the original data is it will be distributed across the drives. Others claim that larger stripes are better since the drive is not always being taxed to write information.

To understand how a RAID card reacts to different stripe sizes, let's use the most drastic cases as examples. We will assume that there are 2 drives set up in a RAID 0 stripe array that has one of two stripe sizes: a 2KB stripe and a 1024KB stripe. To demonstrate how the stripe sizes influence the reading and writing of data, we will also use two different data sizes to be written and read: a 4KB file and an 8192KB file.

On the first RAID 0 array with a 2KB stripe size, the array is happy to receive the 4KB file. When the RAID controller receives this data, it is divided into two 2KB blocks. Next, one of the 2KB blocks is written to the first disk in the array and the second 2KB block is written to the second disk in the array. This, in theory, divides the work that a single hard drive would have to do in half, since the hard drives in the array only have to write a single 2KB file each.

When reading back, the outcome is just as pretty. If the original 4KB file is needed, both hard drives in the array move to and read a single 2KB block to reconstruct the 4KB file. Since each hard drive works independently and simultaneously, the speed of reading the 4KB file back should be the same as reading a single 2KB file back.

This pretty picture changes into a nightmare when we try to write the 8192KB file. In this case, to write the file, the RAID controller must break it into no less than 4096 blocks, each 2KB in size. From here, the RAID card must pass pairs of the blocks to the drives in the array, wait for the drives to write the information, and then send the next pair of 2KB blocks. This process is repeated 2048 times, once per pair, and the extra time required to perform the breakups, send the information in pieces, and move the drive actuator to various places on the disk all adds up to an extreme bottleneck.

Reading the information back is just as painful. To recreate the 8192KB file, the RAID controller must gather information from 2048 places on each drive, 4096 blocks in all. Once again, moving the hard drive head to the appropriate position that many times is quite time consuming.

Now let's move to the same array with a 1024KB stripe size. When writing a 4KB file, the RAID array in this case does essentially nothing. Since the 4KB file is smaller than a single 1024KB stripe, the RAID controller just takes the 4KB file and passes it to one of the drives on the array. The data is not split, or striped, because of the large stripe size and therefore the performance in this instance should be identical to that of a single drive.

Reading back the file results in the same story. Since the data is only stored on one drive in our array, reading back the information from the array is just like reading back the 4KB file from a single disk.

The RAID 0 array with the 1024KB stripe size does better when it comes to the 8192KB file. Here, the 8192KB file is broken into eight blocks of 1024KB in size. When writing the data, both drives in the array receive 4 blocks of the data, meaning that each drive only has the task of writing four 1024KB files. This increases the writing performance of the array, since the drives work together to write a small number of blocks. At the same time, reading back the file requires four 1024KB files to be read back from each drive. This holds a distinct advantage over reading back a single 8192KB file.
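The four cases above can be tallied with a short sketch. It assumes blocks are dealt evenly across the drives starting with the first one, and `blocks_per_drive` is a made-up helper name.

```python
import math

def blocks_per_drive(file_kb, stripe_kb, n_drives=2):
    """Count how many stripe blocks each drive gets for one file."""
    total = math.ceil(file_kb / stripe_kb)
    base, extra = divmod(total, n_drives)
    # The first `extra` drives each take one more block than the rest
    return [base + (1 if d < extra else 0) for d in range(n_drives)]

for stripe_kb in (2, 1024):
    for file_kb in (4, 8192):
        print(stripe_kb, file_kb, blocks_per_drive(file_kb, stripe_kb))
```

By this count the 8192KB file on a 2KB stripe comes to 4096 blocks in total, which is 2048 per drive, while on a 1024KB stripe it is 4 blocks per drive. The 4KB file on a 1024KB stripe lands on a single drive.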

As you can see, the performance of various stripe sizes differs greatly depending on the situation. Just what stripe size should you use?

Jimbo Mahoney · 4 May 2006 at 13:47

Dutch Guy said:
I don't have any information but also wanted to know.

My understanding regarding stripe size is that if you use it for Windows a smaller stripe size (<64k) is better but if you also have games and storage a larger one might be better (64k/128k)

Using JDiskReport to give the file size distribution for each drive, the file sizes don't seem very different between the C:\ (Windows) drive and the E:\ (Games) drive on my system.

jbloggs · 4 May 2006 at 21:44

Before I set up RAID0 on 2 x 74GB Raptors 15 months ago, I did quite a bit of reading around. It was suggested that your stripe size should be no more than 4x your cluster size, ie for a cluster size of 4KB (Windows XP default) your stripe size should be no more than 16KB.

At the moment I am using 32KB stripe size, but have kept a 4KB cluster size, which works very well for me. I tried the large stripes sizes, ie 64KB etc, but found that 32KB best suited my purpose, which is working with large video files.

I ran the Raptors as single HDDs for a while, and where I saw the difference in performance against RAID0 was in the following:

With RAID0, I found file transfers and unzipping large files were quite a bit faster. I also noticed that Windows XP ran a bit more slickly, but with no change in boot time. Games loaded faster, and programs opened slightly faster.

The different benching programs are, I think, very "subjective" and not really "objective" at all.

Jimbo Mahoney · 4 May 2006 at 22:10

Looking at my C:\, I'm leaning towards an 8kB stripe.

46% of my files are larger than 16kB, so they will all get striped.

If I set the stripe to 16kB, only ~ 30% of my files would benefit from the stripe (ie those over 32kB).
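The counting behind those percentages can be sketched as follows, assuming a file only benefits once it is larger than the stripe size times the number of drives. The size list is invented for illustration and is not the JDiskReport output.

```python
def share_striped(file_sizes_kb, stripe_kb, n_drives=2):
    """Fraction of files larger than one full row of stripe blocks."""
    threshold = stripe_kb * n_drives
    hits = sum(1 for size in file_sizes_kb if size > threshold)
    return hits / len(file_sizes_kb)

sizes = [1, 3, 6, 12, 20, 28, 40, 90, 300, 2048]  # invented sizes in kB
for stripe in (8, 16):
    print(stripe, share_striped(sizes, stripe))
```

For this made-up list, an 8kB stripe covers 6 of the 10 files and a 16kB stripe covers 4 of them.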

I'll let you know how it goes.

wesley · 5 May 2006 at 08:50

ChrisLX200 · 5 May 2006 at 08:57

Somehow you need to factor in the frequency at which they're accessed. A lot of files on your C: drive just sit there or are only accessed once at boot, others are accessed more often.
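One way to fold access frequency in, as a sketch only: take a weighted median in which each file counts once per access. The sizes and access counts below are hypothetical, and getting real counts would need some kind of file-access tracing, which is assumed here.

```python
def weighted_median(sizes_kb, access_counts):
    """Median file size where each file counts once per access."""
    pairs = sorted(zip(sizes_kb, access_counts))
    half = sum(access_counts) / 2
    running = 0
    for size, count in pairs:
        running += count
        if running >= half:
            return size
    raise ValueError("no files given")

sizes = [4, 16, 64, 700]    # hypothetical file sizes in kB
counts = [20, 45, 30, 5]    # hypothetical accesses per file
print(weighted_median(sizes, counts))  # 16
```

The plain median of those four sizes would be 40kB, so weighting by access pulls the figure down towards the small files that are read often.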

Jimbo Mahoney · 5 May 2006 at 13:58

ChrisLX200 said:
Somehow you need to factor in the frequency at which they're accessed. A lot of files on your C: drive just sit there or are only accessed once at boot, others are accessed more often.

That'll be easy to do then....