Which file system for generic data storage?

I have a suspicion that the answer is trivial. I have about a terabyte of data to store, mainly backups. A lot of it I will probably never access again, but better safe than sorry. Speed is irrelevant, but I don't want it to corrupt over time.

The three I'm considering are ext2, ext3 and FAT32. Which one is most suitable?

I believe ext3 is considered more resilient, though I don't really see why a journal (which, as I understand it, delays writes in order to structure them better) makes it so. FAT32 is under consideration on the basis that it's ancient and therefore probably reliable, but I clearly don't know what I'm talking about here.

Advice would be much appreciated. I was using ext2 but have encountered some corrupt files; whether this is the fault of the file system, the copy of Ubuntu I'm running or a wayward overclock, I don't know. It puts me off ext2 enough to make this thread.

Cheers
 
None. ZFS would do it.
Discount FAT32, as it won't handle files larger than 4GB, unless your archive only contains files smaller than that.

ext2 and ext3 are the same except ext3 has a journal, which, if your data is static, won't really matter as far as I can see.

What you want, then, is to store your data, generate MD5 or similar hashes of it, perhaps add some kind of checksum with error-correction bits (RAR + PAR?), and then periodically check everything against a master index of the values. If these change, you have bit rot and you must repair the files using the checksummed data. Or just burn it onto some DVDs if the data truly is static; a TB of data isn't that much to burn to DVD. The most data I have burned to DVD has been about 100 discs at roughly 4GB each, so about 400GB. Took a while right enough, though.
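A rough sketch of what I mean, assuming the archive lives under /archive and you're happy with md5sum (both just examples):

# Build a master index of MD5 hashes for everything under /archive.
# Run this once after the initial copy and keep the index somewhere safe.
cd /archive
find . -type f -print0 | xargs -0 md5sum > ~/archive-index.md5

# Periodic check: re-hash everything and list only the files that no longer match.
cd /archive
md5sum -c --quiet ~/archive-index.md5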
 
Nothing wrong with ext3... FAT32 has no file permissions or journaling, so it's a bit faster but obviously less secure.

As for the post above, where does this myth come from? One day I hear it doesn't work on devices over 32GB, now it's 4GB. Well, for your information, FAT32 works on any sized media; it's just that Windows refuses to format volumes larger than 32GB as FAT32. Thankfully this is the Linux forum.
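For example, formatting a large partition as FAT32 under Linux is no problem (a sketch; /dev/sdb1 is only a placeholder, so double-check the device first):

# Format a partition of any size as FAT32 using dosfstools.
mkfs.vfat -F 32 /dev/sdb1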
 
ZFS sounds ideal, right up to the point where I realise it's not really available under Linux yet.
DVDs, however, sound like a nightmare; I'll never be organised enough to manage that. Checksums, with a means of repairing files that fail the periodic test, is the way to go. I've had a look with Google and discovered I don't know what to search for (RAR/PAR relies on breaking the files up into pieces, which I'd rather avoid). Is there a more refined approach than maintaining two copies of everything along with an index of the checksum of every item?

Not a myth, Super: FAT32 cannot cope with individual files greater in size than 4GB. It refuses to write them; point dd if=/dev/zero at a FAT32 filesystem and it'll stop writing at 4GB. Regarding Windows and the 130GB partition limit, this is quite good.
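If you want to see it for yourself, something like this will fall over at the 4GB mark (assuming the FAT32 volume is mounted at /mnt/fat32, which is just an example path):

# Try to write a single 5GB file to a FAT32 volume;
# dd stops with a 'File too large' error once the file reaches 4GB.
dd if=/dev/zero of=/mnt/fat32/bigfile.bin bs=1M count=5120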

Thanks guys
 
If I lost the entirety of it I'd be annoyed, but it's less important than whatever I'm working on at the time which tends to be obsessively backed up in multiple places. Maintaining the data on two drives and periodically comparing them is probably the best solution I know how to implement, but I'd prefer a solution that uses 20% to 50% more space instead.
 
Get a 2TB drive.

Put all your files on it twice.

Do a one-time pass over each file and generate a hash.
Store the generated hashes in a master index, alongside whatever checksumming/parity routine you use (so you can error-correct in future).

Write a script that compares each copy with the other; if there is a difference, check each one against the stored hash and overwrite the corrupt copy with the correct one (something like the sketch below).

Run this script once a month or whatever you think is best.
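A rough sketch of that script, assuming the two copies live under /data/copy1 and /data/copy2 and the index was built with md5sum relative to the copy root (all paths and names are just examples):

#!/bin/bash
# Monthly integrity check over two copies of the same archive.
# Assumes the index was built once with:
#   (cd /data/copy1 && find . -type f -print0 | xargs -0 md5sum) > /data/index.md5

COPY1=/data/copy1
COPY2=/data/copy2
INDEX=/data/index.md5

while read -r hash file; do
    sum1=$(md5sum "$COPY1/$file" | awk '{print $1}')
    sum2=$(md5sum "$COPY2/$file" | awk '{print $1}')

    # Both copies still match the original hash: nothing to do.
    [ "$sum1" = "$hash" ] && [ "$sum2" = "$hash" ] && continue

    if [ "$sum1" = "$hash" ]; then
        echo "repairing copy2: $file"
        cp -p "$COPY1/$file" "$COPY2/$file"
    elif [ "$sum2" = "$hash" ]; then
        echo "repairing copy1: $file"
        cp -p "$COPY2/$file" "$COPY1/$file"
    else
        echo "WARNING: both copies of $file differ from the index" >&2
    fi
done < "$INDEX"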
 
If that was the plan then you may as well just get two 1TB drives and RAID them (RAID 1, say, given only two disks) and save yourself a considerable chunk of money.
 
But that won't do the same thing.

Firstly, if you RAID them then you have a 'live' system, as opposed to a static collection of files offline. In order to keep the integrity of the data you need to maintain a RAID setup, and RAID isn't a backup.

Secondly, it won't protect against 'bit rot'.

It's easy enough to just make copies of files. Maintaining the integrity, and the knowledge that the files are safe, takes more than just copying them.
 
With a slight change, I think that's the approach I will take; thank you whitecrook.

I'll go for two drives, each with the same data and each with an index of checksums. Cheaper than a 2TB drive, and it protects against drive failure. As covered, this suits me better than RAID 1. It's also within my limited scripting ability, so all is well.

Cheers
 
ZFS on OpenSolaris or even ordinary Solaris would work for this. Solaris is a great OS, and if you're only using it as a fileserver you should be fine. :)

ZFS checksums data as it is written and verifies it whenever it is read, so file integrity is checked on every access. You also get all the cool features like cache devices, and the newest release (which I've not played with yet) even supports block-level de-duplication. ZFS snapshots are also fantastic as backups. Performing a regular scrub on a ZFS storage pool should also guard against the aforementioned "bit rot" (not that I've seen it, but a weekly scrub is a good idea in any case).
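For what it's worth, the basics are only a handful of commands (a sketch; the pool, filesystem and Solaris-style device names are just examples and will differ on your box):

# Create a mirrored pool called 'tank' from two disks.
zpool create tank mirror c1t0d0 c1t1d0

# Create a filesystem for the backups and take a snapshot of it.
zfs create tank/backups
zfs snapshot tank/backups@2010-01-01

# A scrub re-reads every block and repairs anything that fails its
# checksum using the good copy from the mirror.
zpool scrub tank
zpool status tank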
 