Hey guys. I'm quite happy using rsync to back up files, but at present I'm trying to do something slightly different. I've developed a phobia of files becoming corrupt and hope to solve this with a backup. I'd rather write my own script for this (I do not know how to yet, but I will learn), so this is more to check that my reasoning is sound before I start trying to write it.
The idea is to have two identical directories on different partitions, maintaining exactly the same files in the same directory tree. In English, what I'm intending the script to do is:
-------------------------------------------------------------------------------------------------------
Check the md5sum of a file against a previously generated list.
If they match, move on to the next file.
If not, append the name of the file to a list of potentially corrupted files.
Do the same in the second directory.
Compare the two lists of potentially corrupt files.
If the same file name appears in both, but with different checksums, move the files to a folder termed "corrupt" within their own respective partitions.
If a file appears in one list but not the other, move the corrupt file to a folder termed "replaced" and copy the good one across.
Print to a file how many of each exchange occurred on each run, and the names of the files that were moved.
-------------------------------------------------------------------------------------------------------
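To make the steps above concrete, here is a rough sketch of how I imagine them in shell. It is only a sketch under assumptions: GNU coreutils are available, the list is in md5sum's default text format (checksum, two spaces, path relative to the directory), and file names contain no newlines or backslashes. The function names list_suspects, classify and restore are made up for illustration.

```bash
# list_suspects DIR MANIFEST
# Prints the relative path of every manifest entry that is missing from DIR
# or whose current MD5 differs from the recorded one.
list_suspects() {
    local dir="$1" manifest="$2" line hash path actual
    while IFS= read -r line || [ -n "$line" ]; do
        hash=${line%% *}            # recorded checksum (text before the first space)
        path=${line#"$hash  "}      # strip the checksum and the two-space separator
        if [ -f "$dir/$path" ]; then
            actual=$(md5sum < "$dir/$path")
            actual=${actual%% *}    # md5sum prints "<hash>  -" when reading stdin
        else
            actual=missing
        fi
        [ "$actual" = "$hash" ] || printf '%s\n' "$path"
    done < "$manifest"
}

# classify LIST_A LIST_B
# Takes the two suspect lists (one path per line) and prints one line per file:
#   "both <path>"  suspect in both directories, so neither copy can be trusted
#   "a <path>"     suspect only in A, so B's copy can replace it
#   "b <path>"     suspect only in B, so A's copy can replace it
classify() {
    local tmp
    tmp=$(mktemp -d) || return 1
    sort -- "$1" > "$tmp/a"
    sort -- "$2" > "$tmp/b"
    comm -12 "$tmp/a" "$tmp/b" | sed 's/^/both /'
    comm -23 "$tmp/a" "$tmp/b" | sed 's/^/a /'
    comm -13 "$tmp/a" "$tmp/b" | sed 's/^/b /'
    rm -rf -- "$tmp"
}

# restore GOOD_DIR BAD_DIR PATH
# Moves the suspect copy into BAD_DIR/replaced/ (keeping its subdirectory)
# and then copies the good version across.
restore() {
    local good="$1" bad="$2" path="$3"
    mkdir -p -- "$bad/replaced/$(dirname -- "$path")" || return 1
    if [ -e "$bad/$path" ]; then
        mv -- "$bad/$path" "$bad/replaced/$path" || return 1
    fi
    mkdir -p -- "$(dirname -- "$bad/$path")" && cp -p -- "$good/$path" "$bad/$path"
}
```

A run would call list_suspects once per directory, feed both lists to classify, call restore for each "a" or "b" line, and count the lines of each kind for the report. Files flagged "both" would go to the "corrupt" folders instead, which this sketch leaves out.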
This relies upon initially generating a checksum file which then never itself becomes corrupt. Some form of self-checking mechanism for this would be wise; at the least, the script should check that the main md5sum files in the two directories match each other and abort if they do not.
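The minimal version of that self-check could look something like the following. This is a sketch, not a tested design: manifests_agree is a made-up name, and the paths in the usage comment are placeholders. It only shows that the two copies of the list are identical, so it would not catch both being damaged in the same way.

```bash
# manifests_agree FILE_A FILE_B
# Succeeds only if both checksum lists exist, are non-empty,
# and are byte-for-byte identical.
manifests_agree() {
    [ -s "$1" ] && [ -s "$2" ] && cmp -s -- "$1" "$2"
}

# Hypothetical use at the top of the main script:
#   manifests_agree /mnt/a/MD5SUMS /mnt/b/MD5SUMS || {
#       echo "checksum lists differ, aborting" >&2
#       exit 1
#   }
```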
It will also require a small script for moving new files into the directories, which calculates each file's checksum and appends it to the appropriate list before/after copying the file across.
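For the "add a new file" helper, something along these lines is what I have in mind. Again this is only a sketch: add_file and the list name MD5SUMS (kept at the top of each directory) are assumptions of mine. It hashes the source first, refuses to overwrite anything, copies into both directories, re-hashes each copy, and only then appends to both lists. Note that re-reading a file straight after copying may be served from the page cache rather than the disk, so this check is weaker than it looks.

```bash
# add_file SRC REL DIR_A DIR_B
# Copies SRC to DIR_A/REL and DIR_B/REL, verifies both copies against the
# checksum of SRC, then appends "<hash>  <REL>" to MD5SUMS in each directory.
add_file() {
    local src="$1" rel="$2" dir_a="$3" dir_b="$4" hash copy dir
    hash=$(md5sum < "$src") || return 1
    hash=${hash%% *}                      # keep only the checksum
    for dir in "$dir_a" "$dir_b"; do
        # Never overwrite an existing file.
        [ ! -e "$dir/$rel" ] || { echo "refusing to overwrite $dir/$rel" >&2; return 1; }
    done
    for dir in "$dir_a" "$dir_b"; do
        mkdir -p -- "$(dirname -- "$dir/$rel")" || return 1
        cp -- "$src" "$dir/$rel" || return 1
        copy=$(md5sum < "$dir/$rel") || return 1
        [ "${copy%% *}" = "$hash" ] || { echo "copy in $dir does not match source" >&2; return 1; }
    done
    # Only record the file once both copies are known to be good.
    for dir in "$dir_a" "$dir_b"; do
        printf '%s  %s\n' "$hash" "$rel" >> "$dir/MD5SUMS" || return 1
    done
}
```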
My main concerns are:
1/ That the above is inherently unworkable.
2/ That it will lead to destroying data, especially data copied into the directory after the initial index file is generated.
3/ That the processor and disk overhead will be excessive.
4/ That bash will do this poorly and I would be wiser to write it in C.
Any feedback welcome.
Cheers