Bzip2 Large File Recovery

Associate
Joined
1 Aug 2003
Posts
1,053
I had a very large tar file (750GB) that I compressed using bzip2. Bzip2 comes with a recovery utility, but it only supports files up to 40GB. After changing a few things in the source code and recompiling, the recovery utility ran, but it seems to hit a naming convention issue: the highest file number it will write is rec99999, and I suspect the final block number will be significantly higher.

Does anyone have any experience with running recovery on such large files? I'm getting to the end of my technical abilities in this regard.
 
Associate
Joined
22 Jun 2018
Posts
1,582
Location
Doon the watah ... Scotland
Is there a problem with the original compressed file that means you now have to recover it? (It's not 100% clear in the original post.)

Edit:

In the source code you edited, was there a variable declared along the lines of this?

Code:
#define BZ_MAX_FILENAME 2000

(That line was taken from some old source code I found in a quick scan. I suspect your code may have 99999 instead of 2000.)

If you changed that to something huge like 9999999, would that get you past your 99999 limit?
 
Associate
OP
Joined
1 Aug 2003
Posts
1,053
Thanks for responding. I had changed that variable, but that wasn't what was required; it was this line:

Code:
sprintf (split, "rec%5d", wrBlock+1);

I changed the 5 to a 9. But there were loads of other problems too: it created so many files that Linux was struggling with everything. I had to write a fairly elaborate sed program to parse all the commands (even xargs was failing).

Yes, the original file was corrupted somehow. I'm still running the bz2 checks so have no idea how many blocks are affected, but it's halfway through and has found none. I'll post how I get on in case I run into anything else, and so it's googleable should anyone else have this issue.
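For anyone following along, here is a minimal sketch of how the width in that format string affects the generated names. The helper `make_block_name` is hypothetical and not part of the bzip2 source; it only illustrates the standard printf behaviour. Note that in standard C the field width is a minimum, so it controls padding (and therefore how the names sort) rather than truncating larger numbers.

```c
#include <stdio.h>

/* Hypothetical helper (not from bzip2recover): writes "rec" followed by the
   block number, zero-padded to at least `width` digits. Returns whatever
   snprintf returns: the length of the full name, or a negative value on
   error. */
int make_block_name(char *buf, size_t size, int width, long block)
{
    /* "%0*ld" takes the minimum field width from the `width` argument. */
    return snprintf(buf, size, "rec%0*ld", width, block);
}
```

With a width of 5, block 42 becomes rec00042 and block 123456 becomes rec123456. With a width of 9, block 123456 becomes rec000123456, which keeps every name the same length so a plain alphabetical listing stays in block order.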
 
Associate
OP
Joined
1 Aug 2003
Posts
1,053
Right, this has taken a little bit longer as there's a further stumbling block: I had compressed the file using lbzip2 rather than bzip2 to take advantage of multi-threading, which can't be decompressed by bzip2, and the bzip2 checks didn't seem to find anything within the sub-blocks.

I'm still left with not all the blocks being retrieved from the original file. I get no warning that this is the case, nor any suggestion that it has run into problems, but the resultant recovery blocks add up to 435GB whereas the original file was 728GB. I'm running a few more checks and would still be grateful for advice.
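In case it helps anyone repeat the size comparison, below is a rough sketch of totalling the recovered pieces without passing hundreds of thousands of file names on a command line. The function name `sum_prefixed_sizes` is made up for illustration, and it assumes a POSIX system and that all the recovered files sit in one directory with names starting with "rec". As far as I understand it, the recovered pieces are themselves compressed, so the total should be compared against the size of the compressed original.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: sums the sizes of regular files in `dir` whose names
   start with `prefix`. Returns the total in bytes, or -1 if the directory
   cannot be opened. */
long long sum_prefixed_sizes(const char *dir, const char *prefix)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long long total = 0;
    size_t plen = strlen(prefix);
    struct dirent *e;

    while ((e = readdir(d)) != NULL) {
        /* Only consider entries whose name begins with the prefix. */
        if (strncmp(e->d_name, prefix, plen) != 0)
            continue;

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue; /* skip paths that do not fit in the buffer */

        struct stat st;
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            total += (long long)st.st_size;
    }

    closedir(d);
    return total;
}
```

Calling `sum_prefixed_sizes(".", "rec")` from the recovery directory gives a byte total to set against the original file size. On a 32-bit system, large file support may need enabling at compile time (for example `-D_FILE_OFFSET_BITS=64` with glibc) for `stat` to succeed on very large files.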
 