compression level 1 error gzip stdin invalid compressed data format violated tar Unexpected EOF in archive #11

Closed
omac777 opened this issue Oct 1, 2021 · 8 comments
Labels
bug Something isn't working

Comments

@omac777

omac777 commented Oct 1, 2021

I created a tgz using specific compression level and threads with crabz:

git clone https://github.com/sstadick/crabz.git
mv crabz blah
time tar cf - blah/ | crabz --compression-level=1 --compression-threads=6 > ./blah.tgz
mkdir testcrabz
cd testcrabz/
mv ../blah.tgz .
tar zxf ./blah.tgz 

I expected success when extracting a tgz created with crabz.
Instead, I got the following error output:

gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

NOTE: This error only surfaces when tar extracts an archive that crabz created at compression level 1.
tar extracts as expected when crabz uses compression level 0 or 2 through 9, but not 1.

@omac777 omac777 changed the title from "gzip stdin invalid compressed data--format violated tar Unexpected EOF in archive" to "gzip stdin invalid compressed data format violated tar Unexpected EOF in archive" Oct 1, 2021
@omac777
Author

omac777 commented Oct 1, 2021

Both compression and extraction succeed when no crabz compression level or compression threads are specified:

tar --use-compress-program="crabz" -cf blah2.tgz blah/
tar xf blah2.tgz

Errors surface only at extraction time when I specify the compression level and compression threads:

tar --use-compress-program="crabz --compression-level=1 --compression-threads=6" -cf blah3.tgz blah/
tar xf blah3.tgz 

gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

@omac777
Author

omac777 commented Oct 1, 2021

When compression is turned off with a level of 0, tar extracts successfully.

tar --use-compress-program="crabz --compression-level=0 --compression-threads=6" -cf blah4.tgz blah/

tar extraction also works when the compression level is 9:

tar --use-compress-program="crabz --compression-level=9 --compression-threads=6" -cf blah5.tgz blah/

LOL, of course the one compression level I want to use is the one with a bug in it, while all the others behave as expected:

$ tar xf crabsL1.tgz 
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
$ tar xf crabsL2.tgz 
$ tar xf crabsL3.tgz 
$ tar xf crabsL4.tgz 
$ tar xf crabsL5.tgz 
$ tar xf crabsL6.tgz 
$ tar xf crabsL7.tgz 
$ tar xf crabsL8.tgz 
$ tar xf crabsL9.tgz 

The error only surfaces at compression level 1.
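
The per-level check above can be scripted so that each archive is verified with gzip -t before tar ever touches it. The following is a minimal sketch, not the reporter's actual procedure: the temporary directory layout, the compress helper, and the failed_levels variable are hypothetical names introduced for illustration, and the script falls back to plain gzip when crabz is not on PATH so that it still runs (in which case every level is expected to pass).

```shell
# Sketch: compress the same tar stream at levels 1 through 9 and report
# which outputs pass gzip's integrity test.
workdir=$(mktemp -d)
mkdir "$workdir/blah"
# Random bytes stand in for already-compressed files such as images.
head -c 200000 /dev/urandom > "$workdir/blah/random.bin"
printf 'hello\n' > "$workdir/blah/text.txt"

# Hypothetical helper: $1 is the level; reads stdin, writes a gzip stream.
# Assumption: uses the crabz flags quoted in this thread when crabz is
# installed, otherwise plain gzip.
compress() {
    if command -v crabz >/dev/null 2>&1; then
        crabz --compression-level="$1" --compression-threads=6
    else
        gzip "-$1"
    fi
}

failed_levels=""
for level in 1 2 3 4 5 6 7 8 9; do
    out="$workdir/level$level.tgz"
    tar cf - -C "$workdir" blah | compress "$level" > "$out"
    if gzip -t "$out" 2>/dev/null; then
        echo "level $level: ok"
    else
        echo "level $level: CORRUPT"
        failed_levels="$failed_levels $level"
    fi
done
rm -rf "$workdir"
```

Running gzip -t separates a corrupt gzip stream from a tar-level problem, which is the distinction the error output in this thread hinges on.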

@omac777 omac777 changed the title gzip stdin invalid compressed data format violated tar Unexpected EOF in archive compression level 1 error gzip stdin invalid compressed data format violated tar Unexpected EOF in archive Oct 1, 2021
@sstadick
Owner

sstadick commented Oct 1, 2021

That is super weird! I can reproduce locally.

I don't know that I'll have time to sort this one out today, but I'll get it fixed. --compression-level 1 is special and is handled differently from the other compression levels.

@sstadick sstadick added the bug Something isn't working label Oct 1, 2021
@sstadick
Owner

sstadick commented Oct 3, 2021

Okay, this goes way deeper than anticipated. I think it's a bug in zlib-ng.

Installing crabz with cargo install -f crabz --no-default-features --features deflate_zlib fixes the issue. Alternatively, to keep some of the speed benefits of not using zlib, use the mgzip or bgzf formats. Both lean on libdeflate, and their output can be decompressed just fine by any gzip tool.

See zlib-ng/zlib-ng#680 and other bugs under the "level 1" search in zlib-ng.

Apparently you found the one compression level + input data to cause an issue! I'm not yet sure how I'll handle this in gzp / crabz. For the short term I'll probably make level 1 compression fail when zlib_ng is in play.

@sstadick
Owner

sstadick commented Oct 3, 2021

It's not actually zlib-ng's fault. It looks like when compression level 1 hits truly random data, the output can balloon, taking up more space when deflated than when inflated. I account for a bit of extra space, but not nearly as much as it needs. I'm working on a fix.
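
The expansion described here is easy to observe with incompressible input. This is a minimal sketch that uses stock gzip rather than the zlib-ng backend crabz was built with, so the size of the overhead will differ from what crabz saw; it only illustrates that a compressed stream can come out larger than its input. The variable names are hypothetical.

```shell
# Sketch: compress random (incompressible) bytes at level 1 and compare sizes.
raw_bytes=1000000
compressed_bytes=$(head -c "$raw_bytes" /dev/urandom | gzip -1 | wc -c | tr -d ' ')
overhead=$((compressed_bytes - raw_bytes))
echo "raw=$raw_bytes compressed=$compressed_bytes overhead=$overhead"
```

Any code that sizes its output buffer from the input length therefore has to leave room for this overhead, which is the shortfall the comment above describes.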

@omac777
Author

omac777 commented Oct 4, 2021

I'll give a bit more context on how I surfaced this bug.
I was compressing a large set of files, mostly images, which were already compressed by their respective tools. I really didn't need another level of compression, but in case there were a few non-image files in the directories, a bit of compression can't hurt. That's why I decided to lower the compression level from the default of 6. I also wanted to see how much time I could save compared to pigz.
At compression level 6, I get a bit of a time savings, but not enough to justify replacing pigz.
At compression level 2, I got a 20% time savings in my case. Yes, replacing pigz with crabz is worthwhile for me.
At compression level 1, I got a lot, a 38% time savings, but can't extract LOL.

Thanks for making this awesome tool.

@sstadick
Owner

sstadick commented Oct 4, 2021

Absolutely! And that's exactly the type of data that would trigger this bug! I should have a new version out in a day or two.

If you aren't too concerned about the compression ratio and just want some compression, I'd recommend the mgzip or bgzf format. They should be the fastest of them all, and you don't lose much compression ratio. Any decompression tool should be able to decompress the output with no problem, with the added benefit that decompression is multithreaded if you use crabz.

I appreciate that you took the time to create a helpful issue 👍

@sstadick
Owner

sstadick commented Oct 9, 2021

The fix is in the v0.7.1 release, which is building now and should be available shortly.

Sorry about the slow turn-around! I ended up bundling in a few more features I've been wanting to add.
