Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fall back on gzip if pigz fails #476

Closed
rbs-jacob opened this issue Jun 14, 2024 · 4 comments
Closed

Fall back on gzip if pigz fails #476

rbs-jacob opened this issue Jun 14, 2024 · 4 comments
Labels
good first issue Good for newcomers

Comments

@rbs-jacob
Copy link
Member

Recent PR #472 uses Python's built-in gzip module for decompressing GZIP data if possible, and falls back on the system version of pigz if the Python version fails.

Inside of OFRAK Docker images, we can be confident that pigz is present. But when a user installs OFRAK via pip install, they may not have pigz on their system. They are, however, likely to have gzip on their system. As such, we should fall back from Python gzip to pigz to gzip to increase the likelihood that something will work for GZIP decompression.

Implementing this may involve modifying ComponentExternalTool (see this comment on #472). This may have implications for other OFRAK functionality using ComponentExternalTool.

@rbs-jacob rbs-jacob added the good first issue Good for newcomers label Jun 14, 2024
@alchzh
Copy link
Contributor

alchzh commented Jul 16, 2024

If we're only using pigz as a fallback for the trailing data case and prefer python standard library gzip otherwise, I think it makes more sense to implement a fix in python ourselves, it's a pretty simple modification to the standard library package. (see this stackoverflow answer: https://stackoverflow.com/questions/4928560/how-can-i-work-with-gzip-files-which-contain-extra-data)

there isn't any sort of official Windows packaging for pigz

@whyitfor
Copy link
Contributor

Like the idea of removing dependencies if we can. If your solution passes all the tests (including https://github.com/redballoonsecurity/ofrak/blob/master/ofrak_core/test_ofrak/components/test_gzip_component.py#L38, then it should work).

Would also make sense to update our testing code to use python std. library to set up/validate zip files for testing.

@rbs-jacob
Copy link
Member Author

One issue not taken into account in #472, when we switched the default GZIP unpacker to the Python standard library instead of pigz, is that pigz is almost certainly faster for large files where the constant cost of starting a process is negligible compared to the processing time.

If, in fact, we can reliably use Python standard library gzip to correctly handle trailing garbage data as that StackOverflow post implies, then probably the logic should be something like:

  • If file over size threshold and pigz is installed:
    • Inflate with pigz
  • Else
    • Inflate with Python standard library gzip

@alchzh
Copy link
Contributor

alchzh commented Sep 3, 2024

Can be closed after #485 merged. Should have written "Fixes" in that PR 🤦

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants