Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib: Fix gzip member header/input buffer boundary issue #5883

Closed

Conversation

addaleax
Copy link
Member

Pull Request check-list

  • Does make -j8 test (UNIX) or vcbuild test nosign (Windows) pass with
    this change (including linting)?
  • Is the commit message formatted according to CONTRIBUTING.md?
  • If this change fixes a bug (or a performance problem), is a regression
    test (or a benchmark) included?

Affected core subsystem(s)

zlib

Description of change

Make sure that, even if an inflate() call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output.

This change also modifies behaviour for trailing garbage: If there is trailing garbage which happens to start with the gzip magic bytes, it is no longer discarded but rather throws an error, since we cannot reliably tell random garbage from a valid gzip member anyway and have to try and decompress it.
(Null byte padding is not affected, since it has been pointed out at various occasions that such padding is normal and discarded by gzip(1), too.)

The problem resolved by this is mentioned here, but definitely not as likely to occur in the wild, since it does not affect the decompression of gzip files containing only a single member.

/cc @kthelgason

@addaleax addaleax changed the title Fix gzip member header/input buffer boundary issue zlib: Fix gzip member header/input buffer boundary issue Mar 24, 2016
@jasnell jasnell added the zlib Issues and PRs related to the zlib subsystem. label Mar 24, 2016
@bnoordhuis
Copy link
Member

Mostly LGTM but I think this is a semver-major change? Can you remove the (now unused) GZIP_ defines from src/node_zlib.cc?

@addaleax
Copy link
Member Author

@bnoordhuis GZIP_MIN_HEADER_SIZE can still be removed, though, you’re right.

And regarding semver: Not sure – If so, then #5120 was semver-major in the first place. The current (both master and 5.9.x) behaviour is to throw an error upon encountering trailing garbage only if that starts with 1f 8b, which can be pretty unreliable I imagine (especially if those garbage bytes are actually random).

So, yes, possibly reverting back to the pre-5.9 behaviour of silently discarding trailing garbage in all cases is a good idea for 5.x, and implementing that should certainly be feasible. But I completely agree with your comment here, throwing an error should be the right behaviour.

@addaleax addaleax force-pushed the fix-gzip-member-buffer-boundary branch from 32c7d28 to c8755e7 Compare March 24, 2016 09:41
@bnoordhuis
Copy link
Member

Gray area, I suppose. Maybe one or two people from @nodejs/ctc can weigh in?

Change itself LGTM. CI: https://ci.nodejs.org/job/node-test-pull-request/2052/

@addaleax
Copy link
Member Author

For future reference and to the best of my knowledge:

.gz file content v5.8 v5.9 / master after this gunzip(1)
2 members first member both members both members both members
1 member + garbage member member throws member³
1 member + 00 + garbage¹ member member member member³
1 member + 1f 8b + garbage² member throws throws member³

¹ never actually inspected by any Node.js version, assumed to be null byte padding
² at least 10 bytes
³ plus a warning on stderr and non-zero exit code

And for the record, I don’t think trailing garbage is much of a real-world problem, but I’m not sure how relevant that fact is.

@@ -43,7 +43,6 @@ enum node_zlib_mode {

#define GZIP_HEADER_ID1 0x1f
#define GZIP_HEADER_ID2 0x8b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These defines are unused now right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but #5884 would start using them again, so I’d leave them in unless it gets decided that that should not get landed.

@kthelgason
Copy link
Contributor

It's hard to debate the relevance of trailing garbage without any idea of prevalence in the wild. I'm on the fence regarding the behaviour of the 1 member + garbage case, I can see the argument for both sides.

Regarding semver-major, #5120 pretty much only changed in a breaking way the behaviour of the last case in your table, and that was unintended, so it looked like a minor version bump to me at least. This PR is now being explicit about throwing an error where previously there was none, so I think a better case can be made for this landing in a major version.

EDIT: I was thinking of the case here #5863. @addaleax you're absolutely right regarding semver-major on this change.

@addaleax
Copy link
Member Author

Yep, I can definitely see why one would count this as semver-major, and personally I don’t think I have a strong opionion on that. But keep in mind that this PR still mostly contains a bugfix: For the case that a new member’s header wraps around the input buffer boundary. So even if this does not land in some 5.x version, I’d still argue that a modified patch (which keeps the current trailing-garbage behaviour or restores the old one) should.

@kthelgason
Copy link
Contributor

Yes, I agree 100%.

There is still the issue of the boundary between members falling exactly on the input buffer boundary, so we haven't really gotten rid of the problem, only reduced the likelihood of it's occurrence (significantly).

EDIT: turns out this is taken care of.

@@ -38,3 +41,26 @@ fs.createReadStream(pmmFileGz)
assert.deepStrictEqual(Buffer.concat(pmmResultBuffers), pmmExpected,
'result should match original random garbage');
}));

// test that the next gzip member can wrap around the input buffer boundary
[0, 1, 2, 3, 4, defEncoded.length].forEach((offset) => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kthelgason I think testing for 0 should take care of the case where data members fall exactly on input boundaries:

The first unzip.write() call writes abcEncoded + 0 bytes from defEncoded, and the second unzip.end() call writes all of defEncoded. Or did you have something else in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, you're absolutely right.

@jasnell
Copy link
Member

jasnell commented Mar 24, 2016

I agree that it's technically a semver-major but this definitely looks like a grey area and I agree with the comments that an error is the right thing to do. I think I'd be ok with this landing as a semver-minor.

@jasnell
Copy link
Member

jasnell commented Mar 24, 2016

The change LGTM

@addaleax
Copy link
Member Author

Btw, this is what re-allowing trailing garbage would look like – wouldn’t be a big deal.

Make sure that, even if an `inflate()` call only sees the first
few bytes of a following gzip member, all members are decompressed
and part of the full output.

This change also modifies behaviour for trailing garbage:
If there is trailing garbage which happens to start with the
gzip magic bytes, it is no longer discarded but rather throws
an error, since we cannot reliably tell random garbage from
a valid gzip member anyway and have to try and decompress it.
(Null byte padding is not affected, since it has been pointed
out at various occasions that such padding is normal and
discarded by `gzip(1)`, too.)

Adds tests for the special case that the first `inflate()` call
receives only the first few bytes of a second gzip member but
not the whole header (or even just the magic bytes).
@addaleax addaleax force-pushed the fix-gzip-member-buffer-boundary branch from c8755e7 to e6af8b9 Compare March 29, 2016 15:49
@addaleax
Copy link
Member Author

Rebased & squashed this into one commit.

@bnoordhuis
Copy link
Member

Thanks Anna, landed in 54a5287.

@bnoordhuis bnoordhuis closed this Mar 31, 2016
bnoordhuis pushed a commit that referenced this pull request Mar 31, 2016
Make sure that, even if an `inflate()` call only sees the first
few bytes of a following gzip member, all members are decompressed
and part of the full output.

This change also modifies behaviour for trailing garbage:
If there is trailing garbage which happens to start with the
gzip magic bytes, it is no longer discarded but rather throws
an error, since we cannot reliably tell random garbage from
a valid gzip member anyway and have to try and decompress it.
(Null byte padding is not affected, since it has been pointed
out at various occasions that such padding is normal and
discarded by `gzip(1)`, too.)

Adds tests for the special case that the first `inflate()` call
receives only the first few bytes of a second gzip member but
not the whole header (or even just the magic bytes).

PR-URL: #5883
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
@bnoordhuis bnoordhuis added the semver-minor PRs that contain new features and should be released in the next minor version. label Mar 31, 2016
@addaleax
Copy link
Member Author

@bnoordhuis Thanks! I would actually be more comfortable with having this marked as semver-major and backporting this bugfix in a way that keeps the current (v5.x) behaviour for trailing garbage. Would that be okay? (And if so, would creating a PR against the v5.x branch be the correct way to do that?)

@addaleax addaleax deleted the fix-gzip-member-buffer-boundary branch March 31, 2016 11:33
@bnoordhuis bnoordhuis added semver-major PRs that contain breaking changes and should be released in the next major version. and removed semver-minor PRs that contain new features and should be released in the next minor version. labels Mar 31, 2016
@bnoordhuis
Copy link
Member

Okay, I updated the labels. PRing against v5.x sounds good; v5.x-staging is the branch you want.

@addaleax
Copy link
Member Author

Sorry to ask, but there is no v5.x-staging branch here?

@bnoordhuis
Copy link
Member

Oh, that's right; I remember being baffled by that a while ago. It should be fine to target v5.x in that case.

@jasnell @thealphanerd What's the reason there isn't a v5.x-staging branch like there is for v0.10/v0.12 and v4?

@MylesBorins
Copy link
Contributor

@bnoordhuis afaik the staging convention has been primarily used for LTS... but tbqh it would be useful for v5 as well and likely allow for quite a bit less pain when cutting releases.

Not 100% where an issue should be made for this though, ctc?

@bnoordhuis
Copy link
Member

I don't think an issue is necessary. If other @nodejs/release people feel it's a good idea, just go ahead and do it. Maybe file an informational issue afterwards as a memo for contributors.

addaleax added a commit to addaleax/node that referenced this pull request Apr 2, 2016
Make sure that, even if an `inflate()` call only sees the first
few bytes of a following gzip member, all members are decompressed
and part of the full output.

Adds tests for the special case that the first `inflate()` call
receives only the first few bytes of a second gzip member but
not the whole header (or even just the magic bytes).

This is a backport of nodejs#5883 and contains additional changes to
make sure that the behaviour on encountering trailing garbage
remains the same (namely to silently discard it if one full member
has already been decompressed).

 #PR-URL: nodejs#5883
 #Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
 #Reviewed-By: James M Snell <jasnell@gmail.com>
jasnell pushed a commit that referenced this pull request Apr 15, 2016
Make sure that, even if an `inflate()` call only sees the first
few bytes of a following gzip member, all members are decompressed
and part of the full output.

Adds tests for the special case that the first `inflate()` call
receives only the first few bytes of a second gzip member but
not the whole header (or even just the magic bytes).

This is a backport of #5883 and contains additional changes to
make sure that the behaviour on encountering trailing garbage
remains the same (namely to silently discard it if one full member
has already been decompressed).

PR-URL: #5973
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-major PRs that contain breaking changes and should be released in the next major version. zlib Issues and PRs related to the zlib subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants