content-length for gzipped data? #52

Closed · Pomax opened this issue Sep 8, 2015 · 20 comments
@Pomax commented Sep 8, 2015

Is there a way to get the content-length value for data that was run through compression, so that even in chunked transfer mode the final content length can be sent along for browsers that rely on that value to properly terminate the transfer?

@dougwilson (Contributor)

Because the Content-Length is part of the header, in order to provide a Content-Length we would have to buffer the entire compressed body in memory to calculate it. This is unacceptable, unfortunately, because users could then no longer send multi-GB bodies (or even many MB-sized bodies concurrently) without going OOM in V8.

Your statement is a little weird, though, because chunked transfer encoding and the Content-Length header are mutually exclusive: you can have one or the other, but not both (see https://tools.ietf.org/html/rfc7230#section-3.3.2).

Transferring the body chunked, without a Content-Length, is 100% valid, and no browser I am aware of requires the Content-Length header, especially since requiring it would violate the HTTP RFC.
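
To make the tradeoff concrete, here is a minimal Node.js sketch (not this module's code; the function names are illustrative) contrasting the streaming approach this middleware uses with the buffer-everything approach a Content-Length would require:

var zlib = require("zlib");

// Streaming: compressed output is forwarded as it is produced. Memory use
// is constant, but the total size is unknown up front, so the response has
// to use chunked transfer encoding.
function streamCompressed(bodyStream, res) {
  res.setHeader("Content-Encoding", "gzip");
  bodyStream.pipe(zlib.createGzip()).pipe(res);
}

// Buffering: compress the entire body in memory first so its length is
// known. For multi-GB bodies (or many concurrent MB bodies) this exhausts
// the V8 heap, which is the OOM problem described above.
function bufferCompressed(bodyBuffer, res) {
  zlib.gzip(bodyBuffer, function (err, compressed) {
    if (err) { res.statusCode = 500; return res.end(); }
    res.setHeader("Content-Encoding", "gzip");
    res.setHeader("Content-Length", compressed.length);
    res.end(compressed);
  });
}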

@dougwilson dougwilson self-assigned this Sep 8, 2015
@dougwilson (Contributor)

And on top of that, the Node.js API for writing headers, res.writeHead, is synchronous: once that function returns, the headers must be written out to the socket buffer. So we cannot even buffer the body asynchronously to determine the header value, because the headers are committed before we even have the body, the moment the underlying code calls res.writeHead :(
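
To illustrate that ordering constraint (hypothetical handler code, not this module's):

// Once writeHead returns, the status line and headers are committed and
// cannot be modified afterwards...
res.writeHead(200, { "Content-Encoding": "gzip" });
// ...but the compressed body is only produced after this point, so a
// Content-Length computed from it can no longer be added.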

@Pomax (Author) commented Sep 8, 2015

It should not come as a surprise that the context here is: everything works great except IE, which seems to be treating our chunks as corrupted data.

@dougwilson (Contributor)

Weird, because I use IE a lot for testing and use this module without issue. Is there a way I can reproduce the issue? Just using the example in the README works fine for me in IE 10.

@Pomax (Author) commented Sep 8, 2015

Debugging atm; hopefully that leads to an MCVE, or we find the problem along the way and prove ourselves wrong in thinking it's compression()... will update once we know either way!

@dougwilson (Contributor)

Ok, keep me updated. I can test as well to help speed things up (and prevent a never-closing issue). What version of Node.js are you using, and which version of IE and Windows?

@dougwilson (Contributor)

Also, the version of this module, if it is not 1.5.2.

@humphd commented Sep 9, 2015

This discussion started because of https://github.com/mozilla/thimble.webmaker.org/issues/1064. We're sending a TAR file (stream) to the browser, and the arraybuffer we get in the browser is fine in Chrome, FF, Opera, and Safari, but corrupted in IE (11 in this case). If I turn off compression, it works.

Here's a minimal example to reproduce the issue:

var express = require("express");
var compression = require("compression");
var tarStream = require("tar-stream");

var app = express();
app.use(compression());

// Build a small in-memory tarball containing two files.
function buildTarStream() {
  var pack = tarStream.pack();
  pack.entry({name: "/index.html"}, "<!DOCTYPE html><html><head><title>Hello World</title></head><body></body></html>");
  pack.entry({name: "/style.css"}, "p { color: red; }");
  pack.finalize();
  return pack;
}

app.get("/", function(req, res) {
  var stream = buildTarStream();

  // Serve the tarball with an explicit tar content type.
  res.type("application/x-tar");
  stream.pipe(res);
});

app.listen(9000, function() {
  console.log("Server running on port 9000");
});

When I hit this with IE 11, I get a ~1K file, compared with the ~3K file I get with compression turned off. It appears to me that IE isn't decompressing the file.

It's also quite possible that I'm just doing something wrong here, and if you notice something that doesn't make sense, I'd appreciate any advice. At the least, doing what I think is the obvious thing doesn't work in IE, and it might point at a bug or doc fix that could help other users.

@dougwilson (Contributor)

Hi @humphd, can you send me a Fiddler trace of the request/response when you hit this in IE11? You can upload it somewhere or simply email it to me directly (my email is in my GitHub profile).

@dougwilson (Contributor)

@humphd, I also need answers to the following: what version of Node.js, IE, Windows, and this module are you using?

@humphd commented Sep 9, 2015

  • node v0.12.5
  • express 4.13.3
  • compression 1.5.2
  • IE 11.0.9600.17416
  • Windows 8.1 (I'm using one of the VMs MS gives for testing IE)

Let me figure out how to run Fiddler and I'll post that next.

@dougwilson (Contributor)

Cool. Looking at your example so far, the actual response stream it spits out, as captured by Wireshark, is 100% spec-compliant and works fine (I assume this is why all the other browsers function just fine). My guess is that the ArrayBuffer you are getting in IE11 contains the compressed tar rather than the expanded tar, i.e. perhaps there is a bug in IE11 where, in certain cases, it does not apply the Content-Encoding prior to handing your code the ArrayBuffer?

When you get that ArrayBuffer, what are the first two bytes? 1f 8b or something else?
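
For reference, a client-side snippet (plain XHR; illustrative only) that would reveal those bytes:

var xhr = new XMLHttpRequest();
xhr.open("GET", "/", true);
xhr.responseType = "arraybuffer";
xhr.onload = function () {
  var bytes = new Uint8Array(xhr.response);
  if (bytes[0] === 0x1f && bytes[1] === 0x8b) {
    // 1f 8b is the gzip magic number: the body is still compressed.
    console.log("ArrayBuffer is still gzip-compressed");
  } else {
    console.log("ArrayBuffer was decoded; first bytes:",
                bytes[0].toString(16), bytes[1].toString(16));
  }
};
xhr.send();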

@humphd commented Sep 9, 2015

I agree with you, and also suspect the bug is on the IE side. In Chrome on Windows I get 2f 69 as the first two bytes, and in IE I'm getting 1f 8b. Even without any front-end code, it's obvious as soon as you hit that route and the browser downloads the file that it isn't what you'd expect.

@dougwilson (Contributor)

The two bytes 1f 8b are the gzip header, so your ArrayBuffer does indeed contain the gzip'd tar. This means, unfortunately, you have only the following choices:

  1. Perhaps there is something between your server and browser that is stripping the Content-Encoding header, which is why IE11 is not decoding it?
  2. It's an IE11 bug, so there is nothing this module can do. You can take one of the following actions (a sketch of option b follows below):

     a. In your XMLHttpRequest, see if you can avoid sending the Accept-Encoding header for IE11, so that IE11 simply never receives compressed responses.
     b. On the server side, either skip compressing tar files, skip compressing for XHR requests, or do some kind of UA detection and skip compressing for IE11.
     c. Load ungzip code in your client-side JavaScript and ungzip the ArrayBuffer in the worker when you detect that the first two bytes are 1f 8b.

I can help you write the code for any of these, if you decide which you want to implement.
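
As a sketch of option (b), this module's filter option can be used to skip compression selectively (the tar and UA checks here are illustrative; compression.filter is the module's built-in default filter, and app is the Express app from the example above):

app.use(compression({
  filter: function (req, res) {
    // Skip compression for tar responses...
    var type = res.getHeader("Content-Type") || "";
    if (/x-tar/.test(type)) return false;
    // ...and skip it entirely for IE, via crude UA sniffing.
    var ua = req.headers["user-agent"] || "";
    if (/Trident|MSIE/.test(ua)) return false;
    // Otherwise defer to the module's default filter.
    return compression.filter(req, res);
  }
}));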

@humphd commented Sep 9, 2015

@dougwilson, thanks a lot for being so willing to help with this. It's rare to come across a maintainer who is willing to go down a rabbit hole like this.

Based on what we were seeing, I did another experiment: removing the application/x-tar content type makes this work in all browsers, including IE 11. So clearly IE does the wrong thing when I explicitly tell it that this is a tarball, versus just letting it figure that out on the fly. I also tried using application/octet-stream explicitly, and that also works.

So maybe IE special-cases what it thinks is a compressed stream and leaves it compressed. I'm not sure if you have docs somewhere where this tidbit could go, but it certainly seems like someone else will hit it at some point.
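
For the record, the working variant of the earlier repro's route (same buildTarStream helper as above):

app.get("/", function(req, res) {
  var stream = buildTarStream();

  // A generic binary type (or no explicit type at all) avoids whatever
  // special-casing IE 11 applies to application/x-tar.
  res.type("application/octet-stream");
  stream.pipe(res);
});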

@dougwilson (Contributor)

Hi @humphd, it is no problem at all! Your information regarding the behavior in relation to the Content-Type header is interesting and, to me, hints that the behavior IE is exhibiting is actually intentional (perhaps some kind of compatibility thing?). I wonder if Edge behaves this way or not ;)

I'm going to look around online to see if I can find some information about this, perhaps a reason or at least a deep dive/full description of what the heck is going on :) This would help a lot if we were to add a note to the README for people to understand what is happening (it would also reduce questions here, since we could just point to something instead of it looking like we made that information up).

@humphd commented Sep 16, 2015

Talked to MS engineers about this, and was told that they've filed a bug internally to deal with it.

@dougwilson (Contributor)

Interesting, @humphd, that it's actually a bug :) It was almost seeming like they did it on purpose for some unknown reason. If this is truly a bug, it is very unlikely that we'll end up documenting it in this module, as otherwise the README would become a global collection of bug information. If there is a good site/page that collects such bug information, I can definitely add a link to it in the README :) !

@poshest commented Mar 24, 2016

I would also like content to be gzipped on the fly by compression and then served non-chunked with an explicit Content-Length, for the reason given here (in the Edit).

I don't want CloudFront serving half-retrieved files because it couldn't use the Content-Length to verify that it had the whole file. None of the files I want to serve are in the GB range (referring to the reason @dougwilson gave for why this can't be achieved).

Would it not be possible to add the functionality and simply warn users in the documentation that this has a size limit?

@dougwilson (Contributor)

Would it not be possible to add the functionality and simply warn users in the documentation that this has a size limit?

No. If you want that functionality, use or make a middleware that will "de-chunkify" your responses. Then you don't have to convince all modules that send chunked responses to add an option not to do that.
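
A minimal sketch of such a middleware, under the assumptions that responses are known to fit comfortably in memory and that HEAD requests and error handling are out of scope (the name dechunkify is illustrative):

function dechunkify(req, res, next) {
  var chunks = [];
  var write = res.write;
  var end = res.end;

  // Capture everything downstream middleware writes instead of sending it.
  res.write = function (chunk, encoding) {
    if (chunk) chunks.push(Buffer.isBuffer(chunk) ? chunk : new Buffer(chunk, encoding));
    return true;
  };

  // On end, send the whole body at once with an explicit Content-Length,
  // which makes Node skip chunked transfer encoding.
  res.end = function (chunk, encoding) {
    if (chunk) chunks.push(Buffer.isBuffer(chunk) ? chunk : new Buffer(chunk, encoding));
    var body = Buffer.concat(chunks);
    res.write = write;
    res.end = end;
    res.setHeader("Content-Length", body.length);
    res.end(body);
  };

  next();
}

app.use(dechunkify);   // mounted first so it buffers compression's output
app.use(compression());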
