content-length for gzipped data? #52

Closed · Pomax opened this issue Sep 8, 2015 · 20 comments
@Pomax commented Sep 8, 2015

Is there a way to get the content-length value for data that was run through compression, so that even in chunked transfer mode the final content length can be sent along for browsers that rely on that value to properly terminate the transfer?

@dougwilson (Contributor)

Because the Content-Length is part of the header, in order to provide a Content-Length we would have to buffer the entire compressed body in memory to calculate it. This is unacceptable, unfortunately, because users could then no longer send multi-GB bodies (or even many MB-sized bodies concurrently) without going OOM in V8.

Your statement is a little weird, though, because chunked transfer encoding and the Content-Length header are mutually exclusive: you can have one or the other, but not both (see https://tools.ietf.org/html/rfc7230#section-3.3.2).

Transferring the body chunked, without a Content-Length, is 100% valid, and no browser I am aware of requires the Content-Length header, especially since requiring it would violate the HTTP RFC.
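
To make the tradeoff concrete, here is a minimal Node.js sketch (not this module's code; the function names are illustrative) contrasting the streaming approach this middleware uses with the buffer-everything approach a Content-Length would require:

var zlib = require("zlib");

// Streaming: compressed output is forwarded as it is produced. Memory use
// is constant, but the total size is unknown up front, so the response has
// to use chunked transfer encoding.
function streamCompressed(bodyStream, res) {
  res.setHeader("Content-Encoding", "gzip");
  bodyStream.pipe(zlib.createGzip()).pipe(res);
}

// Buffering: compress the entire body in memory first so its length is
// known. For multi-GB bodies (or many concurrent MB bodies) this exhausts
// the V8 heap, which is the OOM problem described above.
function bufferCompressed(bodyBuffer, res) {
  zlib.gzip(bodyBuffer, function (err, compressed) {
    if (err) { res.statusCode = 500; return res.end(); }
    res.setHeader("Content-Encoding", "gzip");
    res.setHeader("Content-Length", compressed.length);
    res.end(compressed);
  });
}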

@dougwilson dougwilson self-assigned this Sep 8, 2015
@dougwilson (Contributor)

And on top of that, the Node.js API for writing headers, res.writeHead, is synchronous: once that function returns, the headers must be written out to the socket buffer. So we cannot even buffer the body asynchronously to determine the header value, because the headers are committed before we even have the body, the moment the underlying code calls res.writeHead :(
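
To illustrate that ordering constraint (hypothetical handler code, not this module's):

// Once writeHead returns, the status line and headers are committed and
// cannot be modified afterwards...
res.writeHead(200, { "Content-Encoding": "gzip" });
// ...but the compressed body is only produced after this point, so a
// Content-Length computed from it can no longer be added.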

@Pomax (Author) commented Sep 8, 2015

It should not come as a surprise that the context here is: everything works great except IE, which seems to be treating our chunks as corrupted data.

@dougwilson (Contributor)

Weird, because I use IE a lot for testing and use this module without issue. Is there a way I can reproduce the issue? Just using the example in the README works fine for me in IE 10.

@Pomax (Author) commented Sep 8, 2015

Debugging atm; hopefully that leads to an MCVE, or we find the problem along the way and prove ourselves wrong in thinking it's compression()... will update once we know either way!

@dougwilson (Contributor)

Ok, keep me updated. I can test as well to help speed things up (and prevent a never-closing issue). What version of Node.js are you using, and which version of IE and Windows?

@dougwilson (Contributor)

Also, the version of this module, if it is not 1.5.2.

@humphd commented Sep 9, 2015

This discussion started because of https://github.com/mozilla/thimble.webmaker.org/issues/1064. We're sending a TAR file (stream) to the browser, and the arraybuffer we get in the browser is fine in Chrome, FF, Opera, and Safari, but corrupted in IE (11 in this case). If I turn off compression, it works.

Here's a minimal example to reproduce the issue:

var express = require("express");
var compression = require("compression");
var tarStream = require("tar-stream");

var app = express();
app.use(compression());

// Build a small in-memory tarball containing two files.
function buildTarStream() {
  var pack = tarStream.pack();
  pack.entry({name: "/index.html"}, "<!DOCTYPE html><html><head><title>Hello World</title></head><body></body></html>");
  pack.entry({name: "/style.css"}, "p { color: red; }");
  pack.finalize();
  return pack;
}

app.get("/", function(req, res) {
  var stream = buildTarStream();

  // Serve the tarball with an explicit tar content type.
  res.type("application/x-tar");
  stream.pipe(res);
});

app.listen(9000, function() {
  console.log("Server running on port 9000");
});

When I hit this with IE 11, I get a ~1K file, compared with the ~3K file I get with compression turned off. It appears to me that IE isn't decompressing the file.

It's also quite possible that I'm just doing something wrong here, and if you notice something that doesn't make sense, I'd appreciate any advice. At the least, doing what I think is the obvious thing doesn't work in IE, and it might point at a bug or doc fix that could help other users.

@dougwilson (Contributor)

Hi @humphd, can you send me a Fiddler trace of the request/response when you hit this in IE11? You can upload it somewhere or simply email it to me directly (my email is in my GitHub profile).

@dougwilson (Contributor)

@humphd, I also need answers to the following: what version of Node.js, IE, Windows, and this module are you using?

@humphd commented Sep 9, 2015

  • node v0.12.5
  • express 4.13.3
  • compression 1.5.2
  • IE 11.0.9600.17416
  • Windows 8.1 (I'm using one of the VMs MS gives for testing IE)

Let me figure out how to run Fiddler and I'll post that next.

@dougwilson (Contributor)

Cool. Looking at your example so far, the actual response stream it spits out, as captured by Wireshark, is 100% spec-compliant and works fine (I assume this is why all the other browsers function just fine). My guess is that the ArrayBuffer you are getting in IE11 contains the compressed tar rather than the expanded tar, i.e. perhaps there is a bug in IE11 where, in certain cases, it does not apply the Content-Encoding prior to handing your code the ArrayBuffer?

When you get that ArrayBuffer, what are the first two bytes? 1f 8b or something else?
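
For reference, a client-side snippet (plain XHR; illustrative only) that would reveal those bytes:

var xhr = new XMLHttpRequest();
xhr.open("GET", "/", true);
xhr.responseType = "arraybuffer";
xhr.onload = function () {
  var bytes = new Uint8Array(xhr.response);
  if (bytes[0] === 0x1f && bytes[1] === 0x8b) {
    // 1f 8b is the gzip magic number: the body is still compressed.
    console.log("ArrayBuffer is still gzip-compressed");
  } else {
    console.log("ArrayBuffer was decoded; first bytes:",
                bytes[0].toString(16), bytes[1].toString(16));
  }
};
xhr.send();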

@humphd commented Sep 9, 2015

I agree with you, and also suspect the bug is on the IE side. In Chrome on Windows I get 2f 69 as the first two bytes, and in IE I'm getting 1f 8b. Even without any front-end code, it's obvious as soon as you hit that route and the browser downloads the file that it isn't what you'd expect.

@dougwilson (Contributor)

The two bytes 1f 8b are the gzip header, so your ArrayBuffer does indeed contain the gzip'd tar. This means, unfortunately, you have only the following choices:

  1. Perhaps there is something between your server and browser that is stripping the Content-Encoding header, which is why IE11 is not decoding it?
  2. It's an IE11 bug, so there is nothing this module can do. You can take one of the following actions (a sketch of option b follows below):

     a. In your XMLHttpRequest, see if you can avoid sending the Accept-Encoding header for IE11, so that IE11 simply never receives compressed responses.
     b. On the server side, either skip compressing tar files, skip compressing for XHR requests, or do some kind of UA detection and skip compressing for IE11.
     c. Load ungzip code in your client-side JavaScript and ungzip the ArrayBuffer in the worker when you detect that the first two bytes are 1f 8b.

I can help you write the code for any of these, if you decide which you want to implement.
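
As a sketch of option (b), this module's filter option can be used to skip compression selectively (the tar and UA checks here are illustrative; compression.filter is the module's built-in default filter, and app is the Express app from the example above):

app.use(compression({
  filter: function (req, res) {
    // Skip compression for tar responses...
    var type = res.getHeader("Content-Type") || "";
    if (/x-tar/.test(type)) return false;
    // ...and skip it entirely for IE, via crude UA sniffing.
    var ua = req.headers["user-agent"] || "";
    if (/Trident|MSIE/.test(ua)) return false;
    // Otherwise defer to the module's default filter.
    return compression.filter(req, res);
  }
}));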

@humphd commented Sep 9, 2015

@dougwilson, thanks a lot for being so willing to help with this. It's rare to come across a maintainer who is willing to go down a rabbit hole like this.

Based on what we were seeing, I did another experiment: removing the application/x-tar content type makes this work in all browsers, including IE 11. So clearly IE does the wrong thing when I explicitly tell it that this is a tarball, versus just letting it figure that out on the fly. I also tried using application/octet-stream explicitly, and that also works.

So maybe IE special-cases what it thinks is a compressed stream and leaves it compressed. I'm not sure if you have docs somewhere where this tidbit could go, but it certainly seems like someone else will hit it at some point.
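
For the record, the working variant of the earlier repro's route (same buildTarStream helper as above):

app.get("/", function(req, res) {
  var stream = buildTarStream();

  // A generic binary type (or no explicit type at all) avoids whatever
  // special-casing IE 11 applies to application/x-tar.
  res.type("application/octet-stream");
  stream.pipe(res);
});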

@dougwilson (Contributor)

Hi @humphd, it is no problem at all! Your information regarding the behavior in relation to the Content-Type header is interesting and, to me, hints that the behavior IE is exhibiting is actually intentional (perhaps some kind of compatibility thing?). I wonder if Edge behaves this way or not ;)

I'm going to look around online to see if I can find some information about this, perhaps a reason or at least a deep dive/full description of what the heck is going on :) This would help a lot if we were to add a note to the README for people to understand what is happening (it would also reduce questions here, since we could just point to something instead of it looking like we made that information up).

@humphd commented Sep 16, 2015

Talked to MS engineers about this, and was told that they've filed a bug internally to deal with it.

@dougwilson (Contributor)

Interesting, @humphd, that it's actually a bug :) It was almost seeming like they did it on purpose for some unknown reason. If this is truly a bug, it is very unlikely that we'll end up documenting it in this module, as otherwise the README would become a global collection of bug information. If there is a good site/page that collects such bug information, I can definitely add a link to it in the README :) !

@poshest commented Mar 24, 2016

I would also like content to be gzipped on the fly by compression and then served non-chunked with an explicit Content-Length, for the reason given here (in the Edit).

I don't want CloudFront serving half-retrieved files because it couldn't use the Content-Length to verify that it had the whole file. None of the files I want to serve are in the GB range (referring to the reason @dougwilson gave for why this can't be achieved).

Would it not be possible to add the functionality and simply warn users in the documentation that this has a size limit?

@dougwilson (Contributor)

Would it not be possible to add the functionality and simply warn users in the documentation that this has a size limit?

No. If you want that functionality, use or make a middleware that will "de-chunkify" your responses. Then you don't have to convince all modules that send chunked responses to add an option not to do that.
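
A minimal sketch of such a middleware, under the assumptions that responses are known to fit comfortably in memory and that HEAD requests and error handling are out of scope (the name dechunkify is illustrative):

function dechunkify(req, res, next) {
  var chunks = [];
  var write = res.write;
  var end = res.end;

  // Capture everything downstream middleware writes instead of sending it.
  res.write = function (chunk, encoding) {
    if (chunk) chunks.push(Buffer.isBuffer(chunk) ? chunk : new Buffer(chunk, encoding));
    return true;
  };

  // On end, send the whole body at once with an explicit Content-Length,
  // which makes Node skip chunked transfer encoding.
  res.end = function (chunk, encoding) {
    if (chunk) chunks.push(Buffer.isBuffer(chunk) ? chunk : new Buffer(chunk, encoding));
    var body = Buffer.concat(chunks);
    res.write = write;
    res.end = end;
    res.setHeader("Content-Length", body.length);
    res.end(body);
  };

  next();
}

app.use(dechunkify);   // mounted first so it buffers compression's output
app.use(compression());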
