Should we try to handle sites that gzip twice?

Many sites use gzip to compress their content so that it downloads faster. This is a good thing. It does make the web browser work a bit harder (the content needs to be decompressed before it can be shown), but it shortens the time it takes to show users a web page.

However, we've sometimes come across sites that are misconfigured – perhaps they have two places where they check whether a browser supports gzip compression – and compress the content twice. The effect is that when we've decompressed it once and expect to have HTML to show the user, we still have only a mess of binary data. These sites show up as gibberish – no text, no links, entirely unreadable and unusable in Opera. The most recent larger site we saw doing this was cars.com, but it appears to have corrected the problem now.

I think this problem is usually caused by server-side browser sniffing. We're now considering changing Opera to try to detect double-gzip scenarios and decompress once more if double gzipping is detected.

With this test I tried to figure out what other browsers do, believing that some of them may have come across the same problem. Surprise – all browsers fail. Or perhaps not really surprising, given that unpacking twice is a somewhat weird thing to do. So does this problem really only affect Opera? And should we respond by doing the slightly weird twice-unpack thing that no other browsers do?
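
To make the proposal concrete, here is a minimal sketch of what such a detect-and-unwrap pass could look like, written in Python purely for illustration – the helper name and the round cap are invented, and this is of course not Opera's actual code. After the declared gzip layer is removed, check whether the result still starts with the gzip magic bytes; if it does, unwrap it once more, with a hard cap on the number of rounds.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # every gzip stream starts with these two bytes (RFC 1952)
MAX_ROUNDS = 2             # one declared layer plus at most one bogus extra layer

def decode_gzipped_body(body: bytes) -> bytes:
    """Decompress a gzip-encoded response body; if the result still looks
    like a gzip stream, assume the server compressed it twice and unwrap
    it again, never doing more than MAX_ROUNDS passes in total."""
    data = body
    for _ in range(MAX_ROUNDS):
        if not data.startswith(GZIP_MAGIC):
            break            # looks like real content now (HTML, text, ...)
        data = gzip.decompress(data)
    return data
```

Real HTML essentially never begins with the bytes 0x1f 0x8b, so the magic-byte check is a fairly safe heuristic, and the cap keeps the decoder from looping on input that keeps looking compressed.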

37 thoughts on “Should we try to handle sites that gzip twice?”

  1. it's annoying that we're apparently the only ones affected by this in the wild… some common scripts are definitely doing something strange there, and we're at the receiving end. although i would rather see the source of the problem fixed, i could also see this as a built-in error recovery – IF we can reliably detect that content has been double-gzipped.

  2. I wouldn't support a quirky workaround. I assume it is not a large-scale problem, but of course I have no idea how widespread this issue really is; as far as I recollect I never came across a site suffering from that problem – or I didn't mind. ;) I think finding the cause of the double-gzipping should have first priority. Only then can it be determined whether a patch to a few commonly used modules or snippets could cure the ache, or whether an Opera-side workaround is finally necessary. On the other hand, I'm the kind of Opera user that does not always blame Opera for each and every display glitch or misfortune. 😉

  3. Never had a problem like that, and I believe Opera shouldn't start supporting it, since no other browser does anyway.

  4. *If* Opera's going to do this, I suggest: show an Opera "Double Gzip Error!" error page that says something like: "Oops! Looks like the server incorrectly gzips this page's content twice when encountering your User Agent of 'Opera/9.80 (Windows NT 5.1; U; en) Presto/2.5.28/2.5.23 Version/10.60'! Click _here_ to decompress the page one more time to work around the server bug. Also, please report this issue to this site's administrator so they can fix it."

    That text says:

    1. There's an error.
    2. Why there's an error.
    3. The error might only happen when the server encounters Opera (meaning that just because the page works fine in Firefox, it doesn't mean anything).
    4. You should report it to the site, which further emphasizes that it's a problem with the site and not Opera.
    5. Opera cares and gives you a way to work around it.

    (The text could even suggest trying a site preference to identify/mask as a different browser.)

    I don't think Opera should automatically decompress the content a second time. For one, other browsers don't do it. And doing it automatically hides the problem. Doing it automatically would be more friendly to users who just want the damn site to work, but that argument is sometimes given too much weight. If it is done automatically, I would expect an entry in the error console.

    At the end of the day, if the sites are not going to fix the problem, Opera must be aggressive and make the site work. So, I support not waiting for sites to fix this problem and working around it.

    If this is done automatically, it might be best if it was done on a per-site basis via override_downloaded.ini. That way, this double-decompression behavior is not something site authors can rely on, as that would just give them an excuse not to fix their stuff.

  5. the problem is: some rather widely-used packages (older versions of popular forum/cms systems, or wordpress plugins for caching) have these bugs in them. we try as much as we can to work with the authors of these packages to resolve the problem in their code, but even when they cooperate and fix their broken code, there are still tons of installs out there using the original broken system. *we* know it's a problem at the server end, but for many end users all they see is "opera can't handle this site… stupid opera". i'd be in favour of implementing a fix simply as sanity check/error handling (unzip sent content / check if the unzipped content is still gibberish / rinse and repeat). of course, this would go in conjunction with carrying on contacting and working with script authors, but in the short term it helps actual end users get to actual pages and content that they want. there's a time and place for ideology, but not when it's actually harming users, i'd say. as usual, IMHO of course.

  6. Samus_ writes: please don't, whenever a browser becomes more promiscuous the web becomes even messier; today's broken markup, tag soup and all the other horrid abominations that exist are entirely the browsers' fault, because they accepted them. if a site gzips twice it is an **error** – do not disguise it as a feature; let the authors realize their site is broken so they have no choice but to fix it.

  7. There are a few other reasons that cause sites to compress twice: Apache and PHP both being configured to compress content, a misconfigured reverse proxy, or someone manually compressing their JS or CSS files when their server is already compressing them.

  8. Anonymous writes: Just make the ungzip code for the browser recursive, checking whether another round of decompression is necessary each time. Of course, this will open up a new abuse angle, what with archives that contain themselves, but you can just place an upper limit on the number of recursions or on execution time.

  9. Anonymous writes: extract* themselves, not "control" themselves. Sorry for the triple post, I'm sure it's annoying.

  10. Anonymous writes: Oh, same anonymous as above – first I was going to say that in the case of archives that contain themselves you can check whether "decompressing" changes the data, but then you could make an archive that contains itself along with some randomized junk, in which case limiting recursion/execution time would again be the best defense.

  11. Rachid writes: I'd definitely do double or triple unpacking, as long as you can make detection reliable. Would hate to see Opera hang in an unpacking loop for no reason. 😉

  12. A more in-depth analysis of the cars.com problem might show one of the reasons Opera sees this more often than other web browsers: we support gzip both as Transfer-Encoding and Content-Encoding, and tell the server that we do. Yngve says it sometimes happens because "Mozilla" is not in the UA string, but per the above it may not always be browser sniffing; there is also the potential for confusion if sites apply both content-encoding and transfer-encoding without actually telling Opera they did so (see the sketch at the end of this comment thread).

  13. I think searching Bugzilla for "gzip" or "content encoding" and reading open and closed bugs is a good idea. If not for finding the exact solution, then maybe for getting a different angle on where the problem could lie.

  14. Terry writes: It sounds like someone implemented support for HTTP 1.1's Transfer-Encoding in one part of the web server's code, but another part is still applying the Content-Type hack for compressed transfer. Result: it gets compressed twice. Opera is doing exactly what it should do: follow the standard. According to the headers Opera is receiving, it's a gzipped transfer of a payload that is itself something gzipped. If there's anything wrong with what Opera is doing, I'd think it would be that it was trying to render it instead of treating it like a download, just as if it had a .gz extension. Please don't introduce a bug in Opera to counteract a bug elsewhere! :-p

  15. As I look at incoming bugs and other channels for user feedback, I can confirm that this is a frequent problem. Often the site has already fixed the problem before anyone at Opera has a chance to look at it. I have only once seen this happen on a site in both Firefox and Opera; unfortunately, most of the time it only affects Opera. :whistle: Core support, done properly, sure wouldn't hurt the users.

  16. Originally posted by Chas4:

    I still get the garbled code if I reload the page after first loading it on cars.com

    Confirmed. Which also kind of confirms that this has something to do with UA sniffing and gzip support being detected twice.

  17. Absolutely not. It is a silent agreement to break standards. It will also break many other things. For example, what if somebody has a .gz file on disk and it is compressed again for transfer? It will be incorrectly decompressed twice. Having such heuristic detection will also mean that a web developer, after checking in one browser that everything works, will assume there is no error in the compression path. So no.

  18. About showing an error: I would be happy if Opera had a small icon and a small per-page log with descriptive error messages. Not the error console, but small notifications, like: "Warning: This server is buggy. It uses an improper Encoding-Type. Please contact the server administrator. Assuming x.y.z." The same for all other HTTP protocol parsing issues, and maybe other things, like: "Warning: This site has stylesheets which contain major syntax errors and 30 unknown attributes. Please contact the server administrator. Trying hard to render the page anyway." This will not only show users and developers that the problem is not in Opera, but will also allow web developers to spot acute problems in server configurations.

    For this case it could be shown as: "Warning: This server sent compressed content as advertised. After decompressing, it still looks to be compressed with gzip and does not look like text/html. Contact the site administrator."

    This notification system could be disabled by default (only showing a small icon on major problems), but could be enabled to show a small log in the status bar (or just under the address bar, per tab). It would then be possible to report some statistics there, like: "Warning: While loading, this site triggered 23 CSS warnings, 4 JS errors, 5 HTML warnings and 2 HTML errors. First one: Unknown property: -webkit-animation. Open the Error Console for more information."

    This would be really useful for developers, and not as obstructive as the current console.

  19. Originally posted by hallvors:


    Hallvord R. M. Steen # 9. June 2010, 12:22

    A more in-depth analysis of cars.com problem might show one of the reasons Opera sees this more often than other web browsers: we support gzip both as Transfer-Encoding and Content-Encoding, and tell the server that we do. Yngve says it sometimes happens because "Mozilla" is not in the UA string, but per the above it may not always be browser sniffing,

    I did the cars.com analysis, and contacted them a few times with the results. The issue with doing nothing is that Opera doesn't have the market share to force change on the web site implementers. Further, you also run into situations like the one I did while reporting it. First they told me to use a different browser, and when I illustrated the exact error and emphasized that it was browser independent, they didn't have the know-how to fix it. The developers code to a test web server, and that one didn't show the issue. Their production server apparently only exhibited it with external clients. Perhaps there were two components involved, and the downstream one didn't check to see if the stream was already compressed and blindly did it again. I don't know. But it seemed the developers were disconnected from those that managed deployment on the production server, and it was clear that it just wasn't going to get fixed. You'd like to think that people would jump at a clear defect like that, but most of them are not web purists.

    As a compromise, if people don't want Opera to handle this, let me modify the AE and TE headers that Opera sends out. I'll remove "gzip" from one of them and the issue will go away.

  20. Anonymous writes: Please do (even better if it can be user-configured on a per-site basis); there are a few sites I really want to follow, and at the moment I have to use other browsers 😦

  21. Originally posted by anonymous:

    Anonymous writes: Please do (even better if it can be user-configured on a per-site basis); there are a few sites I really want to follow, and at the moment I have to use other browsers 😦

    You could use a proxy server for this – Privoxy, for example. It also handles other things nicely (like filtering content, pipelining, cookie filtering, etc.). I still vote for showing a notice bar for any similar problems with buggy servers. This would keep Opera standards-compliant, show users that the problem is not in Opera, and make these problems easier for web developers to detect, even without special tools or using 10 different browsers.

  22. Anonymous writes: For many years I used the Proxomitron, but since Opera introduced the integrated ad block and the per-site preferences, I ditched it. I'd prefer not to have to go back again.

  23. A fix for cars.com is in the pipeline – we will stop telling web servers that we support Transfer-Encoding. (Firefox keeps T-E support secret too, so we'll basically align with the headers they send). This might help some of the other sites with this problem too. If the problem remains annoying after this fix I guess it's time to investigate a content-sniffing double-ungzipping magic fix, and if we go there we should certainly make sure we do an error console warning to that effect.

  24. Originally posted by FataL:

    BTW, I haven't had issues with cars.com lately. Or was I just lucky?

    I just saw the issues when I tried, for fun (I'm not from the U.S.), to search for a car.

  25. popcrunch is something different. It is only compressed once. You can see that with "save as" and then gunzip from the command line. I see from the headers below that it shows deflate as the encoding.

    www.cars.com shows gzip as the encoding:

    … Vary: Accept-Encoding, User-Agent
    Content-Encoding: gzip
    Content-Length: 20031 …

    Perhaps deflate is incorrect? If the payload has the gzip header block, perhaps it must properly report it as gzip encoded, not deflate?

    — showing that the output is gzipped —
    (Note that this output is identical to what I get with "save as" from inside Opera.)

    $ ./wget --header="Accept-Encoding: gzip" --header="TE: gzip" -O popcrunch.reply http://www.popcrunch.com/
    --2010-07-06 10:23:19-- http://www.popcrunch.com/
    Resolving http://www.popcrunch.com... 67.222.101.10
    Connecting to http://www.popcrunch.com|67.222.101.10|:80… connected.
    HTTP request sent, awaiting response… 200 OK
    Length: 8942 (8.7K) [text/html]
    Saving to: `popcrunch.reply'
    100%[==========================================================================================>] 8,942 42.5K/s in 0.2s
    2010-07-06 10:23:20 (42.5 KB/s) - `popcrunch.reply' saved [8942/8942]

    $ file popcrunch.reply
    popcrunch.reply: gzip compressed data, from Unix

    $ gunzip -c < popcrunch.reply > popcrunch.reply.gunzipped

    $ file popcrunch.reply.gunzipped
    popcrunch.reply.gunzipped: HTML document text

    — popcrunch reply headers follow —
    HTTP/1.1 200 OK
    Date: Tue, 06 Jul 2010 14:13:19 GMT
    Server: Apache
    Last-Modified: Tue, 06 Jul 2010 14:12:35 GMT
    ETag: "57a802e-22e5-48ab8a513dac0"
    Accept-Ranges: bytes
    Cache-Control: max-age=3555, public, must-revalidate, proxy-revalidate
    Expires: Tue, 06 Jul 2010 15:12:35 GMT
    X-Pingback: http://www.popcrunch.com/xmlrpc.php
    X-Powered-By: W3 Total Cache/0.8.5.2
    Vary: Accept-Encoding,Cookie
    Pragma: public
    Content-Type: text/html; charset=UTF-8
    Content-Encoding: deflate
    Content-Length: 8933

  26. Originally posted by ouzoWTF:

    I just saw the issues when I tried, for fun (I'm not from the U.S.), to search for a car.

    Issues with cars.com are sporadic. I think it only happens on their production servers, since one reply suggested that the developer couldn't see it on their dev server. I think that their production deployment adds some component that tries to optimize and is compressing again (using CPU to slightly increase the byte count, since it is already compressed 🙂).

  27. Curiously, IE 8, Firefox 3.6.8, and Chrome 6.0 load this site correctly… http://imaginationstationtoledo.org/ So IMO, whatever these other browsers are doing with gzip data sent with "Content-Encoding: deflate", Opera should probably be doing as well. BTW, Remco directed me to your post and supplied me with the knowledge of gzip vs. deflate shown above. 🙂
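
For what it's worth, the lenient handling the last few comments describe – accepting gzip-framed data even when the header says deflate – could look roughly like the sketch below. This is hypothetical illustration code in Python, not what IE, Firefox, Chrome or Opera actually implement; it simply tries the plausible framings in turn.

```python
import zlib

def tolerant_inflate(body: bytes) -> bytes:
    """Try the likely compression framings in turn: gzip, zlib-wrapped
    deflate (what "Content-Encoding: deflate" officially means), and raw
    deflate (what many servers actually send)."""
    for wbits in (16 + zlib.MAX_WBITS,   # gzip wrapper
                  zlib.MAX_WBITS,        # zlib wrapper
                  -zlib.MAX_WBITS):      # raw deflate stream
        try:
            return zlib.decompress(body, wbits)
        except zlib.error:
            continue
    return body  # nothing worked; pass the body through untouched
```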

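One more illustration, picking up comment 12's point about gzip being allowed as both a Transfer-Encoding and a Content-Encoding: the transfer coding has to be undone before the content coding, and a server that applies gzip at both layers while only declaring one of them leaves the decoded body still looking like gzip data instead of HTML. A rough Python sketch, assuming a hypothetical pre-parsed header dict:

```python
import gzip

def decode_layers(raw_body: bytes, headers: dict) -> bytes:
    """Undo Transfer-Encoding: gzip first (it wraps the transfer), then
    Content-Encoding: gzip (it wraps the entity itself)."""
    body = raw_body
    if "gzip" in headers.get("Transfer-Encoding", "").lower():
        body = gzip.decompress(body)
    if "gzip" in headers.get("Content-Encoding", "").lower():
        body = gzip.decompress(body)
    return body
```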