New adventures in compatibility testing

I'm having some fun trying to figure out how sites use document.getElementsByName(), and thought some of you might be interested in the testing approach.

The bug I'm investigating is a small and ugly one hiding in the document.getElementsByName() implementation – getElementsByName('someID') will find an element with id="someID".

This is of course bad behaviour. That method has nothing to do with IDs and should find elements by name only.
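
To make the difference concrete, here is a minimal sketch that models the two behaviours outside a browser. The elements are plain objects, and both lookup functions are hypothetical stand-ins written for illustration, not the real DOM implementation.

```javascript
// Hypothetical model: each "element" is a plain object with optional id and name.
const elements = [
  { tag: 'input', name: 'someID' },
  { tag: 'div', id: 'someID' },
  { tag: 'span', id: 'other', name: 'different' }
];

// Correct behaviour: match on the name attribute only.
function getElementsByNameStrict(list, name) {
  return list.filter(function (el) { return el.name === name; });
}

// Bug-compatible behaviour: a matching id is accepted as well.
function getElementsByNameBuggy(list, name) {
  return list.filter(function (el) { return el.name === name || el.id === name; });
}

const strictResult = getElementsByNameStrict(elements, 'someID');
const buggyResult = getElementsByNameBuggy(elements, 'someID');
console.log(strictResult.length, buggyResult.length); // prints: 1 2
```

A script that relies on the extra div being returned is exactly the kind of code that could break once the bug is fixed.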

The good news is that it's trivial to fix. The bad news is that it's there for a reason, and the reason is called Internet Explorer. We've been bug-compatible on purpose and while we'd like to remove the bug we have no idea how many sites will break if we do!

So, I'd like an answer to questions like these:

  • How many sites use getElementsByName() to find elements with an ID?
  • Do these sites break if we fix the bug?
  • Do they have alternate code paths for browsers doing it right? If yes, how do they figure out what code to use?

Tools at our disposal: the MAMA web code search engine (an internal Opera project), User JavaScript, and two ad-hoc Opera Unite services.

MAMA tracks sites that might be using document.getElementsByName(). It knows about roughly 45 000 sites where it has seen the string "getElementsByName" in script source code, and it generously provides 5000 random ones in a text file on my request. Naturally, MAMA only does static analysis of the scripts, so it can't tell whether the method is actually called or what it is used for.

That information is a piece of cake to get with User JavaScript. A trivial custom script, trackGEBNabuse.js, overwrites the getElementsByName() method with one that will do a bit of debugging and logging on our behalf. And I'm playing with Opera Unite for the first time, with one logging service and one URL player that keeps track of which of the 5000 URLs were already visited and sends Opera to the next one.

(Opera Unite actually rocks! It's fun to write backend-type logic in JavaScript rather than PHP, and it's less hassle while developing to keep all the information, URL lists, log files and scripts locally on the hard drive. I've been undecided about Unite, not sure if it was more important than all the other things we should be spending time on – now I see it's maturing and making itself useful. Nice.)

To walk you through the main logic of things – here's the user JS that overwrites the native method to do logging – commented:

(function(gebn){/* "gebn" is a reference to the actual, native function */
	document.getElementsByName=function(name){ /* overwrite the real one */
		var elementList=gebn.apply(this, arguments); /* call the native function, record the list it returns */
		/* we want to know if anything in the elementList is there due to a matching id rather than a matching name */
		var abuse=[];
		for(var i=0,elm;elm=elementList[i];i++){ /* go through all returned elements */
			if( elm.getAttribute('name')!==name )abuse.push(elm.outerHTML); /* we found one that's probably in the list because of an ID attribute! */
		}
		if(abuse.length>0){
			/* log errors to some server... */
			(new Image()).src='http://hr-opera.hallvors.operaunite.com/logger/logGEBN/?data='+encodeURIComponent(abuse.join(', '))+'&href='+encodeURIComponent(location.href);
		}
		return elementList;/* don't forget to return the list of elements to the waiting script */
	}
})(document.getElementsByName); /* this is where we pass the real method as an argument to the function */

As you see, it uses the oldest trick in the book – new Image() – to ping the Unite service with some data. The data is then stored in the folder I told Opera Unite to use when installing the widget.
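
The same wrap-and-ping pattern can be tried outside a browser. What follows is a minimal sketch under stated assumptions: `fakeDocument`, `makeElement`, `installTracker`, `buildLogUrl` and the `logger.example` address are all hypothetical names made up for this illustration, and a callback stands in for the new Image() request.

```javascript
// Hypothetical element stand-in: only outerHTML and getAttribute() are modelled.
function makeElement(attrs, html) {
  return {
    outerHTML: html,
    getAttribute: function (key) {
      return Object.prototype.hasOwnProperty.call(attrs, key) ? attrs[key] : null;
    }
  };
}

const fakeElements = [
  makeElement({ name: 'q' }, '<input name="q">'),
  makeElement({ id: 'q' }, '<div id="q"></div>')
];

// Hypothetical document whose getElementsByName() has the bug: ids match too.
const fakeDocument = {
  getElementsByName: function (name) {
    return fakeElements.filter(function (el) {
      return el.getAttribute('name') === name || el.getAttribute('id') === name;
    });
  }
};

// Build the logging URL the way the user script does; the base is a placeholder.
function buildLogUrl(base, abuse, href) {
  return base + '?data=' + encodeURIComponent(abuse.join(', ')) +
    '&href=' + encodeURIComponent(href);
}

// Wrap the method on a document-like object; report() replaces the image ping.
function installTracker(doc, pageUrl, report) {
  const native = doc.getElementsByName;
  doc.getElementsByName = function (name) {
    const found = native.apply(this, arguments);
    const abuse = [];
    for (let i = 0; i < found.length; i++) {
      if (found[i].getAttribute('name') !== name) abuse.push(found[i].outerHTML);
    }
    if (abuse.length > 0) {
      report(buildLogUrl('http://logger.example/logGEBN/', abuse, pageUrl));
    }
    return found; // the calling script still gets the original list
  };
}

const pings = [];
installTracker(fakeDocument, 'http://site.example/page?x=1', function (url) {
  pings.push(url);
});
const result = fakeDocument.getElementsByName('q');

// The receiving side can recover both fields with the standard URL parser.
const params = new URL(pings[0]).searchParams;
console.log(result.length, params.get('data'), params.get('href'));
```

Because both values pass through encodeURIComponent(), markup and nested query strings survive the trip to the logger intact.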

The only other interesting part is the code that requests the next URL from the URL player – as trivial as doing this from a load event listener:

	if(location.hostname!='hr-opera.hallvors.operaunite.com')
		setTimeout( function(){ location.href='http://hr-opera.hallvors.operaunite.com/urldriver/nexturl?'+Math.random(); }, 500 );

The urldriver service also accepts a "urllist=somefile.txt" query string argument, so a different user script could play URLs from a different file (though not at the same time, since the index of the URL one has reached is not stored per file). That's obviously a bug in my Unite service – keep in mind that these are ad-hoc throwaway services done in 30 minutes of cutting, pasting and typing last night, so don't expect QA and polish :-p
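
One way around the shared-index limitation is to keep a separate position for each list. The sketch below is not the Unite service's code and uses no Unite API; `createUrlPlayer`, `nextUrl` and the list names are hypothetical, and the lists live in memory rather than in text files.

```javascript
// Hypothetical in-memory URL player: one position per list name, so two
// user scripts can walk different lists at the same time.
function createUrlPlayer(lists) {
  const positions = new Map(); // list name -> index of the next URL to hand out

  return {
    nextUrl: function (listName) {
      if (!Object.prototype.hasOwnProperty.call(lists, listName)) {
        return null; // unknown list
      }
      const urls = lists[listName];
      const index = positions.get(listName) || 0;
      if (index >= urls.length) {
        return null; // list exhausted
      }
      positions.set(listName, index + 1);
      return urls[index];
    }
  };
}

const player = createUrlPlayer({
  'default.txt': ['http://a.example/', 'http://b.example/'],
  'other.txt': ['http://c.example/']
});

// Interleaved requests for two lists do not disturb each other's position.
const served = [
  player.nextUrl('default.txt'),
  player.nextUrl('other.txt'),
  player.nextUrl('default.txt'),
  player.nextUrl('default.txt')
];
console.log(served);
```

A real service would also have to persist the positions between restarts, which this sketch leaves out.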

And the results? I left an Opera 10.10 instance to surf on its own in 5 different tabs overnight, which generated this log file listing 6 unique sites and the HTML of the elements returned in response to a getElementsByName() call due to this bug. Analysing 6 out of 5000 URLs manually is certainly doable :). I'm still worried about getElementsByName() usage that only happens during user interaction, but now at least we know that 0.12% of the sampled sites (6 of 5000) might be at risk from any change, and we have some real code to look at. And automated analysis of websites is a new and interesting use case for User JavaScript.
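
For the record, the arithmetic behind that percentage is sketched below. The extrapolation to all of the roughly 45 000 sites MAMA knows about is my own assumption and only holds if the random sample is representative; it also ignores calls that only happen during user interaction.

```javascript
// Figures from the post: 6 affected sites in a sample of 5000, drawn from
// roughly 45 000 sites where MAMA saw the string "getElementsByName".
const sampled = 5000;
const affected = 6;
const knownSites = 45000;

const ratePercent = (affected / sampled) * 100;

// Assumption: the sample is representative of the full set.
const estimatedAffected = Math.round(knownSites * affected / sampled);

console.log(ratePercent.toFixed(2) + '%', estimatedAffected); // prints: 0.12% 54
```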

25 thoughts on “New adventures in compatibility testing”

  1. Originally posted by zoquete:

    Break them all, please 🙂 (no sarcasm)

    😆 +1 to this idea. Couldn't you fix the bug in core and add something like this to browser.js:

    var els=document.getElementsByName(name);
    return els.length==0?document.getElementById(name):els;
    

    or is that what the core method already does? (I've never used it)

  2. Originally posted by fearphage:

    What do other browsers do? Do other browsers break on those pages? From my quick and dirty testing, it seems like only IE and Opera fail that test.

    Yup, but that doesn't mean there aren't webpages out there with the likes of:

    if(isIE || isOpera){ return document.getElementsByName(id); }
    else               { return document.getElementById(id); }
    

    That's a simplified example of course, that probably doesn't exist, but something similarly asinine could be lurking.

  3. Haha, I ran into this IE bug the other way round (i.e. selecting some element that I most surely didn't want – which is easier to fix with some IE-only JS). Now I can only guess what the programmer would do if he started off with IE, where everything "works", and then discovered his code didn't run "correctly" in Opera (that's why I hate IE, because it gives good browsers a bad rep with its faulty behavior)

  4. fearphage and lucideer: what we're learning the hard way again and again is that even being perfectly aligned with most of the other browsers can cause breakage if we have to change our behaviour to align. The web remembers everything, especially our old bugs.

    Anyway – having done this quick analysis of 5000 websites, and a manual review of the 6 sites that triggered the broken behaviour, I've found only one site where the different behaviour would make a real difference. (It's the very peculiar innerHTML rewriting script on http://mbd.scout.com/mb.aspx?s=193 – as I don't understand why the site does it, I don't know if the changed behaviour would cause us problems.) After this testing I'm a lot more confident about recommending that we simply fix the bug and stop worrying about IE-compatibility here. User JS for compat research rocks. 😀

    (PS if anyone knows what browser quirk mbd.scout.com tries to work around with that code please comment ;). Looks like they worry about validation and want to use entities in the markup but ran into some trouble with some browser.)

  5. I think that a tiny amount of sites will be affected so it is kinda safe to fix "a bug" and then add fixtures to browser.js for a bunch of broken sites.

  6. We've been bug-compatible on purpose

    Then it isn't a bug but a feature.

    Silly feature though: how many sites were fixed and how many were broken. Now and in the future.

    Don't like the strategy of being bug-compatible, it's short sighted. Stick to proper standards. Opera's userbase is too small to have any marketpower to allow quirks. It will scare off users, designers and developers. Marketeers don't mind if a site is working in Opera or not. Unfortunately.

    @hallvors Your blog is good reading, and explains a lot.

    @fearphage nice test! Shortest ever?

  7. I don't have a problem with being bug-compatible. I just think you should align with the good side (standards, webkit, gecko) instead of the dark side (ie, non-standard).

  8. Originally posted by fearphage:

    I don't have a problem with being bug-compatible. I just think you should align with the good side (standards, webkit, gecko) instead of the dark side (ie, non-standard).

    The good side seems to have a lack of cookies :/

  9. Originally posted by JanGen:

    Don't like the strategy of being bug-compatible, it's short sighted. Stick to proper standards.

    Neither do I, but oddly you'll find the vast majority of users (well commenters) seem to disagree. Incidentally, Opera do seem to stick to proper standards in more cases than anyone else.

    Originally posted by JanGen:

    Opera's userbase is too small to have any marketpower to allow quirks.

    Alas, the opposite is true. Opera's userbase is too small to have any marketpower to disallow quirks.

    Originally posted by fearphage:

    I just think you should align with the good side (standards, webkit, gecko) instead of the dark side (ie, non-standard).

    The "good side" (by your measure) has its fair share of non-standard.

  10. In this case we have a standards side and a non-standards (bug-compatible) side: Explorer and Opera. Why does anyone think that Opera should be bug compatible, while any other (Firefox, Chrome, Webkit) browser can be standard-compatible.

    bug-compatible != standards

    @lucideer. I think every designer would and should prefer

    return document.getElementById(id);

    IMHO browsersniffing is a bad thing:

    Recall the TinyMCE problem: developers made a workaround for a bug in Opera. When Opera fixed the bug, TinyMCE broke, because it was programmed as if Opera is buggy (bug-compatible) and Opera will be buggy FOREVER.

    The only reason I can think of for Opera being bug-compatible is that Opera is too old and UserJS is too young. At the time of Opera 3.6 there was no Firefox (maybe a wild idea about Phoenix) and there were no established W3C standards; there was a browser battle between Netscape and MS.

    Opera's userbase is too small to have any marketpower to disallow quirks.

    Why do you think Opera can't be standard compliant while Chrome or Firefox can.

    @hallvors, can MAMA also show how much browsersniffing is done in js-files targeted on Opera to get the script working in Opera. (like lucideers example), furthermore is there a MAMA extension to fetch scripts and test output and performance in different JS-engines automatically.

    Can Opera be run from the commandline and dump output before and after js-scripts are run?

  11. Just to add. One of webkit's goals is to be mostly bug-compatible with Gecko. It's working pretty good for them right now too.

  12. Originally posted by JanGen:

    Why does anyone think that Opera should be bug compatible, while any other (Firefox, Chrome, Webkit) browser can be standard-compatible.

    Originally posted by JanGen:

    Why do you think Opera can't be standard compliant while Chrome or Firefox can.

    The point I was making is that Firefox, Chrome and Webkit are a lot less standards compatible than Opera. You're interpreting this post backwards, as if it's normal for Opera and other browsers don't have such intricacies. In actual fact it's most unusual for Opera – but it's something ALL browsers do (most others do it a lot more).

  13. Originally posted by lucideer:

    didn't realise it was an official goal

    Well, *official* is perhaps too strong. I don't think they've directly made that statement. But http://trac.webkit.org/browser/trunk, bugs.webkit.org, discussions with developers and the favoring of solving bugs by just matching Gecko make it obvious.

    Further, Gecko's table layout algorithm is more advanced than Trident's and Presto's. In situations where the CSS spec leaves things undefined, Gecko does what you would expect while Opera and IE do weird things. Guess what other engine does like Gecko? (I'll see if I can remember an example. It has to do with % widths and tables where Presto will show nothing and Gecko will invent a width to base the percentage on.)

    In short, it's no coincidence that you often run into "FF and Safari do the same" when you're testing some Opera bug.

  14. Originally posted by burnout426:

    Just to add. One of webkit's goals is to be mostly bug-compatible with Gecko. It's working pretty good for them right now too.

    Yeah I've noticed that – stuff that was breaking in Gecko, that previously worked the same in webkit and Opera suddenly breaking in webkit too. Thought it was coincidence, didn't realise it was an official goal – do you have a link to where this is stated?

  15. Originally posted by burnout426:

    Further, Gecko's table layout algorithm is more advanced than Trident's and Presto's. In situations where the CSS spec leaves things undefined, Gecko does what you would expect while Opera and IE do weird things. Guess what other engine does like Gecko? (I'll see if I can remember an example. It has to do with % widths and tables where Presto will show nothing and Gecko will invent a width to base the percentage on.)

    I wouldn't really call that being bug-compatible though – more just being compatible in absence of a strict guideline.

  16. Originally posted by lucideer:

    I wouldn't really call that being bug-compatible though – more just being compatible in absence of a strict guideline.

    Right. I was just carrying on about how they always seem to favor what Gecko does and that came to mind. 🙂

  17. Originally posted by fearphage:

    This is simply not true.

    html5≠Standard, as "in use"≠"finalised" (not that the vast majority of html5 is even in general common use anyway).

    But I'm sure you knew I was going to say that, as you and I have had this discussion before. Let's not go off on an off-topic to-and-fro about it 😉

  18. Originally posted by lucideer:

    The "good side" (by your measure) has its fair share of non-standard.

    So does Opera. Was there some point there? The good side is aware of and tries to adhere to standards was my point. IE does not.

    Originally posted by JanGen:

    Why does anyone think that Opera should be bug compatible, while any other (Firefox, Chrome, Webkit) browser can be standard-compatible.

    Because we don't have the market share to command that people test in/make sites work in Opera. Therefore Opera has to be flexible. So if webkit and gecko have an unspoken/unwritten consensus to handle some piece of code in some way, Opera has to do it that way as well or we (the end users) get broken pages. Opera doesn't have the marketshare to lead in this respect. They must follow the trends (bugs) if they want end users to be able to use sites.

    Originally posted by JanGen:

    Why do you think Opera can't be standard compliant while Chrome or Firefox can.

    Opera is standards compliant. But when there is a bug in Firefox that the people making webpages have tested against and expect, Opera either has to copy that bug or fix it with userjs. Some of them are easier than others.

    Originally posted by lucideer:

    The point I was making is that Firefox, Chrome and Webkit are a lot less standards compatible than Opera.

    This is simply not true. Webkit and Opera used to be roughly even in terms of sheer numbers of standards/tech supported. However Webkit (and Gecko) have thoroughly embraced html5 already so there is a long list of new hotness that Opera doesn't support yet.

  19. @hallvors, can MAMA also show how much browsersniffing is done in js-files targeted on Opera to get the script working in Opera. (like lucideers example), furthermore is there a MAMA extension to fetch scripts and test output and performance in different JS-engines automatically.

    At the moment MAMA is simply a search engine – it indexes lots of information like script file names, and knows a pretty large set of predefined strings like "getElementsByName" which it looks for. Then it builds a big database that can tell us for example what URLs loaded a script file that contained that string.

    Of course we also look for evidence of browser sniffing – navigator.userAgent and siblings – but MAMA can't as-is do anything more advanced with that information. We have discussed enhancing it or writing a companion service to run the code and gather statistics, but I certainly don't know what questions I'd be asking, so at this point in time a user js approach, figuring out what I need to know on an ad-hoc basis, is OK 🙂

    Can Opera be run from the commandline and dump output before and after js-scripts are run?

    Internal builds can be run with lots of command line options for testing, and you can always log data by setting the "error log enabled" pref.
