HTML5 and invalid documents – the great misunderstanding

People keep complaining about HTML5's error handling. It seems many people believe that because the standard specifies error handling, all content will be considered valid.

This belief is wrong, and repeating it doesn't make it true. Yet even Sir Tim Berners-Lee himself seems concerned that HTML5 represents "changes of philosophy about improving the web as opposed to letting it fester while describing it."

This is probably the greatest misunderstanding about HTML5. Let's get this straight:

  • Understanding error handling is an absolute requirement for improving HTML and the Web while being compatible with current content.
  • Invalid documents are still invalid.
  • HTML5 browsers will not "gloss over" invalidity any more than the current HTML4 browsers already do.

On the contrary, I believe that the level of detail in HTML5's error handling will make browsers and validators report more useful error messages. This will make it easier to write valid HTML.

Look at the spec. Right now I find 178 instances of the expression "parse error" in the spec text. These parse errors are validity errors that validators will report, and that browsers may report, to the user. (The spec can't require browsers to report them, because how to present them is a UI decision. Still, I'm fairly sure that Firefox, Safari and Opera will all use their existing error consoles and web developer tools to show HTML5 parse errors. After all, these errors should be so useful that it would be a competitive drawback for a developer tool not to show them.)
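To make the point concrete, here is a toy sketch in Python. It is NOT the HTML5 tree-construction algorithm, which is far more detailed. It only illustrates the idea that a parser can recover from broken markup and still record every parse error along the way. It relies on the standard library's `html.parser` for tokenizing. `RecoveringParser`, `check` and the small `VOID_ELEMENTS` set are hypothetical names made up for this illustration. Real HTML also lets some end tags (such as `</li>` or `</p>`) be omitted, and this toy does not model that.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small set of elements that never take an end tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class RecoveringParser(HTMLParser):
    """Toy parser: keeps going after mis-nested or stray tags, but records
    a "parse error" each time it has to recover. Not the HTML5 algorithm."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names
        self.errors = []         # human-readable parse error messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. the implicit end of <br/>; nothing to close
        if tag not in self.open_elements:
            # Recovery: ignore the stray end tag, but report it.
            self.errors.append(f"parse error: stray end tag </{tag}>")
            return
        # Recovery: implicitly close everything opened after <tag>.
        while self.open_elements[-1] != tag:
            unclosed = self.open_elements.pop()
            self.errors.append(
                f"parse error: <{unclosed}> closed implicitly by </{tag}>"
            )
        self.open_elements.pop()

    def close(self):
        super().close()  # flush any buffered input through the handlers
        # Anything still open at end of input is reported, innermost first.
        for tag in reversed(self.open_elements):
            self.errors.append(f"parse error: <{tag}> never closed")
        self.open_elements.clear()


def check(markup):
    """Parse markup with recovery and return the list of parse errors."""
    parser = RecoveringParser()
    parser.feed(markup)
    parser.close()
    return parser.errors


if __name__ == "__main__":
    for markup in ["<p>fine<br></p>", "<p><b>bold <i>both</b> text</p></div>"]:
        print(repr(markup), check(markup))
```

For `<p><b>bold <i>both</b> text</p></div>`, `check` returns two errors: the implicitly closed `<i>` and the stray `</div>`. The well-formed snippet returns an empty list. Parsing completed in both cases, but recovery did not make the broken markup valid, and the same error list could feed either a validator or a browser's error console.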

Having web browsers and validators report the same errors will help authors understand HTML and well-formedness. Today, authors who try the validator are baffled when it says a document has lots of problems, yet the document works fine in browsers and the browsers don't complain about any errors. This confuses authors and makes them distrust or ignore the validator's warnings.

Tomorrow, HTML5-compliant validators and browsers will report the same errors, and HTML authors will be less confused and more enlightened as a result. Hence, specifying error handling in as much detail as the HTML5 spec does should in fact help improve the quality of the markup out there on the web.

12 thoughts on “HTML5 and invalid documents – the great misunderstanding”

  1. I think that's the second-greatest amount of strong emphasis I've ever seen! :D Well said nevertheless, Hallvord: it can do nothing but good for error handling to be interoperable. For one thing, you really need a consistent DOM, and uniform parsing rules would give you this (or at the very least grounds for calling out divergent behaviour as squarely buggy). We need no better reason, in my book.

  2. That was the point of one of my recent blog posts: Alexa Global Top 500 against HTML 5 validation http://www.w3.org/QA/2008/09/top-500-html5-validity

    I was showing that fewer sites were valid with HTML 5 than with HTML 4, because the content model is in fact stricter. Whether this is a good thing is still up for discussion. Henri Sivonen thinks that making fewer documents valid is not a very good move from the author's perspective. The size and technicality of the HTML 5 specification make the content model hard to parse and understand. That will change with the creation of the Web Developers Guidelines. Another thing that would promote good karma for HTML 5 is a tidying serializer. It doesn't exist yet, and there is no defined algorithm for it. Unfortunately.

    PS: the commenting system doesn't work without cookies. And the UI is in Spanish, although my languages are French, English and Japanese. – Karl Dubost, W3C

  3. In my opinion error handling should be a big red error screen clearly describing the error INSTEAD of the actual content. Reasons:

    1. It's the only way to get rid of the current 90%+ of non-standard websites (read: websites with problems in one browser or another).
    2. It's the way it should have been done from the beginning, so we wouldn't have ended up in this messy situation.
    3. It's the only sensible way for a new standard in the current situation (when we have so many broken websites, and I consider a site broken if it only works properly in one browser).

    I mean, why do we need backwards compatibility for a NEW standard? Those who don't like to write correct code should just use an HTML 4.01 Transitional DTD and be done with it! The fact that there is an error _recovery_ process makes me sad. That's why I like XHTML so much (when served as application/xml).

    And about the console… you probably know that after a brief period of browsing the console fills up with hundreds of error messages. No one cares about the console anymore…

  4. In my opinion error handling should be a big red error screen clearly describing the error INSTEAD of the actual content

    Cool, more power to you: application/xhtml+xml is your friend.

    No one cares about the console anymore…

    If you're testing your own website(s) you certainly should.

  5. Anonymous writes:

    > In my opinion error handling should be a big red error screen clearly describing the error INSTEAD of the actual content

    Well said!

  6. Originally posted by MisterE:

    No one cares about the console anymore

    Complete and total lie. I care about the error console, and more importantly I care that the console is often less than helpful in Opera… although this is due to the JS engine's error reporting and not a fault of the console itself.

  7. Originally posted by "fearphage":

    Originally posted by "MisterE":

    No one cares about the console anymore

    Complete and total lie.

    I was referring to the actual users. If the users don't cry "website doesn't work", the devs won't fix it. And the users won't cry if there are errors in the console; they don't even know about them. As a user, I would like browsers to be less forgiving. As a developer, I don't care: I use Strict XHTML served as application/xhtml+xml.

  8. If the users don't cry "website doesn't work" the devs won't fix it.

    Error messages are for developers, not end users. I don't believe in hurting the users in our quest for improved coding quality on the web. If you believe that average users will start sending informed complaints to websites (as opposed to just becoming confused and insecure when error messages appear), I suggest you broaden your social circles and get to know some less computer-literate users.

  9. Error messages are for developers, not end users.

    Exactly! If there are errors, devs will fix them before the users have any chance of seeing them, instead of ignoring the errors. The potential shame will be enough of an incentive for devs to fix their websites. And of course, all this is opt-in by specifying the proper DTD. I mean, if you promise something (a DTD declaration), you should be held accountable (a big error screen) when you haven't kept your promise. I think this is the reason people keep complaining about HTML5's error handling. I'm one of them (after reading your article :)). But I consider it a small issue anyway, so…

  10. > In my opinion error handling should be a big red error screen clearly describing the error INSTEAD of the actual content.

    Simply: no.

    First: why would you want to subject your end users to error messages they don't understand? You are saying that developers will catch these errors before submitting changes, but is that always true? Most of the time pages are constructed dynamically and may contain external content (such as ads or user-submitted content). Can you be sure that external content is always well-formed? Are you also sure that your own CMS always generates well-formed content (does it really go through a validating parser)? Do you really parse and validate user-submitted content?

    Secondly: there are more invalid pages on the web than valid ones. Given one browser that applies well-defined error-correction rules to invalid pages and shows them anyway, and another that just displays an error when it encounters an invalid page, which browser will be more popular with users?

  11. One of the things the marketplace is trying to teach us, from my point of view, is that lenient error handling is such a competitive advantage that browsers will have to support it somehow, no matter what the specs say. Believe me on this point, even if it might surprise you: this is also true for languages where we tried to do it "right" and be draconian from the start! XML-based languages like XHTML and WML have plenty of broken legacy content because certain (mainly mobile) browsers figured out they had a competitive advantage if they let webmasters get away with writing bare ampersands instead of &amp; and similar small validation errors. In such marketplaces you aren't competitive if you keep showing end users the yellow screen of death, even if you're technically FAR superior on all other points. For applications other than browsing (where correctness of information is a greater concern), draconian error handling may be the right choice, but among browsers it will over time become even more marginalized than it already is.

  12. Anonymous writes:

    If we could just get Google to give a lower ranking to pages that don't validate against their DTD, I believe we would get a long way towards more well-formed and accessible code.

    About the red screen: any coder who leaves the user with the problems resulting from bad coding should go out of business, and in my POV that also includes those who code browsers. It really is our responsibility to ensure that anything we code always falls back to a functional state, no matter what the user does.
