
> a more-robust strict subset of the formal HTML spec

I still think we’d be better off just using XHTML. There are some practical problems with XHTML5 (e.g. named entities aren’t supported, for some reason), but at least the syntax makes sense.



That was tried 20 years ago and it turns out that humans are not good at writing XML.

XML makes sense if you are authoring HTML by hand in an editor. However, this is not how most HTML is actually produced. It's mostly produced by templating engines. This means that you can't validate the XHTML during development because it's being generated on the fly. You only find out that it's invalid in testing or production, perhaps only for a subset of users in certain situations. With HTML this is OK because there is error recovery. For XHTML you get literal downtime, because in the worst case the entire page is replaced by an error screen (the WSOD).
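A minimal sketch (in Python, with a made-up template output) of that failure mode: string-based templating is never parsed during development, so a tag-balance bug only surfaces when the generated page finally hits an XML parser.

```python
# Sketch: why string templating breaks XHTML. The template output is just
# text, so a missing close tag only surfaces when the result is parsed.
import xml.etree.ElementTree as ET

# hypothetical template output with a tag-balance bug: <p> is never closed
page = "<html><body><p>Hello, {}</body></html>".format("world")

try:
    ET.fromstring(page)   # XML parsing: hard failure, i.e. the WSOD case
    xml_ok = True
except ET.ParseError:
    xml_ok = False

print(xml_ok)  # False: an XHTML browser would show an error page here,
               # while an HTML parser would recover and render the text
```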

Yes, XHTML is okay as an internal tool: if for some reason your pipeline depends on parsing your own HTML, then switching to XHTML internally could be a win. Just don't ship XHTML to browsers.


Surely a template engine would be able to produce valid (X)HTML?

Strict XHTML failed on the web because older browsers could not show it at all (since it used a different MIME type), so nobody sane would use it. The problem wasn’t the strictness per se; the problem was that it was introduced without concern for backwards compatibility.

JavaScript is strict in the sense that any syntax error will terminate execution. This seems to work fine because there is an incentive to make the syntax valid.

If XHTML was introduced in a backwards compatible way but new features (like canvas) only worked in strict mode, I’m sure it would have caught on. The incentives just have to be there.


IE6’s refusal to display any page served with the XHTML MIME type was certainly the main reason nobody deployed real XHTML, but the overstrictness was not far behind. Hard enough to justify a complete rewrite of your website’s HTML; even harder when any encoding error or tag imbalance generated by your CMS would display the yellow screen of death rather than a best guess or even displaying everything up to the error:

https://commons.wikimedia.org/wiki/File:Yellow_screen_of_dea...


If there were an actual benefit to using XHTML, I’m sure CMSes would be updated to support it. It is not like it is an impossible problem to produce syntactically valid JSON or SVG, for example.

As “use strict” in JavaScript shows, it is possible to introduce stricter parsing of an existing format, as long as it is explicit opt-in and existing content is unaffected.


I think the main problem with CMSes supporting XHTML is that basically every single one uses a template engine that treats HTML as a string of characters.

Is there a templating system that’s easy to use (think Jinja or something Svelte-like), but parses templates as XML instead of just concatenating a bunch of strings?
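I'm not aware of a mainstream Jinja-like engine that works this way, but the tree-building approach such an engine would need is easy to sketch with Python's stdlib: construct elements instead of concatenating strings, and well-formed output falls out by construction (the function and names here are made up for illustration).

```python
# Sketch of XML-aware "templating": build an element tree, not a string.
# Well-formedness and escaping are guaranteed by the serializer.
import xml.etree.ElementTree as ET

def render_page(title, items):
    html = ET.Element("html", xmlns="http://www.w3.org/1999/xhtml")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    ul = ET.SubElement(body, "ul")
    for item in items:
        # text is escaped on serialization, so "&" can't break the output
        ET.SubElement(ul, "li").text = item
    return ET.tostring(html, encoding="unicode")

out = render_page("Groceries", ["eggs", "milk & honey"])
ET.fromstring(out)   # always parses: the output cannot be ill-formed XML
print(out)
```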


I think if XHTML had been pushed forward, the second problem would have been swiftly solved: we'd have a lot more systems that treat webpages as XML documents rather than just templated text. And text-based systems could easily validate their XHTML output and report failures quickly, as opposed to now, where you get a broken page and have to go check whether your HTML is malformed.


For better or worse, XHTML (also known as the XML serialization of HTML) cannot represent all valid HTML documents. HTML and XML are different languages with vastly different rules, and at this point it's fairly moot to consider replacing one with the other.

Many of the "problems" with HTML are still handled adequately simply by using a spec-compliant parser instead of regular expressions, string functions, or attempting to parse HTML with XML parsers like PHP's `DOMDocument`.

Every major browser engine and every spec-compliant parser interprets any given HTML document in the same prescribed, deterministic way. HTML parsers aren't "loose" or "forgiving" - they simply have fully-defined behavior in the presence of errors.

This turned out to be a good thing because people tend to prefer being able to read _most_ of a document when _some_ errors are present. XML's "draconian error handling" made parsers easier to write, but it largely deals with errors by pretending they can't exist.
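A rough illustration in Python: the stdlib's html.parser (lenient, though it does not implement the full WHATWG tree-construction algorithm) turns malformed input into a fully defined event stream instead of rejecting it.

```python
# Malformed HTML (two unclosed <p> tags) still yields a deterministic,
# fully defined sequence of parse events - no exception, no rejection.
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

p = EventLogger()
p.feed("<p>one<p>two")   # unclosed tags: handled, not fatal
p.close()
print(p.events)
# [('start', 'p'), ('data', 'one'), ('start', 'p'), ('data', 'two')]
```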


Typescript+JSX is what XHTML wanted to be.


Clearly not: the point of a data language is to free you to pick any programming language to produce it, and the point of a specification is to allow agreement without tying it to a specific implementation in a particular language.


That’s exactly what happened. We write JSX, which gets compiled down to assembly, excuse me, html5 or xhtml or whatever. Fine by me, as long as we accept that writing it by hand is not what engineering time should be spent on in the overwhelming majority of cases.

(I’d also like a word with yaml while we’re at it…)



