HTML Tags (1991) (webdesignmuseum.org)
10 points by mmoez on Nov 8, 2019 | 19 comments


Linked original doc is at [1].

Interesting tidbits:

> <H1>, <H2>, <H3>, <H4>, <H5>, <H6>
>
> These tags are kept as defined in the CERN SGML guide. Their definition is completely historical, deriving from the AAP tag set.

This probably refers to [2].

Oh, and

> ...(not good SGML)...: <NEXTID 27>

This is in fact not SGML at all (NoSGML?). SGML attribute minimization allows leaving out the attribute name only if the attribute can be uniquely identified by an enumerated token value, not by an arbitrary number. Eg. the following is valid:

    <!ELEMENT e - - ANY>
    <!ATTLIST e myatt (true|false) #IMPLIED>
    ...
    <e true>
and is short for

    <e myatt="true">
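A minimal Python sketch of that rule, assuming a toy in-memory model of the ATTLIST declarations (the `ATTLISTS` table and `expand_minimized_value` are hypothetical names, not part of any SGML tool). A bare token resolves to an attribute name only when it appears in an enumerated token group, so `<e true>` expands, while a number like the one in `<NEXTID 27>` cannot be resolved.

```python
from typing import Optional

# Hypothetical, simplified attribute declarations: an enumerated attribute maps
# to its token group; any other declared value (CDATA, NUMBER, ...) maps to None.
ATTLISTS: dict[str, dict[str, Optional[tuple[str, ...]]]] = {
    "e": {"myatt": ("true", "false")},  # <!ATTLIST e myatt (true|false) #IMPLIED>
    "nextid": {"n": None},              # assumed non-enumerated declared value
}


def expand_minimized_value(element: str, token: str) -> tuple[str, str]:
    """Return (attribute name, value) for a bare token such as <e true>."""
    # Only enumerated groups can supply the omitted attribute name.
    candidates = [
        name
        for name, group in ATTLISTS.get(element, {}).items()
        if group is not None and token in group
    ]
    if len(candidates) != 1:
        raise ValueError(f"<{element} {token}>: no unique enumerated attribute for token")
    return candidates[0], token


print(expand_minimized_value("e", "true"))  # ('myatt', 'true')
try:
    expand_minimized_value("nextid", "27")
except ValueError as err:
    print(err)
```

A real SGML parser also handles details this sketch skips, such as name-case folding as configured in the SGML declaration.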
[1]: https://www.w3.org/History/19921103-hypertext/hypertext/WWW/...

[2]: https://en.wikipedia.org/wiki/SGMLguid


The HTML DTD was written later, and bugs were fixed in the first standard proposal in 1993.

Real-world HTML in the wild frequently failed to validate, so the fact that HTML was an SGML application became irrelevant almost instantly. Nobody parsed HTML using SGML parsers with a DTD.


The official W3C validator site [1] begs to differ. And I am, in fact, parsing lots of HTML5 using SGML (see eg. [2], prepared for an ACM DocEng 2019 workshop focused on preserving and acquiring HTML5 corpora for document engineering and ML approaches to search, text extraction/summarization, etc.)

[1]: https://validator.w3.org/

[2]: http://sgmljs.net/docs/sgml-html-tutorial.html


The official W3C validator site seems to agree with me. Did you misunderstand what I said, or can you show me I'm wrong? Just feed it any widely used webpage, for example https://news.ycombinator.com/, and it will not pass.

Just to be clear: the fact that there are still uses for SGML does not make it relevant in the big picture. Your use case seems to be the exception.


I don't know what you did exactly, but the official W3C validator site uses 20-year-old DTDs for DTD-based validation, while HN's markup uses presentational elements/attributes from the HTML 4 transitional/loose era, which were intended to ease migration to CSS back then. The errors show exactly what's wrong with HN's markup, eg. missing "alt" attributes on images where required, use of long-obsolete elements, a missing DOCTYPE, etc. So I guess it's working as expected in suggesting improvements to your site's markup, isn't it?

FYI: if you want to parse modern HTML5 using SGML (with my HTML5 "mini"-DTD), see [1]. For example, to check the HN homepage, download it using curl, add a DOCTYPE to it ('<!DOCTYPE html SYSTEM "about:legacy-compat">'), then invoke "sgmlproc" on it, and it'll just work and parse without errors (see downloads and instructions on the linked page).

[1]: http://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tu...
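The DOCTYPE step can be sketched in Python as below, assuming the page was already saved with curl (eg. `curl -o news.html https://news.ycombinator.com/`). `add_legacy_doctype` and the file names are hypothetical; running sgmlproc on the result is left to the instructions on the linked page, since its command-line options aren't given here.

```python
import re
import sys

LEGACY_DOCTYPE = '<!DOCTYPE html SYSTEM "about:legacy-compat">'


def add_legacy_doctype(html: str) -> str:
    """Prepend the legacy-compat DOCTYPE unless the document already has one."""
    # Case-insensitive check for an existing DOCTYPE, ignoring leading whitespace.
    if re.match(r"\s*<!doctype\b", html, flags=re.IGNORECASE):
        return html
    return LEGACY_DOCTYPE + "\n" + html


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python add_doctype.py news.html > news-sgml.html
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(add_legacy_doctype(f.read()))
```

Leaving an existing DOCTYPE untouched keeps the helper safe to run twice on the same file.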


Yes, but that is not relevant to my argument. Whether the validator validates is irrelevant. HN's markup is not wrong, because it works. You use sgmljs to deal with the unnecessary mess that SGML/HTML/XML started.

P.S. Since you seem to know this stuff: where can I find a standard DTD for DTDs from before XML? DTD was defined using DTD, right?


Not sure what you're after exactly but DTDs were introduced with SGML (ISO 8879:1986 [0]) and then used in simplified form with XML (which is specified as a simplified profile of SGML [1]).

The (historic) SGML-DTDs for HTML, including those used by W3C's validator and early IETF DTDs for HTML 2.0, can be found at W3C's site eg [2], [3].

[0]: https://www.iso.org/standard/16387.html

[1]: https://www.w3.org/TR/REC-xml/

[2]: https://www.w3.org/TR/html4/sgml/dtd.html

[3]: https://www.w3.org/TR/2018/SPSD-html32-20180315/


My question is this: is there a standard SGML DTD for DTDs? I have no access to ISO 8879:1986, so I can't check.


Not really. SGML (and XML) are "meta-markup languages", meaning you declare your vocabulary yourself or use a ready-made one. There is in fact a simple general-purpose vocabulary declared in an ISO/IEC 8879:1986 appendix consisting of generic paragraph and heading elements, but it's not widely used in that form.


This gets close to my point.

Even people working with the standard don't want or need SGML. Similarly for CSS.


BTW, a DTD-valid HTML document can still violate the specification.

We have a situation where:

1. non-valid HTML is just fine, because HTML parsers recognize an informal superset, and
2. HTML validated against the corresponding DTD can still violate the specification.

So the parser/validator is at the same time not enough and too much.
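Point 1 is easy to see with any lenient parser. As an illustration only, Python's standard-library `html.parser` tokenizes markup that no HTML DTD would accept (no DOCTYPE, an `<img>` without `alt`, misnested `<b>`/`<i>`, a stray `</div>`) without raising an error. Note that it only reports tags as it sees them and does not build a tree the way browsers do; `TagCollector` is a hypothetical name.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start and end tags in the order the parser reports them."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"start:{tag}")

    def handle_endtag(self, tag):
        self.events.append(f"end:{tag}")


# DTD-invalid: no DOCTYPE, <img> without alt, misnested <b>/<i>, stray </div>.
markup = '<p>one<p>two <b><i>bold</b></i><img src="x.gif"></div>'
parser = TagCollector()
parser.feed(markup)  # no exception: the parser accepts this input
parser.close()
print(parser.events)
```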


The validator existed, sure, but finding a page that validated was like finding a unicorn.


Well the HN home page doesn't validate in the experimental HTML 5 validator either ;). The validator's point isn't to cover the largest set of documents on the web out there (you could use my "mini"-DTDs for that) but to inform authors about less ideal markup (as in "HTML recommendation").


> The design of the first version of HTML language was influenced by the SGML universal markup language.

HTML was designed as an application of SGML, just as JSON-RPC is an application of JSON. HTML has a DTD (SGML Document Type Definition). HTML was technically an SGML application until HTML5.

SGML comes from Latin and means "complex solution to simple problem."


SGML isn't that complex. If you know the XML subset of SGML, there are only a few additional concepts to learn (mostly markup minimization which is designed to greatly simplify the directly authored form of a text document such that you can write markdown-like syntax, with the canonical/internal form being exactly the same as an XML parser would see it). I'll give you that the official ISO standard spec sucks to the point of being incomprehensible; but then most markup-related specs, including the HTML 5 spec, do. This is what Eliot Kimber (or was it another HyTime editor?) has to say about it (on an admittedly not so well-known topic even by markup standards):

> Why can't people understand the SGML Extended Facilities as written and as standardized by the ISO?

> ISO standards are very hard to understand because they describe very technical things in an abstruse techno-legal vocabulary and reduced-redundancy style. In short, despite having great things to say, even the deathless prose of the HyTime standard tends to be unreadable and, quite frankly, to suck as informative literature. (I'm a co-editor of it; may God have mercy on us.)
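To make the minimization idea concrete, here is a toy Python sketch (a hypothetical function, not how a real SGML parser works) that infers omitted `</p>` end tags the way a DTD permits when it declares `p` with an omissible end tag (`- O`) and a content model that excludes nested `p`: a new `<p>` closes the open one, and the end of input closes the last. A real parser derives this from the content models rather than from a hard-coded rule.

```python
import re


def expand_omitted_p_end_tags(text: str) -> str:
    """Insert the </p> end tags implied by each new <p> and by end of input.

    Toy model for a single element type; assumes lowercase <p> start tags
    and no explicit </p> in the input.
    """
    out: list[str] = []
    open_p = False
    # The capturing group keeps the "<p>" delimiters in the split result.
    for token in re.split(r"(<p>)", text):
        if token == "<p>":
            if open_p:
                out.append("</p>")  # a new <p> implies the end of the open one
            open_p = True
        out.append(token)
    if open_p:
        out.append("</p>")  # end of input implies the final end tag
    return "".join(out)


print(expand_omitted_p_end_tags("<p>one<p>two"))  # <p>one</p><p>two</p>
```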


> SGML isn't that complex.

It's unnecessarily complex. Its complexity and features serve no purpose once you step back and look at the big picture. People don't want to use it. It's easier to write your own data format than to learn and use SGML. XML was a move away. HTML5 was a move away. Heck, just MicroXML is enough: https://blog.jclark.com/2010/12/more-on-microxml.html

Starting from scratch, something like s-expressions or JSON would have been a better starting point than SGML/XML/HTML.


> HTML was designed as an application of SGML.

No, it wasn't. It may have gone from being influenced by SGML to being defined as an application of SGML by the time of the first spec, but TimBL didn't actually originally use SGML for it due to implementation complexity of SGML.


TBL might not have used SGML tools, but he surely wanted HTML to extend to SGML proper, and he appeals to SGML concepts in several places in the linked document:

> Currently HTML documents are transmitted without the normal SGML framing tags, but if these are included parsers will ignore them.

> In SGML terms, paragraph elements are transmitted in minimised form

> These tags are kept as defined in the CERN SGML guide. Their definition is completely historical, deriving from the AAP tag set.

> (not good SGML)


A few months ago I had some high school kids job shadowing me to see if they wanted to get into development. I was showing them the web dev part of my job and was asked if I went to college for that. After thinking about it, I realized the img tag was suggested in my sophomore year of college and formally accepted in my senior year. [0] So: A) no, I couldn’t have. B) I’m old.

[0] https://thehistoryoftheweb.com/the-origin-of-the-img-tag/



