I'm of a few different minds on this. First off, what needs to be acknowledged i...

zozbot234 · on Feb 18, 2022

Unicode encoding was far from universally adopted in the early 2000s. Some legacy content will always exist, and the moment when dropping the feature becomes harmless is also the moment when there's no reason to drop it in the first place, since it only ever comes into play with broken legacy content; otherwise, it is entirely hidden and has no harmful effect whatsoever.

jcranmer · on Feb 18, 2022

Unicode encoding isn't what I was referring to; I meant the ability to specify <META http-equiv="Content-Type" content="text/html; charset=EUC-JP"> (copied from the HTML 4.01 specification, dated December 24, 1999). That might not be the first time it was possible to declare a document's charset within the HTML itself, but I don't see any mention of it in HTML 3.2.
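To make concrete what honoring that declaration involves: the parser has to find the charset inside the markup before it can decode the rest of the document. Here is a minimal Python sketch under stated assumptions: a regex scan of only the first 1024 bytes (loosely in the spirit of the WHATWG "prescan", not a faithful implementation of it) and a windows-1252 fallback. The names `sniff_meta_charset` and `decode_document` are hypothetical, and real browsers handle many more cases (attribute order, `<meta charset>`, byte order marks, HTTP headers).

```python
import re
from typing import Optional

# Hypothetical, simplified matcher for <META http-equiv="Content-Type" content="...">.
# Real parsers tolerate attribute reordering, odd quoting, comments, etc.
META_RE = re.compile(
    rb'<meta\s+http-equiv\s*=\s*["\']?content-type["\']?\s+'
    rb'content\s*=\s*["\']([^"\']*)["\']',
    re.IGNORECASE,
)
CHARSET_RE = re.compile(rb'charset\s*=\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)


def sniff_meta_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Return the charset named by a meta http-equiv tag in the first `limit` bytes, or None."""
    head = raw[:limit]  # only look at the start of the document
    m = META_RE.search(head)
    if not m:
        return None
    c = CHARSET_RE.search(m.group(1))  # pull "charset=..." out of the content attribute
    return c.group(1).decode("ascii").lower() if c else None


def decode_document(raw: bytes, default: str = "windows-1252") -> str:
    """Decode the whole document using the declared charset, else an assumed default."""
    charset = sniff_meta_charset(raw) or default
    return raw.decode(charset, errors="replace")
```

The chicken-and-egg step (reading bytes in order to learn how to read those same bytes) is what makes the feature awkward to support further down the stack.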

Also, it's wrong to say that there's "no harmful effect whatsoever": the ability to override the charset requires code to support it, and that code can force awkward compromises in, say, your HTTP layer. All features have costs, even if they're invisible.

serentty · on Feb 18, 2022

On the question of "legacy" content, I'd also like to point out that not all web content is some sort of single-page application that gets maintained. Of course, we don't expect software to remain compatible with other software forever, but not all websites should be considered software. Consider a video file, for example. We expect a video file, which is static content with no scripting, to keep playing in future media players forever, or at least to be asked to download some obscure codec pack if it doesn't play by default. I think a static website with no scripting and just text is the same: it should continue to work. That is quite different from something like Flash or ActiveX or even cookies, where we are no longer talking about static content.

zerocrates · on Feb 18, 2022

Your video example also just shows that browsers are different: there's tons of video content out there on the web (RealVideo, WMV, MPEG-1/2/4, etc.) that browsers can no longer play, since it relied on plugins and plugins are dead. As always, though, ffmpeg is a treasure.

There's a whole world of patents that make video a particularly problematic space for browsers, but even setting that aside, the basic philosophy of continuing to work with old static content forever isn't that strongly held (and really, it isn't strongly held in much of software; consider opening really old Word or WordPerfect documents, just as an example).

wolverine876 · on Feb 18, 2022

> this change does somehow still feel premature

Especially in a heated discussion like this one, I think we need to know more than 'feels premature' (or overripe, or whatever). What about the data they have? The developer's research is linked somewhere in this discussion.

And of course we are deep in a bubble. Almost no end user knows what character encoding is, and few have any hope of fixing the problem manually. In fact, calling the menu item 'Repair Character Encoding' (or whatever they chose) is probably poor UI: you need something end users will understand, more like 'Repair Gibberish Text'.
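For readers outside the bubble, the "gibberish" in question is usually mojibake: the right bytes decoded with the wrong encoding, which is why re-decoding can fix it. The Python sketch below only illustrates that idea and is not how any browser implements its repair command. `misread` decodes bytes as Latin-1 (which maps every byte to a character, so nothing is lost), and `repair` recovers the bytes and tries a hypothetical candidate list in order, keeping the first strict decode that succeeds. Real detectors score candidates statistically rather than trusting the first clean decode, and the candidate order here is an assumption that matters.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate list; order matters because the first clean decode wins.
CANDIDATES: List[str] = ["utf-8", "euc_jp", "shift_jis"]


def misread(raw: bytes) -> str:
    """Simulate a page decoded with the wrong charset (Latin-1 keeps every byte recoverable)."""
    return raw.decode("latin-1")


def repair(garbled: str, candidates: List[str] = CANDIDATES) -> Tuple[Optional[str], str]:
    """Recover the original bytes and return (encoding, text) for the first candidate that decodes."""
    raw = garbled.encode("latin-1")  # undo the wrong decode to get the original bytes back
    for enc in candidates:
        try:
            return enc, raw.decode(enc)  # strict decoding: invalid byte sequences raise
        except UnicodeDecodeError:
            continue
    return None, garbled  # nothing fit; leave the text as it was
```

For example, EUC-JP bytes are not valid UTF-8, so the UTF-8 attempt fails and the EUC-JP attempt restores the original text.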