
I'm of a few different minds on this.

First off, what needs to be acknowledged is that any website that requires you to override the charset is broken and the greatest part of your ire should be directed at the people who write and operate such websites for expecting that everybody else should be responsible for cleaning up their messes. Let's also acknowledge that avoiding this breakage has been possible for (checks notes) 22 years.

Since this is very legacy stuff, at what point can we say that it is no longer the end consumer's job to deal with legacy cruft? The motivation for why a user-level override was ever necessary in the first place--the fact that once upon a time computers couldn't simultaneously represent content in different locales--is quite ancient history. I don't think an answer of "never" is compelling: why then aren't we complaining that browsers don't let us select EBCDIC or UTF-7 as an encoding (both of which were once supported, but have been dropped)?

At the same time, this change does somehow still feel premature; dropping the ability to override charset encoding does feel very "WTF?" to me. But... if that's the case, then what can be done to hasten the moment where we live in a world where dropping this feature doesn't feel like that? (This makes me glad I work in compilers, where no one attacks you for deciding that some inputs are just too wrong to attempt to deal with.)



Unicode encoding was far from universally adopted in the early 2000s. Some legacy content will always exist, and the moment where dropping the feature is harmless is also the moment where there's no reason to drop it in the first place, since it only ever comes into play with broken legacy content; otherwise, it is entirely hidden and has no harmful effect whatsoever.


Unicode encoding isn't what I was referring to--it was the ability to specify <META http-equiv="Content-Type" content="text/html; charset=EUC-JP"> (copied from HTML 4.01 specification, dated December 24, 1999). That might not be the first time it was possible to declare the charset of an HTML document within HTML itself, but I don't see any mention of it in HTML 3.2.

Also, it's wrong to say that there's "no harmful effect whatsoever"--the ability to do charset overrides requires code to support it, and that code could force awkward compromises in (say) your HTTP layer. All features have costs, even if they're invisible.
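
To make the in-document declaration concrete, here is a rough Python sketch of what honoring it looks like, and what happens when it is missing. The helper names (`declared_charset`, `decode_page`), the 1024-byte scan window, and the cp1252 fallback are my own assumptions for illustration; real browsers follow a much more involved sniffing algorithm than this one regex.

```python
import re

# Hypothetical, simplified scan for a charset declaration near the top of a page.
# This is NOT the algorithm browsers use; it only illustrates the idea.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def declared_charset(raw: bytes, default: str = "cp1252") -> str:
    """Return the charset named in the first 1024 bytes, else an assumed default."""
    match = META_CHARSET.search(raw[:1024])
    return match.group(1).decode("ascii") if match else default


def decode_page(raw: bytes) -> str:
    """Decode the page with whatever charset it declares (or the fallback)."""
    return raw.decode(declared_charset(raw), errors="replace")


body = "日本語".encode("euc_jp")
with_meta = b'<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">' + body
without_meta = body

good = decode_page(with_meta)    # declaration present: text round-trips
bad = decode_page(without_meta)  # no declaration: decoded with the fallback, gibberish
```

The second case is the one a manual override exists for: the bytes are fine, but nothing in the page says how to read them, so the fallback guess turns them into Latin-looking noise.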


I also would like to point out in the context of the issue of “legacy” content that not all web content is some sort of single-page application that gets maintained. Of course, we don’t expect software to remain compatible with other software forever, but not all websites should be considered software. Consider a video file, for example. We have the expectation that a video file, which is static content with no scripting, will continue to play forever in future media players, or at least that you will be asked to download some obscure codec pack if it doesn’t play by default. I think that a static website with no scripting and just text is the same. A website with just text should continue to work. I think this is pretty different from something like Flash or ActiveX or even cookies, where we are no longer talking about static content.


Your video example actually shows that browsers are different from media players: there's tons of video content out there on the web (RealVideo, WMV, MPEG-1/2/4, etc.) that browsers can no longer play, since it relied on plugins for playback and plugins are dead. As always, ffmpeg is a treasure, though.

There's a whole world of patents that make video a particularly problematic space for browsers, but just the basic philosophy of continuing to work with old static content forever isn't that strongly held (and really, isn't that strongly held in much of software: consider opening really old Word or WordPerfect documents just as an example).


> this change does somehow still feel premature

Especially in this heated sort of discussion, I think we need to know more than 'feels premature' or overripe or whatever. What about the data they have? The developer's research is linked in this discussion somewhere.

And of course we are deep in a bubble. Almost no end user knows what character encoding is, and few have any hope of fixing the problem manually. In fact, calling the menu item 'Repair Character Encoding' (or whatever they chose) is probably poor UI - you need something that end users will understand, more like 'Repair Gibberish Text'.



