
UTF-8 is a great system, but all those dreadful code pages existed because they were under different technical constraints.

Windows machines in the 1990s had several megabytes of main memory, and people could barely get them to support one East Asian language at a time, never mind several. No sane person would propose using three bytes per Korean character when two would do - that would mean your word processor dies after 50 pages of a document, while your competitor's can handle 75.
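To make the 50-versus-75 arithmetic concrete, here is a minimal sketch that encodes the same Hangul text both ways. The sample string and the helper name `encoded_sizes` are just illustrations, and EUC-KR is assumed as a stand-in for the two-byte Korean code pages of the time.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in EUC-KR and in UTF-8."""
    return {
        "euc_kr": len(text.encode("euc_kr")),  # legacy two-byte encoding
        "utf-8": len(text.encode("utf-8")),    # three bytes per Hangul syllable
    }


# Three Hangul syllables, chosen only for illustration.
sizes = encoded_sizes("한국어")
print(sizes)  # {'euc_kr': 6, 'utf-8': 9}
```

That is a 1.5x ratio, the same as 75 pages to 50, assuming the document is almost entirely Hangul with little ASCII mixed in.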

And even if you did have UTF-8, you wouldn't see those Thai characters anyway, because who would even have those fonts when your OS had to fit on a handful of stacked floppies?

It took years before UTF-8 made technical sense for most users.



Why would you keep 50 pages of document in main memory at once? It’s not like 75 is some magic limit that’s enough and 50 isn’t. No, if you stand any chance of getting anywhere near such a limit, you would certainly design your data structures so you don’t need all the content in memory at once, and so the difference is not so ferociously significant.


It's not like disk space was free and limitless, as was pointed out. Bytes in general used to be way way way more expensive and precious.


It wasn’t free and limitless, but it wasn’t scarce either—you probably had 100–1000× more disk space than RAM, which is close enough to unlimited for most text purposes. (https://en.wikipedia.org/wiki/History_of_hard_disk_drives suggests 1GB was typical in the mid-1990s.)

Consider also that at this very time we’re talking of (early 1990s) the industry was shifting away from largely 8-bit code pages to 16-bit UCS-2, which is an even more extreme cost when compared to UTF-8, doubling space requirements for most people, rather than merely the 50% increase yongjik speaks of for certain languages. Yet this change was being done (more’s the pity).
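A rough sketch of that comparison, for anyone who wants to check the numbers. Python has no codec named UCS-2, so `utf-16-le` is used here on the assumption that it matches UCS-2 byte for byte for characters in the Basic Multilingual Plane; the helper `size_table` and the sample strings are hypothetical.

```python
def size_table(text, legacy_codec):
    """Byte length of `text` in a legacy code page, UTF-16 (as UCS-2), and UTF-8."""
    return {
        legacy_codec: len(text.encode(legacy_codec)),
        "utf-16-le": len(text.encode("utf-16-le")),  # stand-in for UCS-2
        "utf-8": len(text.encode("utf-8")),
    }


# Western text: the 16-bit encoding doubles the size, UTF-8 costs nothing extra.
print(size_table("Hello, world", "cp1252"))
# {'cp1252': 12, 'utf-16-le': 24, 'utf-8': 12}

# Korean text: the 16-bit encoding matches the code page, UTF-8 is 50% larger.
print(size_table("한국어", "euc_kr"))
# {'euc_kr': 6, 'utf-16-le': 6, 'utf-8': 9}
```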

Concerning the scarcity of bytes, yongjik’s point would certainly be valid if it referred to the 1970s, was probably valid of the 1980s, but is not valid of the 1990s. (But the point about keeping the full document in RAM is an unrealistic strawman.)


No need to exaggerate the scarcity either. Documents were often produced in the Word 97 format, which can easily be an order of magnitude larger than the underlying text. If the number of bytes really was that important, any one of a number of more efficient formats could have been chosen.



