
This discussion wouldn't be complete without a mention of https://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_..., which AIUI was initially constructed as an argument against Romanization.

In short, it's the same nominal sound with varying tones ("shi", which is closer in pronunciation to "shirr" than "she"), repeated about a hundred times, which is of course meaningless in spoken form (since there's not enough context to differentiate between the various forms), but actually conveys a story in written form.
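To make the "same sound, varying tones" point concrete, here is a minimal Python sketch, not anything from the poem's author. It takes the twelve characters of the poem's first line with their Mandarin Pinyin readings, then strips the tone marks. The `strip_tones` helper and the `FIRST_LINE` list are names made up for this illustration.

```python
import unicodedata

def strip_tones(pinyin: str) -> str:
    """Drop combining marks (tone diacritics) from a Pinyin syllable.

    Note: this also strips the diaeresis from "ü", which is fine here
    because every syllable in this demo is some tone of "shi".
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# First line of the poem (石室詩士施氏，嗜獅，誓食十獅), one reading per character.
FIRST_LINE = [
    ("石", "shí"), ("室", "shì"), ("詩", "shī"), ("士", "shì"),
    ("施", "shī"), ("氏", "shì"), ("嗜", "shì"), ("獅", "shī"),
    ("誓", "shì"), ("食", "shí"), ("十", "shí"), ("獅", "shī"),
]

distinct_characters = {char for char, _ in FIRST_LINE}
distinct_toned = {unicodedata.normalize("NFC", p) for _, p in FIRST_LINE}
distinct_toneless = {strip_tones(p) for _, p in FIRST_LINE}

print(len(distinct_characters), "distinct characters")
print(len(distinct_toned), "distinct syllables with tones")
print(len(distinct_toneless), "distinct syllables without tones")
```

Eleven different characters collapse to three toned syllables, and to a single syllable once the tones are dropped, which is the information loss the poem is built to show off.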

With the shift toward typing and (especially mobile) computerization in recent years, it's really not surprising (to me, at least) that Chinese society is moving in a direction where literacy no longer extends to recall of individual characters and only encompasses recognition, since recall is no longer as necessary a skill in day-to-day life.



The poem is written in Classical Chinese, which was spoken over 2000 years ago, and back then would have been intelligible to a listener because the words would have sounded different. Even today, they sound different in e.g. Cantonese.

There's a close relative of Mandarin (Dungan) which is written in the Cyrillic alphabet. The spoken language is tonal, but tones aren't used in the written language because written words are polysyllabic, and if you know how to speak Dungan, you can reliably infer the tones.

https://www.omniglot.com/chinese/dungan.htm


The poem uses now-rare characters from classical Chinese but it was written in the 1930s and uses the modern Mandarin pronunciation of said characters. The whole point of the poem is to make everything "shi" in modern Mandarin pronunciation, to argue against switching from Chinese characters to Latin alphabet romanization.


You can also construct ridiculous sentences in English that no native speaker will understand [0].

In normal texts written in modern Chinese, this is not a problem. Nobody writes real texts like the "shi" poem. In cases where something can only be understood in written form, you can rephrase it to avoid homophones.

0. https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal...


It would result in a pretty severe loss of fidelity.

You may think it’s not needed, because that information isn’t available in spoken Chinese. The same is true for written English - putting spaces between words, dividing texts into paragraphs, capitalizing them, differentiating between different pauses (a comma, period, semicolon, etc. all signifying what kind of pause it is), quotation marks, parentheses, etc. - none of this is available in our spoken language, and we’re still able to understand it. In theory, we could get rid of them all and understand what’s being written. In practice, most people would find the result to be an incomprehensible mess.

The same goes for Chinese. Written languages, for the most part, are more than a simple transcription of spoken sounds.


> In practice, most people would find the result to be an incomprehensible mess.

Unless Chinese is somehow unique among all human languages, this isn't true. Chinese would be just as intelligible if written in a phonetic script (like Pinyin) as it is when written using the characters.

Now, it would be an incredibly shocking transition for Chinese people who have already spent their entire lives writing with characters. However, after the transition to Pinyin, especially for young people who wouldn't ever learn the characters, written Chinese would still be perfectly understandable.

That being said, I don't favor replacing the characters, because the transition would be extremely difficult and because the characters are very culturally important to China. They've been in use for a good 3000 years, and people are very attached to them. Phonetic scripts are technically superior, but the cultural and practical arguments for sticking with the characters are still stronger.


> Unless Chinese is somehow unique among all human languages, this isn't true.

I was talking about English in that paragraph:

> The same is true for written English - putting spaces between words, dividing texts into paragraphs, capitalizing them, differentiating between different pauses (a comma, period, semicolon, etc. all signifying what kind of pause it is), quotation marks, parentheses, etc. - none of this is available in our spoken language, and we’re still able to understand it. In theory, we could get rid of them all and understand what’s being written. In practice, most people would find the result to be an incomprehensible mess.


> I was talking about English in that paragraph:

The very next sentence you wrote was

> The same goes for Chinese.

So you were talking about both English and Chinese in that sentence.


> So you were talking about both English and Chinese in that sentence.

I was talking about English in the sentence you quoted. In the next paragraph, I said that Chinese was the same as English in this regard. That's why I couldn't (and still can't) understand your comment.

You're saying it isn't true that removing those parts of English would mean "most people would find the result to be an incomprehensible mess" unless Chinese is unique? Chinese has absolutely no connection to written English becoming a mess after removing those elements of written English.

Or are you objecting to the paragraph after the one you quoted, where I say the same thing that happens in English is true for Chinese? "Unless Chinese is somehow unique among all human languages, this isn't true" that Chinese would be like English? That doesn't make any sense to me unless you misread my initial comment to mean the complete opposite of what it was saying.


It's very clear what you meant, and I don't know why you're going in circles like this.

You very clearly wrote that Chinese would become an incomprehensible mess if written in Pinyin.

You first stated that there would be a severe loss in fidelity in switching to Pinyin. Then you gave an analogy showing how removing various non-phonetic elements of written English would make it an incomprehensible mess. Immediately after that, you said that the same applies for Chinese.

I'm objecting to your argument that Chinese would be an incomprehensible mess if written alphabetically.


No, I'm genuinely confused by your claim that in order for Chinese to be similar to English in this manner, it would be "somehow unique among all human languages." These are contradictory ideas. That's why I was asking for clarity.

> I'm objecting to your argument that Chinese would be an incomprehensible mess if written alphabetically.

That's fine, but it runs directly counter to your initial comment. If a phonetic transcription would make Chinese just as easy to understand as it is written now, it would be quite different from English, and almost every other written language, all of which include non-phonetic elements in order to facilitate reading.


I'm not sure what's confusing you. You laid out your initial argument clearly. I laid out my response clearly.

Now, you're obsessing over some pretty obvious misinterpretations of what I've written, and you're ignoring the argument you yourself initially made.

> If a phonetic transcription would make Chinese just as easy to understand as it is written now, it would be quite different from English, and almost every other written language, all of which include non-phonetic elements in order to facilitate reading

Pinyin, the phonetic transcription of Standard Chinese, is written with spaces and punctuation. You're going on about something that doesn't exist.


In normal texts, that's correct. However, written Chinese does contain semantic information which the spoken language and Pinyin lack. Chinese also has fewer distinct syllables than English and seldom borrows words from other languages. So someone who's literate in Chinese would usually be able to infer the meaning of unfamiliar words when written down, as they would already know the meaning of all their component characters, but might struggle if they were written phonetically. This is like having a good knowledge of Classical Greek when encountering words like nephropathy or myocarditis for the first time.

It still isn't a very good argument, though. Most English speakers get by without any knowledge of classical languages, and accept having to look up words in a dictionary.


Someone who's literate in Chinese would only be able to infer the meaning of an unfamiliar character if they already knew all of the surrounding characters. Then, you can guess the meaning of the character based on context, and possibly hints from the character itself about pronunciation and/or meaning (though this is very hit-or-miss, because many characters don't contain obvious hints). In order to reliably know all of the context surrounding a character, you need to know about 3000 characters total (that's the point at which you can recognize 99% of characters on a page). This is still a very tall order, which takes years of study to achieve.

The Chinese characters do indeed contain semantic information that Pinyin (the standard Romanization) does not, but in practice, you don't need that extra semantic information. If you write down a single word in Pinyin, it may have a few homophones, whereas the same word, written in Chinese characters, would be unambiguous. However, in written Pinyin texts, you would almost always be able to figure out which word is meant from context. In the few cases in which that would not be possible, the author could slightly rephrase the text to make it unambiguous.

Most languages on Earth (that have a writing system) are written using alphabets. Chinese is not so special that it could not be written using an alphabet as well. The reason China hasn't switched to an alphabetic script is cultural attachment to the characters, not that Pinyin doesn't work just as well in a practical sense.


> Someone who's literate in Chinese would only be able to infer the meaning of an unfamiliar character if they already knew all of the surrounding characters.

In what I wrote, I was assuming there would be no unfamiliar characters, but there would be one or more unfamiliar words composed of two or more characters.

I was trying to put forward the best argument I could think of for retaining the characters, but like you, have decided it isn't worth the additional effort of learning thousands of characters up front to become literate when you can use a phonetic script and look up any unfamiliar words in a dictionary instead.


The argument isn't that the more complicated spelling is unlearnable, but that it could be much easier to learn.

And yes, this is also 100% applicable to English.


This argument is also used for Japanese, but I do not consider it valid.

This just proves that a phonetic writing is not sufficient, but it does not mean that the phonetic writing must be replaced with traditional writing.

To resolve the ambiguity of the phonetic writing, both in Chinese and in Japanese, where the ambiguity is much worse, it is enough to retain at most a couple hundred symbols to be used as semantic classifiers. It is likely that a great part of the traditional radicals would be suitable to be retained as classifiers, with perhaps a part of them omitted if redundant and a few other symbols added, if necessary.

Then the writing could be phonetic, but with classifier symbols attached to words, wherever the ambiguity makes them necessary.

This is not a new method. The oldest writing systems, like those of Egypt or Mesopotamia, also used classifier symbols (with meanings like: "a kind of human", "a kind of god", "a kind of animal", "a kind of stone", "a kind of wood", "a body part", "a kind of tool" and so on) attached to the words written phonetically, to avoid ambiguities.

If one would have to learn only 200 classifier symbols and with lower stroke counts than most symbols used now, that would be a great simplification.

Many of the Chinese characters are actually intended to be composed of two parts, a semantic classifier and a phonetic symbol, but this principle is applied too inconsistently and with too many variants, so the system can be greatly simplified by using a simple phonetic writing like Pinyin together with semantic classifiers inserted in the text only if they are necessary.
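A rough Python sketch of the "phonetic writing plus classifiers only where needed" idea described above. Everything here is a toy assumption: the six-word `LEXICON`, the bracketed classifier tags such as `ANIMAL` and `STONE`, and the `write_word` helper are all invented for illustration and are not a real proposal or an existing orthography.

```python
import unicodedata
from collections import defaultdict

# Toy lexicon: (Pinyin with tone, English gloss, hypothetical classifier tag).
LEXICON = [
    ("shī", "lion", "ANIMAL"),
    ("shī", "poem", "SPEECH"),
    ("shī", "teacher", "HUMAN"),
    ("shí", "stone", "STONE"),
    ("shí", "ten", "NUMBER"),
    ("māo", "cat", "ANIMAL"),
]

def nfc(text: str) -> str:
    """Normalize so that equal-looking Pinyin strings compare equal."""
    return unicodedata.normalize("NFC", text)

# Index the lexicon by sound so homophones end up in the same bucket.
by_sound = defaultdict(list)
for sound, gloss, tag in LEXICON:
    by_sound[nfc(sound)].append((gloss, tag))

def write_word(sound: str, gloss: str) -> str:
    """Spell a word phonetically, adding a classifier only if it has homophones."""
    sound = nfc(sound)
    candidates = by_sound.get(sound, [])
    tags = [tag for candidate_gloss, tag in candidates if candidate_gloss == gloss]
    if not tags:
        raise KeyError(f"{sound!r} with meaning {gloss!r} is not in the toy lexicon")
    if len(candidates) == 1:
        return sound  # unambiguous: plain phonetic spelling is enough
    return f"{sound}[{tags[0]}]"  # ambiguous: attach the classifier

print(write_word("māo", "cat"))
print(write_word("shī", "lion"))
print(write_word("shí", "stone"))
```

In this toy version a word gets a classifier whenever the lexicon holds any homophone for it. A real scheme along the lines suggested above would presumably also look at the surrounding context and skip the classifier when the context already settles the meaning.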


I think ambiguous homophones aren't actually much of a problem. There's usually only one correct option that matches the surrounding context, so the correct inference is easy to make even with no characters at all. After all, there aren't subtitles when you're talking to other people, all the homophones still exist, and yet communication doesn't seem to be impeded.


> Many of the Chinese characters are actually intended to be composed of two parts […]

That is not entirely true in the case of Mandarin, but it is more true in the case of Cantonese (and a few other Chinese languages).

Owing to the historical loss of sounds (especially finals) over the course of Mandarin's development, many Mandarin words tend to be longer (3-4 syllables are common) than their counterparts in, say, Cantonese, where they are most of the time (but not always) two syllables long, because Cantonese has retained more sounds from Middle Chinese (plus the intermingling with the Bat Yue) over the course of its development.

Which is why the «Lion eating poet in the stone den» still makes some sense when read out loud in Cantonese (also in Wu, Min) and makes no sense in Mandarin.


Thanks for the interesting link! Nitpicking a bit, but if I understand this page correctly (linked from the Wikipedia article? see point 3):

https://pinyin.info/readings/zyg/what_pinyin_is_not.html

the text was not meant as an argument against romanization, but as a playful example of how Pinyin is unfit for Classical Chinese, as opposed to modern vernacular Chinese.


I'd accept that interpretation. To be more precise, I view it as a demonstration of information loss from replacing classical characters entirely with romanization, as opposed to a forceful argument against any form of adoption of romanization.


> Lion-Eating Poet in the Stone Den

Sounds like Buffalo buffalo, but it's more like someone being clever than pointing out an actual problem with the language.



