What's so bad about Chinese orthography? 

Tons, just tons of things.  Working on a project for CHIN 481 today, I ran into some that absolutely frustrated me.  Take something as simple as word processing, for example.  What is one thing that users of alphabetic scripts enjoy that users of character-based scripts do not?  Spell-check.  You alphabetic script users out there, can you even comprehend the magnitude of this atrocity?  How can Chinese not have spell-check?  That is not the appropriate question to ask in this situation.  The question should be phrased 'Why does Chinese not have spell-check?'  The answer to this question can be found in the fundamental nature of Chinese orthography.  So let's explore this topic in a little more detail.

First, what is orthography anyway?  Orthography, simply put, is the set of rules for writing properly in a particular writing system.  In English, for example, orthography covers proper spelling, punctuation, capitalization, and word division, just to name a few components.  Well, let's run through these same four components with Chinese to see how it matches up.

Spelling: Chinese writing does not employ an alphabet.  Instead, it uses a character script that offers limited help in reproducing a character from its pronunciation.  That means there is a weak correlation - sometimes none at all - between sound and writing.  Therefore, users must commit thousands of characters to memory and associate a sound with each one, aided only by weak indicators that might give clues as to how it is pronounced.  People generally employ two different methods when they are reading.  The first is direct access, where they automatically recognize a word and its associated meaning based on its appearance.  Even for a phonetic system like English, the direct access method has been identified and proven.  Chinese in this sense is much the same; however, when an English reader encounters a word she has not seen before, she usually relies on sounding out the word.  This is the second method, the phonetic approach, which is unavailable to Chinese readers.  There are phonetic clues in characters, but there is no system or method to the madness.  Becoming literate in a character-based system is a much costlier process.

Punctuation: Chinese, believe it or not, did not have punctuation to begin with.  The concept of punctuation did not come about until modern times, when it was borrowed from Western scripts.  Punctuation is such a powerful orthographic tool because it allows readers and writers to express context outside of words.  Without punctuation, readers are left to decipher the meaning of the writer.  The difference can be profound.  For example, take the sentence "Let's eat, Grandma."  When you take out the punctuation, you get "lets eat grandma."  Obviously, the sentence without punctuation can be taken in a vastly different way from the first.  A reader without the aid of punctuation suffers from the dearth of information contained in the words alone.  Obviously, the costs of deriving the meaning of individual sentences from context greatly outweigh the extra mental effort of designing and learning a system of punctuation.  Unfortunately, even after the introduction of punctuation systems borrowed from the West, Chinese punctuation in common practice remains largely ad hoc.  Some norms have arisen, but problems with standard usage are still extant.

Capitalization: There is no way to draw a fair comparison with alphabetic scripts in this category.  Capitalization does not exist in Chinese.  It is an element better suited for alphabetic scripts.  I can't imagine developing one for Chinese.  We might just be better off underlining all our proper nouns and putting in periods at the end of a sentence.

Word Division: Ah, words indeed!  The concept of "word" seems to be one that the Chinese people never truly fleshed out when developing their orthography.  When you write Chinese there is no word division.  In fact, the concept of 'word' itself is frighteningly weak.  It may be difficult for alphabetic script users to understand this, so here comes a detailed exploration.  Words are made of morphemes (the smallest units of meaning).  Take a word like "teacher".  There is the morpheme "teach" (to educate) and the morpheme "er" ('one who does').  Morphemes come in two flavors in English: bound and unbound.  "Teach" is considered an unbound (or free) morpheme since it can function by itself.  "Er" is a bound morpheme because it cannot exist alone.  It must be appended to another morpheme to have meaning.  The parallel I would like to draw is this:  Individual Chinese characters are morphemes like "teach" and "er."  The problem with Chinese orthography is that it never occurred to the Chinese that they should view "teach" and "er" not as separate units, but as one functioning word.  There is an enormous propensity for Chinese character users to fixate on the morphemes and completely lose sight of the word that these units of meaning are trying to convey.  Just look at Chinese dictionaries, for example: it is still possible to find individual entries for bound morphemes that provide a definition as if the morpheme could function alone.  Has anyone ever seen a dictionary entry for "er" in English?  This flaw is so fundamental that it has great implications for Chinese data processing, as I will explain later, and it is the root cause of many misunderstandings of the writing system in general.

In summary, Chinese writing attempts to do with its characters something that users of alphabetic scripts have devised multiple systems to accomplish.  The characters are therefore overburdened with the need to contextualize ideas on top of expressing meaning.  The problems concerning character script do not end here.  Let's look at how good a job character script does at what it's meant to do: convey the meaning of words.

Phonetics: The alphabet gives clues to the way a word is read in almost a 1:1 mapping from letter to sound.  (Okay, so this is not entirely true due to the plethora of spelling variations that come from borrowings from other languages like French and Spanish, but the system as a whole is still regular and predictable.)  Character scripts strike too low on the phonetic scale because only 85% of the characters provide some sort of phonetic clue*.  On top of that, even fewer give a precise pronunciation.  Some give the right sound, but the wrong tone.  Some give the right tone but a slightly off sound.  Some give neither the right tone nor the right sound.  It's an absolutely nightmarish guessing game.

Semantics: Chinese characters strike way too high on the semantic level in a way that is unnecessary, and the script totally misses on compartmentalizing distinct units of thought like words, sentences, and paragraphs.  Since each character is so pregnant with meaning, there is a tendency to become overly obsessed with the morphemes and not the words themselves.  This has truly begotten problems for the development of Chinese grammar and computing systems.

So after all that, I would like to turn this discussion back to answer a question I posed in the beginning.  Why does Chinese not have spell-check?  There are a number of technical reasons that impede the development of Chinese word processing.  First, there is no manageable unit of logical, syntactic, or lexical analysis.  In alphabetic scripts, this unit of analysis is the word.  In Chinese it can be no different.  It has to be the word.  This is already being accepted de facto by the developers of Chinese word processing.  Agreed-upon, set standards for words will be the first step toward developing a Chinese spell-check.  This step may be harder than it seems.  Due to the fixation on morphemes, "words" in Chinese as they stand remain largely ad hoc in their creation and usage.  Since people usually cobble together the appropriate morphemes to describe something (it gets worse the more technical the term), there may be many possible ways to say the same thing.  There needs to be some sort of standardization down to a set of agreed-upon "words" for describing things.
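
To make the dependence on words concrete, here is a minimal sketch in Python of what a spell-checker does once it has words to work with.  The tiny LEXICON and the check_words function are my own inventions for illustration, not part of any real word processor: every token is looked up in an agreed-upon word list, and anything missing is flagged along with a few near matches.

```python
import difflib

# Hypothetical, deliberately tiny word list standing in for a real dictionary.
LEXICON = {"the", "teacher", "teach", "writes", "words"}

def check_words(tokens, lexicon=LEXICON):
    """Return {unknown_token: [suggestions]} for tokens missing from the lexicon."""
    report = {}
    for token in tokens:
        word = token.lower()
        if word not in lexicon:
            # Suggest up to three lexicon entries that look similar to the token.
            report[token] = difflib.get_close_matches(word, sorted(lexicon), n=3)
    return report

# English hands us the tokens for free: just split on spaces.
report = check_words("The teachr writes words".split())
print(report)
```

Notice that the whole thing rests on two assumptions: that a standard list of words exists, and that the text arrives already cut into words.  Neither can be taken for granted in Chinese.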

Following the establishment of words, there needs to be a way of spatially identifying words.  In English, we can say "a word is something that appears between two spaces," and we would be right as far as being able to identify where a word starts and ends.  In Chinese, since there is no word division, word identification must be pulled from context.  There is no simple rule that a computer can follow, as in English, to pull out the word for analysis.  In short, we have some concept of "word" in Chinese, but our computers are too simple to identify them.  There are two ways to get around this issue.  First, we can program our computers so that they can perform the same functions as a highly trained newspaper editor.  Or, we can start writing Chinese with standardized word division.  Yes, READ: spaces in between Chinese words.  Is this a far-fetched idea?  Instead of trying to find a way to program our computers to lift words from every single sort of context, why not write Chinese with spaces?  Why not write: "women jintian yao qu naer?" (Where are we going today?) in characters with spaces instead of the "womenjintianyaoqunaer?" that currently is the norm?
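
The difference between the two situations can be sketched in a few lines of Python.  This is only an illustration under assumptions of my own: the LEXICON below is a made-up word list, and forward_max_match is one simple greedy technique (take the longest dictionary word at each position), not a description of how any particular Chinese word processor actually works.

```python
# Hypothetical word list; a real one would hold tens of thousands of entries.
LEXICON = {"我们", "今天", "要", "去", "哪儿"}

def split_on_spaces(text):
    """English-style word identification: a word is whatever sits between spaces."""
    return text.split()

def forward_max_match(text, lexicon, max_len=4):
    """Greedy segmentation: at each position take the longest lexicon word.

    Falls back to a single character when nothing in the lexicon matches.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words

print(split_on_spaces("Where are we going today"))
print(forward_max_match("我们今天要去哪儿", LEXICON))
```

The English case needs no dictionary at all.  The Chinese case is only as good as its word list, and a greedy approach like this one can pick the wrong split when a string can be divided in more than one way.  If the text were written with spaces, the first function would be all we need.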

Perhaps what I am proposing is just too "out there."  Maybe there is a better way, but I certainly hope that after all this my readers can at least appreciate the idea.