|
What's so
bad about Chinese orthography?
Tons, just tons of things.
Working on a project for CHIN 481 today, I ran into some that
absolutely frustrated me. Take something as simple as word
processing for example. What is one thing that users of
alphabetic scripts enjoy that users of character-based scripts
do not? Spell-check. You alphabetic script users out there,
can you even comprehend the magnitude of this atrocity? How can
Chinese not have spell-check? That is not the appropriate
question to ask in this situation. The question should be
phrased 'Why does Chinese not have spell-check?' The
answer to this question can be found in the fundamental nature of
Chinese orthography. And so let's explore this topic in a
little more detail.
First, what is orthography
anyway? Orthography, simply put, is the set of rules for writing
properly in a particular writing system. In English, for example, orthography
consists of proper spelling, punctuation, capitalization, and
word division just to name a few components. Well, let's
run through just these same four components with Chinese to see
how it matches up.
Spelling:
Chinese writing does not employ an alphabet. Instead there
is a character script that is limited in its ability
to help users reproduce a character from its
pronunciation. That means there is a weak correlation -
sometimes none at all - between sound and writing.
Therefore, users must commit thousands of characters to memory
and associate a sound with each one, aided only by a weak indicator that
might give clues as to how it is pronounced. People
generally employ two different methods when they are reading.
The first is direct access, where they automatically recognize a word
and its associated meaning based on its appearance. Even
for a phonetic system like English, the direct access method has
been identified and proven. Chinese in this sense is much
the same; however, when an English reader encounters a word
she has not seen before, she usually relies on sounding out the
word. This is the second method, the phonetic approach, which is unavailable
to Chinese readers. There are phonetic clues in
characters, but there is no system or method to the madness.
Becoming literate is simply a much costlier process in
character-based systems.
Punctuation:
Chinese, believe it or not, did not have punctuation
to begin with. The concept of punctuation did not come
about until modern times, when it was borrowed from Western scripts.
Punctuation is such a powerful orthographic tool because it
allows readers and writers to express context outside of words.
Without punctuation, readers are left to decipher the meaning of
the writer. The difference can be profound. For example,
take the sentence "Let's eat, Grandma." When you take out
the punctuation, you get "lets eat grandma." Obviously,
this sentence without punctuation can be taken in a vastly
different way from the first. A reader without the aid of
punctuation suffers from the dearth of information contained in
the words alone. Clearly, the costs of deriving the
meaning of individual sentences from context greatly outweigh
the extra mental effort of designing and learning a system of
punctuation. Unfortunately, even after the introduction
of punctuation borrowed from the West, Chinese punctuation
in common practice remains largely ad hoc. Some
norms have arisen, but problems with standard usage still
persist.
Capitalization:
There is no way to draw a fair comparison with alphabetic
scripts in this category. Capitalization does not exist in Chinese.
It is an element better suited to alphabetic scripts. I
can't imagine developing one for Chinese. We might just be
better off underlining all our proper nouns and putting
periods at the end of a sentence.
Word Division:
Ah, words indeed! The concept of "word" seems to be one that the Chinese people never
truly fleshed out when developing their orthography. When
you write Chinese there is no word division. In fact, the concept of 'word' itself is frighteningly
weak. It may be difficult for alphabetic script users to
understand
this, so here comes a detailed exploration. Words are made
of morphemes (the smallest unit of meaning). Take a word
like "teacher". There is the morpheme "teach" (to educate)
and the morpheme "er" ('one who does'). Morphemes come in
two flavors in English: bound and unbound. "Teach" is
considered an unbound
morpheme since it can function by itself. "Er" is a bound
morpheme because it cannot exist alone. It must be appended to
another morpheme to have meaning. The parallel I would like to draw
is this: individual Chinese characters are
morphemes like "teach" and "er." The problem with Chinese
orthography is that it never occurred to the Chinese that they
should view "teach" and "er" not as separate units, but one
functioning
word. There is an enormous propensity for Chinese
character users to fixate on the morphemes and completely lose
sight of the word that these units of meaning are trying to
convey. Just look at Chinese dictionaries, for example: it is still possible to find
individual entries for bound morphemes that provide a
definition as if the morpheme could function
alone. Has anyone ever seen a dictionary entry for "er" in English? This flaw
is so fundamental that it has great implications for Chinese
data processing, as I will explain later, and it is the root
cause of many misunderstandings of the writing system in general.
In summary, Chinese writing attempts to do with its characters alone
what users of alphabetic scripts have devised
multiple systems to accomplish. The characters therefore
are overburdened with the need to contextualize ideas on top of
expressing meaning. The problems concerning character
script do not end here. Let's look at how good a job
character script does on what it's meant to do: convey the
meaning of words.
Phonetics:
The alphabet gives clues to the way a word is read in almost a
1:1 mapping from letter to sound. (Okay, so this is not entirely
true due to the plethora of spelling variations that come from
borrowings from other languages like French and Spanish, but the
system as a whole is still regular and predictable.)
Character scripts strike too low on the phonetic scale because
only 85% of the characters provide some sort of phonetic clue*.
On top of that, even fewer give a precise phonetic
pronunciation. Some give the right sound, but the wrong tone.
Some give the right tone but a slightly off sound. Some give
neither the right tone nor the right sound. It's an absolutely
nightmarish guessing game.
Semantics:
Chinese characters strike way too high on the semantic level in a
way that is unnecessary, and the script totally misses on
compartmentalizing distinct units of thought like words,
sentences, and paragraphs. Since each character is so
pregnant with meaning, there is a tendency to become overly
obsessed with the morphemes and not the words themselves.
This has caused real problems for the development of Chinese grammar and
computing systems.
So after all that,
I would like to turn this discussion back to answer a question I posed in the
beginning. Why does Chinese not have spell-check?
There are a number of technical reasons that impede the
development of Chinese word processing. First, there is no
manageable unit of logical, syntactic, or lexical analysis. In
alphabetic scripts, this unit of analysis is the word. In
Chinese it can be no different. It has to be the word.
This is already being accepted de facto by the developers of Chinese
word processing. Agreeing on set standards for
words will be the first step to developing a Chinese
spell-check. This step may be harder than it seems. Due to the
fixation on morphemes, "words" in Chinese as they stand remain
largely ad hoc in their creation and usage. Since people
usually cobble together the appropriate morphemes to describe
something (it gets worse the more technical the term), there may
be many possible ways to say the same thing. There needs to be
some sort of standardization down to a set of agreed-upon
"words" for describing things.
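To see why the word has to be the unit of analysis, here is a minimal
Python sketch of the core of a spell-check for an alphabetic script. It
is only an illustration: the word list (WORD_LIST) and the function name
(unknown_words) are made up for this example, and a real spell-check
would also suggest corrections.

```python
# Hypothetical, tiny word list; a real spell-check would load a large
# standardized dictionary instead.
WORD_LIST = {"where", "are", "we", "going", "today"}


def unknown_words(text, word_list=WORD_LIST):
    """Return the words in `text` that are not in the agreed-upon word list.

    The unit of analysis is the word, and here a word is simply whatever
    sits between two spaces.
    """
    return [word for word in text.lower().split() if word not in word_list]


# "goign" is a deliberate typo for "going".
flagged = unknown_words("Where are we goign today")
print(flagged)
```

Everything here depends on two things Chinese currently lacks: an
agreed-upon list of words, and an easy way to find where each word
begins and ends.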
Following the establishment of
words, there needs to be a way of spatially identifying words.
In English, we can say "a word is something that appears between
two spaces," and we would be right as far as being able to
identify where a word starts and ends. In Chinese, since there
is no word division, word identification must be pulled from
context. There is no simple rule that a computer can follow
like in English to pull out the word for analysis. In short, we
have some concept of "word" in Chinese, but our computers are
too simple to identify them. There are two ways to get around
this issue. First, we can program our computers so that
they can perform the same functions as a highly trained
newspaper editor. Or, we can start writing Chinese with
standardized word division. Yes, READ: spaces in between
Chinese words. Is this a far-fetched idea? Instead of trying to find a way to
program our computers to lift words from every single sort of
context, why not write Chinese with
spaces? Why not write "women jintian yao qv naer?" (Where are
we going today?) in characters with spaces, instead of the "womenjintianyaoqvnaer?"
that is currently the norm?
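The difference for a computer can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not how any real
word processor works: the lexicon (LEXICON) is a made-up five-word list,
the function names are my own, and the unspaced case uses greedy forward
maximum matching, which is just one simple dictionary-based way of
guessing where words begin and end.

```python
# Hypothetical toy lexicon; a real system would need a large,
# standardized word list.
LEXICON = {"women", "jintian", "yao", "qv", "naer"}
MAX_LEN = max(len(word) for word in LEXICON)


def split_spaced(text):
    """With word division, a word is whatever sits between two spaces."""
    return text.split()


def segment_unspaced(text, lexicon=LEXICON, max_len=MAX_LEN):
    """Greedy forward maximum matching.

    At each position, take the longest lexicon entry that matches.
    If nothing matches, fall back to a single symbol and move on.
    """
    words = []
    i = 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        for size in range(longest, 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


spaced = split_spaced("women jintian yao qv naer")
unspaced = segment_unspaced("womenjintianyaoqvnaer")
print(spaced)
print(unspaced)
```

With spaces, finding the words is one call to split(). Without them, the
program needs a complete word list before it can start, and a greedy
matcher like this one can still guess wrong whenever one word happens to
begin with another.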
Perhaps what I am proposing is just too "out there." Maybe
there is a better way, but I certainly hope that after all this
my readers can at least appreciate the idea.
|