Re: Some Reqirement of Chinese Language support
On Tue, 30 May 2000, Bruno Haible wrote:
> PILCH Hartmut writes about conversion from zh_tw to zh_cn:
> > Conversion from classical to simplified Han characters is a very simple
> > matter, except for a handful of exceptions where the result is slightly
> > less than perfect at an aesthetical level, but still easy to understand.
> >
> > The fact that software is translated as "ruan3jian4" in mainland China and
> > "ruan3ti3" in Taiwan or other surface-level divergences of vocabulary also
> > don't have anything to do with this subject. The fact is that there is
> > one single Greater Chinese book market and neither Taiwanese nor mainland
> > Chinese authors have to change a single character in any of their books in
> > order to be accepted by the other side. The only change that is usually
> > performed is the classical-simplified transformation. This is a hard fact
> > which should be given more attention than the expertise of those who claim
> > that it can't be done.
Books aren't the best analogy, as most are left as-is and readers do the
conversion in their heads. Books that are republished in the other
script are usually converted at Halpern's [1] level 2, although some
sloppy work (for something published) is done at level 1.
It sounds like Pilch is proposing Halpern's level 1, a simple
character-by-character conversion that leaves generally tolerable
"spelling" mistakes; level 2 would be preferable, but would require
dictionary-like data.
(There already exists Windows software that does level 1 or 2.)
Levels 3 and 4 refer to content (such as word choice) and are beyond the
scope of character set conversion, script conversion/transliteration,
spelling conversion, etc.
[1] http://www.basistech.com/articles/C2C.html
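To make the level 1 idea concrete, here is a minimal sketch in Python. It
assumes level 1 means nothing more than a per-character lookup with no
knowledge of words or context. The table below is a tiny hand-picked
sample and the function name is made up for illustration; a real
converter would need a complete mapping table.

```python
# Hypothetical sample table: a handful of traditional -> simplified pairs.
# A usable converter needs thousands of entries; this is only a sketch.
TRAD_TO_SIMP = {
    "軟": "软",
    "體": "体",
    "漢": "汉",
    "語": "语",
    "國": "国",
    "學": "学",
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(TRAD_TO_SIMP)


def to_simplified_level1(text: str) -> str:
    """Replace each character independently; unknown characters pass through."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # "ruan3ti3" keeps its Taiwanese word choice; only the glyphs change.
    print(to_simplified_level1("軟體"))
```

Note that the Taiwanese word for software comes out as the same word in
simplified characters, not as the mainland term. Changing the word itself
is a level 3/4 matter and needs more than a character table.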
> > If anyone is interested in putting the conversion feature into Mozilla or
> > some other browser, I'll be glad to help implement it, and to correctly
> > code some Chinese texts so as to prove that it works.
>
> An LGPLed implementation of this would be highly welcome. Many GNU
> packages have localized messages for zh_tw, and automatic conversion
> to zh_cn would nicely fit into the automatic charset conversion done
> by GNU gettext.
This sounds analogous to a hypothetical en_US->en_GB filter[2] that
converts ize->ise, er->re, and or->our (level 1), but not *elevator->lift,
*truck->lorry, or *hood->bonnet (level 3/4), and that also converts
EBCDIC->ASCII[3] in the same step. Is this within the scope of gettext?
[2] Not necessarily round-trippable, but the damage is generally
tolerable.
[3] Not a perfect analogy; the two character sets should be mostly
non-intersecting sets.
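The hypothetical en_US->en_GB filter can be sketched in a few lines of
Python. This is only an illustration of the analogy, not anything gettext
provides: the rule list and function name are invented here, and the
rules are deliberately blind suffix rewrites with no dictionary behind
them.

```python
import re

# Hypothetical level 1 rules: rewrite word endings without consulting
# a dictionary. Each pattern matches only at the end of a word.
SUFFIX_RULES = [
    (re.compile(r"ize\b"), "ise"),
    (re.compile(r"er\b"), "re"),
    (re.compile(r"or\b"), "our"),
]


def us_to_gb_level1(text: str) -> str:
    """Apply each suffix rule in turn to the whole text."""
    for pattern, replacement in SUFFIX_RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(us_to_gb_level1("organize color center"))
    # Blind rules also hit words they should leave alone:
    print(us_to_gb_level1("water"))
    # Word choice is untouched, since that is level 3/4:
    print(us_to_gb_level1("truck"))
```

The second call shows the kind of "spelling" damage a rule-only filter
produces. Avoiding it takes per-word data, which is roughly the step from
level 1 to level 2.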
There is already such a converter in use between the Debian Linux
mailing lists "debian-chinese" and "debian-simplified-chinese", which
mirrors messages between the two. I think it works at Halpern's
level 1.
Thomas Chan
tc31@xxxxxxxxxxx
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/