
Re: Some Requirement of Chinese Language support



On Tue, 30 May 2000, Bruno Haible wrote:

> PILCH Hartmut writes about conversion from zh_tw to zh_cn:
> > Conversion from classical to simplified Han characters is a very simple
> > matter, except for a handful of exceptions where the result is slightly
> > less than perfect at an aesthetical level, but still easy to understand.
> > 
> > The fact that software is translated as "ruan3jian4" in mainland China and
> > "ruan3ti3" in Taiwan or other surface-level divergences of vocabulary also
> > don't have anything to do with this subject.  The fact is that there is
> > one single Greater Chinese book market and neither Taiwanese nor mainland
> > Chinese authors have to change a single character in any of their books in
> > order to be accepted by the other side.  The only change that is usually
> > performed is the classical-simplified transformation.  This is a hard fact
> > which should be given more attention than the expertise of those who claim
> > that it can't be done.

Books aren't the best analogy, as most are left as-is and readers do the
conversion in their heads.  Books that are republished in the other
format, however, are done at Halpern's [1] level 2, although some sloppy
work (for something published) is done at level 1.

It sounds like Pilch is proposing Halpern's level 1, a simple conversion
that leaves generally tolerable "spelling" mistakes; level 2 would be
preferable, but would require dictionary-like data.  (Windows software
that does level 1 or 2 already exists.)
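
To make the difference concrete, here is a minimal Python sketch of the
two levels as I understand them.  The tables are toy data I made up for
illustration (three characters and one word), and the function names
level1 and level2 are hypothetical; a real converter would need complete
mapping data and a proper segmenter.

```python
# Toy tables for illustration only; a real converter needs complete data.
# Character-level mapping (traditional -> simplified), as in level 1.
CHAR_TABLE = str.maketrans({
    "軟": "软",
    "體": "体",
    "乾": "干",   # correct for "dry", wrong for the qian reading (see below)
})

# Word-level exceptions: the "dictionary-like data" that level 2 needs.
WORD_TABLE = {
    "乾坤": "乾坤",   # this word keeps its first character in simplified text
}


def level1(text):
    """Convert character by character, ignoring word context."""
    return text.translate(CHAR_TABLE)


def level2(text):
    """Prefer the longest matching dictionary word, else fall back to level 1."""
    out = []
    i = 0
    max_len = max(map(len, WORD_TABLE))
    while i < len(text):
        for n in range(max_len, 0, -1):
            chunk = text[i:i + n]
            if chunk in WORD_TABLE:
                out.append(WORD_TABLE[chunk])
                i += len(chunk)
                break
        else:
            # No dictionary word starts here: convert a single character.
            out.append(level1(text[i]))
            i += 1
    return "".join(out)
```

With these tables, level1 turns 乾坤 into 干坤, which is the kind of
tolerable "spelling" mistake mentioned above, while level2 leaves the
word intact and still converts 軟體 to 软体.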

Levels 3 and 4 refer to content (such as word choice) and are beyond the
scope of character set conversion, script conversion/transliteration,
spelling conversion, etc.

[1] http://www.basistech.com/articles/C2C.html


> > If anyone is interested in putting the conversion feature into Mozilla or
> > some other browser, I'll be glad to help implement it, and to correctly
> > code some Chinese texts so as to prove that it works.
> 
> An LGPLed implementation of this would be highly welcome. Many GNU
> packages have localized messages for zh_tw, and automatic conversion
> to zh_cn would nicely fit into the automatic charset conversion done
> by GNU gettext.

This sounds analogous to a hypothetical en_US->en_GB filter[2] that
converts ize->ise, er->re, and or->our (level 1), but not *elevator->lift,
*truck->lorry, or *hood->bonnet (level 3/4), and that also converts from
EBCDIC to ASCII[3].  Is this within the scope of gettext?

[2] Not necessarily round-trippable, but the damage is generally
tolerable.

[3] Not a perfect analogy; the two character sets should be mostly
non-intersecting sets.
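
A rough Python sketch of that hypothetical filter, to show what is being
asked of gettext: a charset conversion with a spelling pass in the
middle.  The rule list and the names us_to_gb_level1 and convert are my
own inventions for illustration; cp037 is one of the EBCDIC codecs that
ships with Python.

```python
import re

# Hypothetical level-1 "spelling" rules: word-final suffix swaps only.
RULES = [
    (re.compile(r"ize\b"), "ise"),
    (re.compile(r"er\b"), "re"),
    (re.compile(r"or\b"), "our"),
]


def us_to_gb_level1(text):
    """Apply each suffix rule in turn, with no dictionary lookup."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def convert(ebcdic_bytes):
    """Decode EBCDIC (cp037), fix the spelling, and re-encode as ASCII."""
    text = ebcdic_bytes.decode("cp037")
    return us_to_gb_level1(text).encode("ascii")
```

The rules turn "color center" into "colour centre" and leave "truck"
alone, but they also turn "water" into "watre", which is the sort of
damage footnote [2] has in mind.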


Such a converter is already in use between the Debian Linux mailing
lists "debian-chinese" and "debian-simplified-chinese", where it mirrors
messages between the two.  I think it works at Halpern's level 1.


Thomas Chan
tc31@xxxxxxxxxxx


-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/