
Re: Transliteration for use in UTF-8 locales



Markus Kuhn wrote:

> Jean-Marc Desperrier wrote on 2000-10-10 15:48 UTC:
> > Will iconv automatically use that mechanism?
>
> I haven't looked into iconv yet and don't know whether it can do
> transliterations. Iconv is not locale-dependent, so the use of
> transliteration would have to be specified somehow in the encoding
> selection strings supplied to it, and I don't know of any standard
> convention for that. We probably should start thinking about defining
> one.

Thank you for this very kind explanation.
The rest made me feel like I should have reread the locale man page
first :-)
I realized that I had missed several recent messages about this topic
on the list.

Anyway, I know someone who could be interested in working on the
Ancient Greek transliteration tables, BUT ...

I realised that he would object that it makes much more sense to
install a font that includes Ancient Greek, and that he has a list of
those (http://club.euronet.be/frederique.bouras/polices.htm).

Thinking about that, I realized it's very unlikely that someone would
have a system that includes these transliteration tables, know how to
use them properly, and yet not have adequate fonts available.

Therefore I see two likely situations for the use of this mechanism:
- You receive a text that holds some characters that do not belong to
your usual environment, and you _do not bother_ to install the proper
fonts for it, or _you are not able_ to interpret the characters when
they are displayed in their native form. In either case, this is a
strong indication that these characters do not belong to a language you
are used to working with. This means that LC_CTYPE will _not_ be
configured to select a proper transliteration method. So if
transliteration is only applied when LC_CTYPE has a very specific
value, the usefulness of this system will be quite limited. On the
other hand, applying transliteration without knowing whether the user
wants it is dangerous.

- You're trying to make a text available and somewhat usable for
someone else who does not have a fully Unicode-enabled environment. You
need to convert your text to an encoding that is less powerful than
Unicode, and you would like to use some transliteration in the
conversion process.
This is a case where you use iconv, and this is also the case for which
this transliteration mechanism is interesting for me.
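
To make the second case concrete, here is a minimal sketch in Python
(not iconv itself, and not a proposal for its interface). It converts
to ISO-8859-1 and falls back to a transliteration table only for the
characters the target encoding cannot hold. The table contents, the
handler name "translit-sketch" and the function to_latin1 are all
made up for illustration.

```python
import codecs

# Hypothetical, deliberately tiny fallback table (a few Greek letters).
FALLBACK = {
    "\u03bb": "l",  # GREEK SMALL LETTER LAMDA
    "\u03cc": "o",  # GREEK SMALL LETTER OMICRON WITH TONOS
    "\u03b3": "g",  # GREEK SMALL LETTER GAMMA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u03c2": "s",  # GREEK SMALL LETTER FINAL SIGMA
}


def translit_errors(exc):
    """Error handler: transliterate only what the codec could not encode."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Characters missing from the table become "?", like a lossy converter.
    replacement = "".join(FALLBACK.get(ch, "?") for ch in bad)
    return replacement, exc.end


# The handler name is an assumption of this sketch, not a standard one.
codecs.register_error("translit-sketch", translit_errors)


def to_latin1(text):
    """Encode to ISO-8859-1, transliterating unencodable characters."""
    return text.encode("iso-8859-1", errors="translit-sketch")
```

Characters that ISO-8859-1 already covers pass through untouched, so
the transliteration is requested explicitly by the person doing the
conversion and does not depend on any locale setting.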

Coming back to transliteration tables for Japanese:
I found a table of the Hepburn system on the Waseda University site,
but it's a GIF picture, therefore not very useful. But it made me
realise something. In some cases, we need a "several to several"
conversion for the transliteration of Japanese.
For example, the transliteration for the Japanese letter "ち" is "chi"
in Hepburn.
But if it is followed by the small character "ょ" as in "ちょ", the
transliteration becomes "cho".
And for "ちよ" (notice that this time, the second character is the same
size as the first), the transliteration is "chiyo", which is just the
concatenation of the transliterations of ち and of よ.
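
One way to get this behaviour is to let table keys be several
characters long and always try the longest key first. The following is
only a sketch in Python under that assumption; the three-entry table
and the function name are invented for illustration and say nothing
about what the locale tables can actually express.

```python
# Hypothetical miniature table: keys may be one or several characters long.
HEPBURN = {
    "ち": "chi",
    "よ": "yo",
    "ちょ": "cho",  # ち followed by the small ょ
}


def transliterate(text, table=HEPBURN):
    """Greedy longest-match transliteration over a multi-character table."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key at this position first.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No key matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this table, transliterate("ちょ") gives "cho" while
transliterate("ちよ") gives "chiyo", because the two-character key only
matches when the second character is the small one.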

Also you may want "ちょう" to be converted to "chou" or to "chō", or to
"chô" to stay within ISO-8859-1.

"chō" is the "official" Hepburn transliteration, but it's quite rarely
used, especially in a computer environment :-) (if you have "ō"
available, you have Unicode and can display "ちょう").
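
The choice between "chou", "chō" and "chô" could be treated as a
parameter of the table rather than as three separate tables. A rough
Python sketch, where the style names, the table and the function are
all assumptions made for illustration:

```python
import re

# Hypothetical style names for how a long "o" is written.
LONG_O_STYLES = {
    "ascii": "ou",           # plain ASCII spelling
    "macron": "\u014d",      # o with macron, needs Unicode
    "circumflex": "\u00f4",  # o with circumflex, fits in ISO-8859-1
}


def romanize(text, style="ascii"):
    """Romanize a few kana, writing the long vowel in the chosen style."""
    table = {
        "ち": "chi",
        "よ": "yo",
        "う": "u",
        "ちょ": "cho",
        "ちょう": "ch" + LONG_O_STYLES[style],
    }
    # Sorting keys longest-first makes the alternation prefer long matches.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)
```

Only the "ascii" and "circumflex" results can then be encoded as
ISO-8859-1; encoding the "macron" result that way raises an error in
Python, which matches the point above that "ō" implies Unicode.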

Can the tables handle this? (I fear not :-) )
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/