Re: Unicode: endpoint of evolution of encodings?
hi srintuar
Since this uses "smart" font technology, the underlying data
characters don't change, even though one script (e.g. Cyrillic) is
transliterated into another (Latin). All the rules for doing this
are built into the font. The lookups would need to be keyed
to specific languages, since such transliteration rules would
undoubtedly differ from one language to another.
This "transliteration" happens only when the text is displayed, leaving
the data characters unchanged, so it causes no loss of information.
This feature is only in the AAT / ATSUI font format spec, though you could do
the same with Graphite, since Graphite allows you to define your own
features. Something similar could probably be done with OpenType, though
it would involve changes to the shaping engine (Uniscribe, Pango, etc.)
as well as adding tables to the font. You'd also need to get the feature
registered.
srintuar wrote:
Christopher Fynn wrote:
The Transliteration feature type allows text in one format to be
displayed using another format. An example is taking a hiragana string
and displaying it as katakana. This is an exclusive feature type.
Currently defined selectors for this feature are:
o Hiragana to Katakana
o Katakana to Hiragana
o Kana to Romanization
o Romanization to Hiragana
o Romanization to Katakana
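
As a rough illustration of the first selector, here is a minimal Python
sketch. It is only a text-level stand-in: AAT does this on glyphs inside
the font, not on strings. The function name display_as_katakana is made
up for this example, and the sketch assumes only the Unicode layout in
which hiragana U+3041..U+3096 sit exactly 0x60 below their katakana
counterparts.

```python
# Hiragana U+3041..U+3096 map to katakana U+30A1..U+30F6 (offset 0x60).
HIRAGANA_START = 0x3041
HIRAGANA_END = 0x3096
KANA_OFFSET = 0x60


def display_as_katakana(text: str) -> str:
    """Return a display form of text with hiragana shown as katakana.

    Hypothetical helper: the stored string is never modified, which mirrors
    a font feature that only swaps glyphs at display time.
    """
    return "".join(
        chr(ord(ch) + KANA_OFFSET)
        if HIRAGANA_START <= ord(ch) <= HIRAGANA_END
        else ch
        for ch in text
    )


stored = "ひらがな"                    # the data characters
shown = display_as_katakana(stored)    # what the user would see
print(stored, "->", shown)
```

The stored string stays hiragana throughout; only the displayed form
differs.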
There is no one "right" way to perform these projections.
Also, they are not necessarily reflexive (meaning they
lose information: you couldn't recover the original text
from the transformed text in some cases).
You'd have to encode all the rules for Serbian Cyrillic to
Serbian Latin transliteration into the font. This only
results in glyph (display) transformations from one script
to another, leaving the underlying data characters
untouched, so there is no information loss.
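
A minimal Python sketch of both points, assuming the standard Serbian
Cyrillic to Latin letter correspondences. The names CYR_TO_LAT and
display_as_latin are hypothetical, and a real font would apply such rules
to glyphs rather than strings. The displayed form alone can be ambiguous
(both "њ" and "нј" show as "nj"), but nothing is lost because the stored
Cyrillic text is left alone.

```python
# Standard Serbian Cyrillic -> Latin correspondences (lowercase).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def display_as_latin(text: str) -> str:
    """Return a Latin display form; the input string is not modified."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower not in CYR_TO_LAT:
            out.append(ch)              # pass through anything unmapped
            continue
        latin = CYR_TO_LAT[lower]
        # Uppercase input gets a capitalised result, e.g. "Љ" -> "Lj".
        out.append(latin.capitalize() if ch != lower else latin)
    return "".join(out)


stored = "Београд"
print(stored, "->", display_as_latin(stored))
# The display form is ambiguous, the stored data is not:
print(display_as_latin("њ") == display_as_latin("нј"))
```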
There is no way you could encode such information into a
font face itself by displaying alternate glyphs. Also, you
would not be able to unify Hiragana and Ro-maji pairs into
single codepoints. (ro-maji are context sensitive, for one
thing)
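
To make the context-sensitivity point concrete, here is a small Python
sketch of one contextual rule in kana romanization: the small "っ"
(sokuon) has no fixed Latin form and instead doubles the consonant of
the following syllable. The four-entry ROMAJI table and the function
name display_as_romaji are illustrative assumptions, not a complete
romanization scheme.

```python
# Tiny illustrative table; a real scheme covers the whole syllabary.
ROMAJI = {"か": "ka", "き": "ki", "こ": "ko", "て": "te"}
SOKUON = "っ"


def display_as_romaji(text: str) -> str:
    """Romanize text, doubling the consonant that follows a sokuon."""
    out = []
    double_next = False
    for ch in text:
        if ch == SOKUON:
            double_next = True          # affects the next syllable only
            continue
        syllable = ROMAJI.get(ch, ch)   # unmapped characters pass through
        if double_next and ch in ROMAJI:
            syllable = syllable[0] + syllable
        double_next = False
        out.append(syllable)
    return "".join(out)


print(display_as_romaji("きて"))
print(display_as_romaji("きって"))
```

The same "て" comes out as "te" or "tte" depending on what precedes it,
so a one-to-one character mapping cannot express the rule.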
The transliteration rules for these transformations can be
context sensitive (like any other AAT / OpenType / Graphite
shaping or positioning feature). Other contextual shaping
and positioning features could be used in conjunction with
If you want more information, look at:
<http://developer.apple.com/fonts/Registry/index.html>
<http://developer.apple.com/fonts/TTRefMan/RM06/Chap6feat.html>
<http://developer.apple.com/fonts/TTRefMan/RM06/Chap6mort.html>
<http://developer.apple.com/fonts/TTRefMan/RM06/Chap6Tables.html>
ICU also has a Transliterator class
<http://oss.software.ibm.com/icu/apiref/classTransliterator.html>
<http://oss.software.ibm.com/icu4j/doc/com/ibm/icu/text/Transliterator.html>
regards
- chris
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/