Re: Expat XML Parser Full Character Encoding Support
Michael B. Allen writes:
> So the first column
> is a big endian representation of the multibyte sequence corresponding
> to the UCS code in the right column? So I could generate the maps from
> that information and use the libiconv *_mbtowc functions to do multibyte
> conversions.
Yes.
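
As a rough illustration of generating a map from such a table, here is a
minimal Python sketch. It assumes each data line holds two hex columns
(multibyte sequence first, UCS code second) and that anything after a '#'
is a comment. The function name parse_mapping and the sample rows are
made up for this example and are not taken from libiconv.

```python
def parse_mapping(text):
    """Build {multibyte sequence (bytes): UCS code point (int)} from table text."""
    table = {}
    for line in text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a data row
        mb_hex, ucs_hex = fields[0], fields[1]
        digits = mb_hex[2:] if mb_hex.lower().startswith("0x") else mb_hex
        if len(digits) % 2:
            digits = "0" + digits  # pad to a whole number of bytes
        # The first column is big endian, so the hex digits are already
        # in the order the bytes appear in the encoded text.
        table[bytes.fromhex(digits)] = int(ucs_hex, 16)
    return table


# Hypothetical sample rows, for illustration only.
sample = """\
# multibyte   UCS
0x41    0x0041
0x8140  0x3000
"""

table = parse_mapping(sample)
for seq, ucs in sorted(table.items()):
    print(seq.hex(), "->", "U+%04X" % ucs)
```

A dictionary keyed by the raw byte sequence keeps the lookup independent
of how long each sequence is; a real converter would more likely compile
the table into arrays, as the libiconv sources do.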
> Incidentally why is there no ISO-2022-JP.TXT?
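ISO-2022-JP cannot be described by such a table. It's a stateful
encoding: escape sequences switch the active character set, so the same
bytes can stand for different characters depending on what came before.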
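
To illustrate why a fixed byte-to-code table does not fit here, the
following sketch uses the iso2022_jp codec that ships with Python (not
expat or libiconv). The same two bytes decode differently depending on
which escape sequence precedes them. The variable names are made up for
this example.

```python
# Escape sequences defined by ISO-2022-JP.
ESC_JISX0208 = b"\x1b$B"   # switch to JIS X 0208
ESC_ASCII = b"\x1b(B"      # switch back to ASCII

payload = b'$"'            # the bytes 0x24 0x22

# Initial state is ASCII: the bytes are two separate characters.
in_ascii = payload.decode("iso2022_jp")

# After the escape sequence the same bytes form one two-byte character.
in_jisx0208 = (ESC_JISX0208 + payload + ESC_ASCII).decode("iso2022_jp")

print([hex(ord(c)) for c in in_ascii])      # ['0x24', '0x22']
print([hex(ord(c)) for c in in_jisx0208])   # ['0x3042']
```

A table with one row per byte sequence would need two conflicting rows
for 0x2422, which is why the decoder has to track state instead.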
Even with an expat that understands encodings other than UTF-8 and
ISO-8859-1, people should continue using UTF-8 for their XML files.
Quoting from http://www.w3.org/TR/charmod/ :
"When specifications choose to allow encodings other than Unicode
encodings, implementers should be aware that the correspondence
between the characters of a legacy encoding and Unicode characters
may in practice depend on the software used for transcoding. See the
Japanese XML Profile [http://www.w3.org/TR/japanese-xml/] for
examples of such inconsistencies."
Bruno
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/