Re: Expat XML Parser Full Character Encoding Support
On Tue, 21 Jan 2003 13:21:30 +0100 (CET)
Bruno Haible <haible@xxxxxxx> wrote:
> > Is there a way to determine how many bytes will be needed to
> > represent each character in a character set?
>
> Yes, just take a look at the conversion tables, e.g. in
> libiconv/tests/*.TXT.
Yes, this appears to be precisely what I need. So the first column
is a big-endian representation of the multibyte sequence, and the right
column is the corresponding UCS code? If so, I could generate the maps
from that information and use the libiconv *_mbtowc functions to do the
multibyte conversions.
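
For illustration, here is a minimal sketch of reading such a table and
counting how many bytes each character needs. It assumes each data line
holds two whitespace-separated hex columns (multibyte sequence first, UCS
code second) and that "#" starts a comment; the function names
parse_table and length_histogram are hypothetical and not part of
libiconv. Lines that map one sequence to several code points would need
extra handling, since only the second column is read here.

```python
def parse_table(lines):
    """Map each multibyte sequence (as bytes) to its UCS code point (int).

    Assumed line format: "0x8140<whitespace>0x3000", optional "#" comments.
    """
    table = {}
    for line in lines:
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or not fields[0].lower().startswith("0x"):
            continue
        digits = fields[0][2:]
        # Pad to a whole number of bytes, e.g. "0xA" -> "0A".
        if len(digits) % 2:
            digits = "0" + digits
        # The hex digits are read left to right, i.e. in big-endian order.
        table[bytes.fromhex(digits)] = int(fields[1], 16)
    return table


def length_histogram(table):
    """Count how many table entries use each multibyte sequence length."""
    hist = {}
    for seq in table:
        hist[len(seq)] = hist.get(len(seq), 0) + 1
    return hist


if __name__ == "__main__":
    import sys

    # Usage: python table_lengths.py path/to/SOME-CHARSET.TXT
    with open(sys.argv[1], encoding="ascii", errors="replace") as f:
        print(length_histogram(parse_table(f)))
```

Run against one of the .TXT files, this prints a dictionary keyed by
sequence length in bytes, which gives the per-character byte counts asked
about earlier in the thread.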
>
> > Can I dynamically generate this information with Markus Kuhn's perl
> > tools or by some other means?
>
> If you want it to be slow, you can certainly use perl for that
> purpose.
Well, I just meant to generate the maps once, but it looks like your
tests/*.TXT maps will do the job.
Incidentally, why is there no ISO-2022-JP.TXT?
Mike
--
A program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes the potential for it to be applied to tasks that are
conceptually similar and, more important, to tasks that have not
yet been conceived.
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/