Re: determining locale's character set
"Alexander Voropay" wrote on 2000-01-19 11:56 UTC:
> And one more difficult question. Neither POSIX nor XPG has
> support for charset aliases as defined by IANA/IETF:
> http://www.isi.edu/in-notes/iana/assignments/character-sets
> So, you can get any alias from nl_langinfo(CODESET), not
> the right (MIME preferred) name of your current charset.
The names are not standardized, but some simple regular expressions or
substrings will capture the most commonly used character sets pretty
well:
- UTF-8 seems to be exclusively represented as "UTF-8", so no problem
here.
- ISO 8859-15 etc. can take any of the forms "(ISO|iso)( |-|_)?8859-15",
all of which can be recognized reliably.
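As a rough sketch of the two cases above, in Python: the function name
mime_charset and the exact patterns are illustrative assumptions, not part
of any standard. The patterns are deliberately a little more generous than
the expression quoted above (case-insensitive, and they also tolerate
spellings such as "utf8" or "iso885915"), and the part number is not
checked against the list of ISO 8859 parts that actually exist.

```python
import re

# Hypothetical patterns, matched case-insensitively against the whole name.
# "iso", an optional separator, "8859", an optional separator, part number.
_ISO8859 = re.compile(r"^iso[ _-]?8859[ _-]?(\d{1,2})$", re.IGNORECASE)
# "UTF-8" is the expected spelling; "utf8" and "UTF_8" are tolerated too.
_UTF8 = re.compile(r"^utf[ _-]?8$", re.IGNORECASE)


def mime_charset(codeset):
    """Return a MIME-style charset name for codeset, or None if unknown."""
    name = codeset.strip()
    if _UTF8.match(name):
        return "UTF-8"
    m = _ISO8859.match(name)
    if m:
        # int() drops a leading zero, so "ISO8859-01" maps to "ISO-8859-1".
        return "ISO-8859-" + str(int(m.group(1)))
    return None
```

Anything the patterns do not recognize comes back as None, so the caller
can decide whether to pass the original name through or give up.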
It is not too difficult to come up with generous regular expressions for
the various CJK standards, etc. Make sure you cover all notations used
in
ftp://dkuug.dk/i18n/charmaps/
http://www.isi.edu/in-notes/iana/assignments/character-sets
http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/codeset_over.htm#A163C116e3
ftp://sunsite.doc.ic.ac.uk/packages/X11/pub/R6.4/xc/registry
when you parse the nl_langinfo(CODESET) output, before you convert it
into a MIME charset value. This way, you should quite easily capture
every conceivable name in a portable way.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/