
Re: determining locale's character set



Bruno Haible wrote on 2000-01-17 21:44 UTC:
> Maybe the local_charsets could, if not set, default to the locale's
> character set (nl_langinfo(CODESET) or equivalent).

How well are the possible values of nl_langinfo(CODESET) standardised,
and how widely is <langinfo.h> available?

The information on

  http://www.opengroup.org/onlinepubs/7908799/xsh/langinfo.h.html
  http://www.opengroup.org/onlinepubs/7908799/xsh/nl_langinfo.html

has not yet convinced me that this is a particularly neat or
well-defined interface. On

  http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/codeset_over.htm#A163C116e3

there is a long list of nl_langinfo(CODESET) values supported under AIX.

I guess that finding out whether a locale is UTF-8 based would then be
done, once setlocale(LC_CTYPE, "") has been called, simply via

  if (strcmp(nl_langinfo(CODESET), "UTF-8") == 0) {
    /* let's do it in UTF-8 */
    ...
  }
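
A minimal, self-contained sketch of that check follows. The helper names
codeset_is_utf8() and user_locale_is_utf8() are hypothetical, and the exact
spelling "UTF-8" of the returned value is an assumption; as noted above, it
is not clear how well these values are standardised across systems.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the codeset of the currently
   active LC_CTYPE locale is spelled exactly "UTF-8", else 0. */
static int codeset_is_utf8(void)
{
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}

/* Hypothetical helper: adopt the locale named by the environment,
   then query its codeset. Without the setlocale() call a program
   stays in the default "C" locale. */
static int user_locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;   /* the environment names a locale this system lacks */
    return codeset_is_utf8();
}
```

A program would call user_locale_is_utf8() once at startup and remember
the result.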

For more portable applications and those who want to bypass setlocale()
entirely, I would suggest as a simpler alternative something like

  /* an empty variable counts as unset, as in POSIX locale lookup */
  if (((s = getenv("LC_ALL"))   && *s) ||
      ((s = getenv("LC_CTYPE")) && *s) ||
      ((s = getenv("LANG"))     && *s)) {
    if (strstr(s, "UTF-8") || strstr(s, "utf-8")) {
      /* let's do it in UTF-8 */
      ...
    }
  }
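
The environment-only test can be wrapped up roughly as below. This is only
a sketch: first_nonempty(), locale_name_is_utf8() and env_wants_utf8() are
hypothetical names, and treating an empty variable like an unset one is a
choice made here to mirror the usual POSIX precedence of LC_ALL, LC_CTYPE
and LANG.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the first argument that is neither NULL
   nor the empty string, or NULL if there is none. */
static const char *first_nonempty(const char *a, const char *b, const char *c)
{
    if (a && *a) return a;
    if (b && *b) return b;
    if (c && *c) return c;
    return NULL;
}

/* Hypothetical helper: does a locale name such as "en_GB.UTF-8"
   contain "UTF-8" or "utf-8"? Other spellings are not recognised. */
static int locale_name_is_utf8(const char *name)
{
    return name && (strstr(name, "UTF-8") || strstr(name, "utf-8"));
}

/* Hypothetical helper: decide from the environment alone, without
   calling setlocale(). */
static int env_wants_utf8(void)
{
    return locale_name_is_utf8(first_nonempty(getenv("LC_ALL"),
                                              getenv("LC_CTYPE"),
                                              getenv("LANG")));
}
```

Keeping the string test separate from the getenv() calls makes it easy to
try the matching rule on sample locale names.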

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/