
Re: Use of bind_textdomain_codeset()



On Mon, Oct 02, 2000 at 15:32:20 +0200, Bruno Haible wrote:
> Byrial Jensen writes:
> 
> > @@ -160,6 +170,19 @@ int main(int argc, char **argv)
> >  #ifdef ENABLE_NLS
> >      bindtextdomain(PACKAGE, LOCALEDIR);
> >      textdomain(PACKAGE);
> > +#ifdef HAVE_BIND_TEXTDOMAIN_CODESET
> > +   /*
> > +    * GNU libc 2.2 will convert all translated messages from gettext()
> > +    * to what it thinks is the current output character set. The default
> > +    * depends on the LC_CTYPE locale, but we cannot permanently set this
> > +    * as it would affect all isXXXXX() calls all over the program --
> > +    * so we have to bind the default charset to the right value instead.
> > +    */
> > +    setlocale (LC_CTYPE, "");
> > +    bind_textdomain_codeset (PACKAGE, nl_langinfo(CODESET));
> > +    bind_textdomain_codeset ("libc", nl_langinfo(CODESET));
> > +    setlocale (LC_CTYPE, "C");
> > +#endif
> >  #endif
> 
> This will nearly work. But not completely, because glibc's gettext function
> needs the LC_CTYPE locale for the codeset _and_ for the transliteration.
> You are only setting the codeset.

It would make sense to me if the language of the text influenced
the transliteration, but I don't understand why or how the LC_CTYPE
locale, which determines the receiving codeset, influences it.

But my test program confirms that it does. Would anyone please
explain to me what happens in the following example?

The Danish letter "å" is transliterated to "aa" when LC_CTYPE is
"C" and to "ae" when LC_CTYPE is "da_DK". As a Dane I would say
that "aa" is always the correct ASCII transliteration of "å", so I
really don't understand what is happening.

$ cat loktest4.c
#include <locale.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <libintl.h>

int main (int argc, char *argv[])
{
  if (argc != 2)
  {
    printf ("Usage: %s  CODESET\n", argv[0]);
    return 1;
  }

  setlocale (LC_CTYPE, "");
  printf ("LC_CTYPE locale is \"%s\"\n", setlocale (LC_CTYPE, NULL));

  bind_textdomain_codeset ("libc", argv[1]);
  printf ("Textdomain \"%s\" is set to codeset \"%s\"\n", "libc", argv[1]);

  printf ("strerror (ENOENT) = \"%s\"\n", strerror (ENOENT));
  printf ("\n");
  return 0;
}
$ gcc -Wall -o loktest4 -static loktest4.c
$ env -i ./loktest4 ASCII
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "No such file or directory"

$ env -i LANG=da ./loktest4 iso-8859-1
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "iso-8859-1"
strerror (ENOENT) = "Ingen sådan fil eller filkatalog"

$ env -i LANG=da ./loktest4 ASCII
LC_CTYPE locale is "C"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "Ingen saadan fil eller filkatalog"

$ env -i LANG=da LC_CTYPE=da_DK ./loktest4 ASCII
LC_CTYPE locale is "da_DK"
Textdomain "libc" is set to codeset "ASCII"
strerror (ENOENT) = "Ingen saedan fil eller filkatalog"

$


> Moreover, nl_langinfo is not completely portable. bind_textdomain_codeset
> will also be contained in the next standalone gettext package, thus
> HAVE_BIND_TEXTDOMAIN_CODESET will be true even on old platforms with gettext,
> and your code won't compile.
> 
> I would therefore favour the opposite approach: Simply use
> 
>       setlocale (LC_CTYPE, "");
> 
> and simulate the isXXXXX() calls with substitutes specific to the C locale.
> Take for example the files [1] and [2], specially optimized for the C locale.

Thanks for the advice. I will consider it, but I would be happier
with a solution which doesn't change code all over the program if
it can be done in a correct way.
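
For reference, substitutes of the kind suggested above might look
roughly like the following sketch. The c_* names are hypothetical
(they are not taken from the files [1] and [2]), and the code assumes
an ASCII-compatible execution character set. The point is only that
these functions give the same answers whatever LC_CTYPE is set to.

```c
/* Sketch: ctype substitutes that always behave as in the "C" locale,
   regardless of what setlocale (LC_CTYPE, "") selected.
   The c_* names are hypothetical, chosen for this illustration.
   Assumes an ASCII-compatible execution character set. */
#include <stdbool.h>

/* True only for the ASCII digits '0'..'9'. */
static inline bool c_isdigit (int c) { return c >= '0' && c <= '9'; }

/* True only for the ASCII letters 'a'..'z' and 'A'..'Z'. */
static inline bool c_islower (int c) { return c >= 'a' && c <= 'z'; }
static inline bool c_isupper (int c) { return c >= 'A' && c <= 'Z'; }
static inline bool c_isalpha (int c) { return c_islower (c) || c_isupper (c); }
static inline bool c_isalnum (int c) { return c_isalpha (c) || c_isdigit (c); }

/* Space plus the control characters \t \n \v \f \r (9..13 in ASCII). */
static inline bool c_isspace (int c)
{
  return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Case conversion touches ASCII letters only; every other value,
   including bytes above 0x7f, is returned unchanged. */
static inline int c_tolower (int c) { return c_isupper (c) ? c - 'A' + 'a' : c; }
static inline int c_toupper (int c) { return c_islower (c) ? c - 'a' + 'A' : c; }
```

With something like this, a byte such as 0xE5 ("å" in ISO-8859-1) is
never classified as a letter, even after setlocale (LC_CTYPE, "") has
selected a Danish locale. The cost is the one mentioned above: every
isXXXXX() call in the program has to be changed to its substitute.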

-- 
Byrial
http://home.worldonline.dk/~byrial/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/