Re: strcoll for utf-8
Paul Michel writes:
>
> IMHO, strcoll cannot correctly handle utf-8 encoded
> characters since collation needs explicit knowledge of
> characters.
But strcoll *has* explicit knowledge of characters. If you set LC_ALL
to fi_FI.UTF-8 then strcoll will know about the Finnish collation
rules and also know that strings are UTF-8 encoded. This is mandated
by the standards, and glibc 2.2 implements them.
See http://mail.nl.linux.org/linux-utf8/2001-12/msg00042.html
and http://www.opengroup.org/onlinepubs/007908799/xsh/strcoll.html
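
To make the point above concrete, here is a minimal sketch in C. The helper name collate_in_locale is made up for illustration, and the example assumes that a fi_FI.UTF-8 locale is installed on the system; if it is not, setlocale fails and the helper reports that instead of comparing.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper (not part of any library): compare a and b under
 * the named locale. On success it returns 0 and stores -1, 0 or 1 in
 * *sign; it returns -1 if the locale cannot be selected.
 * Note that setlocale changes process-wide state and is not restored. */
int collate_in_locale(const char *locale_name,
                      const char *a, const char *b, int *sign)
{
    /* LC_ALL covers LC_COLLATE (the collation rules) and LC_CTYPE
     * (the character encoding, here UTF-8). */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    int r = strcoll(a, b);
    *sign = (r > 0) - (r < 0);
    return 0;
}
```

Called as collate_in_locale("fi_FI.UTF-8", "\xc3\xa5", "\xc3\xa4", &sign), that is with the UTF-8 encodings of å and ä, this should give a negative sign wherever the locale follows the Finnish alphabet (å before ä), even though strcmp on the same bytes gives a positive result.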
Bruno
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/