
Re: strcoll for utf-8



Paul Michel writes:
> 
> IMHO, strcoll cannot correctly handle utf-8 encoded
> characters since collation needs explicit knowledge of
> characters.

But strcoll *has* explicit knowledge of characters. If you set LC_ALL
to fi_FI.UTF-8 then strcoll will know about the Finnish collation
rules and also know that strings are UTF-8 encoded. This is mandated
by the standards, and glibc 2.2 implements them.
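
To make this concrete, here is a minimal sketch in C. The helper name
collate_in_locale is made up for this mail and is not part of any library.
Only setlocale() and strcoll() are standard. It assumes the requested
locale is actually installed on the system and reports failure otherwise.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library function): switch the whole process
   to the named locale, then compare a and b with strcoll().
   Returns 0 and stores the strcoll() result in *result on success,
   or -1 if setlocale() rejects the locale name (e.g. not installed). */
static int collate_in_locale(const char *locale_name,
                             const char *a, const char *b, int *result)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    /* strcoll() takes the encoding from LC_CTYPE and the ordering
       rules from LC_COLLATE of the locale selected above. */
    *result = strcoll(a, b);
    return 0;
}
```

Called with "fi_FI.UTF-8" and the UTF-8 encoded strings "zebra" and "äiti",
the result should be negative where that locale is available, because the
Finnish alphabet places "ä" after "z". In the "C" locale the same call
falls back to plain byte order. Note that setlocale() changes the locale
for the whole process, so a real program would normally call it once at
startup rather than per comparison.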

See http://mail.nl.linux.org/linux-utf8/2001-12/msg00042.html
and http://www.opengroup.org/onlinepubs/007908799/xsh/strcoll.html

Bruno
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/