Re: strcoll for utf-8
Paul Michel wrote on 2002-01-09 14:37 UTC:
> But strtok() for instance does not handle utf-8
> data properly. Is this also in the standards? Reading
> at the two urls below, I could not see where it was
> explained that strcoll() does and strtok() does not...
>
>
> > See
> > http://mail.nl.linux.org/linux-utf8/2001-12/msg00042.html
> > and
> > http://www.opengroup.org/onlinepubs/007908799/xsh/strcoll.html
Well, just read the standard, which unambiguously contains all required
information and is freely available online:
http://www.opengroup.org/onlinepubs/007908799/xsh/strtok.html
"A sequence of calls to strtok() breaks the string pointed to by s1 into
a sequence of tokens, each of which is delimited by a byte from the
                                                      ^^^^
string pointed to by s2."
The meaning of the terms byte and character should be obvious, even in
UTF-8, where a single character can occupy several bytes. The only
documentation that doesn't yet make this distinction very clear is the
glibc manual, so feel free to volunteer and fix that one as well.
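
To make the byte-versus-character point concrete, here is a minimal
sketch in C. The helper name split_bytes() is hypothetical and chosen
only for this illustration; the only library call it relies on is
strtok() itself. It treats every byte of the delimiter string as a
separate delimiter, which is exactly what the wording quoted above
describes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only: tokenize buf in place with
 * strtok() and store up to max token pointers in out.
 * Returns the number of tokens stored. */
size_t split_bytes(char *buf, const char *delims, char **out, size_t max)
{
    size_t n = 0;

    /* strtok() compares single bytes of buf against single bytes of
     * delims; it has no notion of multi-byte characters. */
    for (char *tok = strtok(buf, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;

    return n;
}
```

Assume the text is "a", CENT SIGN (U+00A2, bytes C2 A2), "b", stored in
a writable array as "a\xC2\xA2" "b", and the delimiter string is POUND
SIGN (U+00A3, bytes C2 A3), written "\xC2\xA3". The text contains no
pound sign, yet split_bytes() returns two tokens: "a" and the fragment
A2 'b', which is not valid UTF-8. The shared lead byte C2 was treated as
a delimiter on its own. With delimiters restricted to ASCII this cannot
happen, because no byte below 0x80 ever appears inside a multi-byte
UTF-8 sequence.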
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/