Re: How to detect the encoding of a string?
Roger Leigh wrote on 2005-06-02 19:29 UTC:
> Forcing the Linux system call interface to require valid UTF-8 would
> be a fantastic extension to POSIX. (Generic, not per-filesystem.)
This has already been discussed several times in both the Linux kernel and
the POSIX communities. Each time, there was an overwhelming consensus
*against* the idea, so it is clearly not going to happen. There is hardly
any advantage to be gained from making the kernel less binary-transparent
than it already is. The proposal would only introduce a lot of checking
overhead and a whole load of new error conditions that nobody wants.
By the way, a carefully tested portable routine for checking whether a
\0-terminated string is correct UTF-8 is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/utf8_check.c
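
To give a rough idea of what such a check involves, here is a minimal
sketch in C. It is not the utf8_check.c routine linked above and may
differ from it in details. The function name utf8_first_invalid is
made up for illustration. The rules it applies are the usual
well-formedness rules of RFC 3629: no stray or missing continuation
bytes, no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF.

```c
#include <stddef.h>

/*
 * Illustrative sketch only (hypothetical name, not utf8_check.c).
 * Returns NULL if the \0-terminated string s is well-formed UTF-8,
 * otherwise a pointer to the first byte of the first malformed sequence.
 * Continuation bytes are tested left to right, so the terminating \0
 * stops the test before anything past the end of the string is read.
 */
const unsigned char *utf8_first_invalid(const unsigned char *s)
{
    while (*s) {
        if (s[0] < 0x80) {
            /* 0xxxxxxx: plain ASCII */
            s += 1;
        } else if ((s[0] & 0xe0) == 0xc0) {
            /* 110xxxxx 10xxxxxx */
            if (s[0] < 0xc2 ||                     /* overlong form */
                (s[1] & 0xc0) != 0x80)
                return s;
            s += 2;
        } else if ((s[0] & 0xf0) == 0xe0) {
            /* 1110xxxx 10xxxxxx 10xxxxxx */
            if ((s[1] & 0xc0) != 0x80 ||
                (s[2] & 0xc0) != 0x80 ||
                (s[0] == 0xe0 && s[1] < 0xa0) ||   /* overlong form */
                (s[0] == 0xed && s[1] >= 0xa0))    /* UTF-16 surrogate */
                return s;
            s += 3;
        } else if ((s[0] & 0xf8) == 0xf0) {
            /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
            if ((s[1] & 0xc0) != 0x80 ||
                (s[2] & 0xc0) != 0x80 ||
                (s[3] & 0xc0) != 0x80 ||
                (s[0] == 0xf0 && s[1] < 0x90) ||   /* overlong form */
                (s[0] == 0xf4 && s[1] >= 0x90) ||  /* above U+10FFFF */
                s[0] > 0xf4)                       /* above U+10FFFF */
                return s;
            s += 4;
        } else {
            /* stray continuation byte, or 0xf8..0xff */
            return s;
        }
    }
    return NULL;
}
```

An application would call this on data it wants to treat as text, for
example a file name returned by readdir(), and fall back to some escape
notation if the result is not NULL. That keeps the check in user space,
where it belongs, and leaves the kernel interface binary-transparent.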
Markus
--
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/