Re: How to detect the encoding of a string?
On Thu, Jun 02, 2005 at 08:29:24PM +0100, Roger Leigh wrote:
> Forcing the Linux system call interface to require valid UTF-8 would
> be a fantastic extension to POSIX. (Generic, not per-filesystem.)
I do agree; however, it would introduce several nontrivial problems that would
still need to be solved somehow:
- zip extracting :-), web/ftp/rsync mirroring etc. could fail to
extract/mirror files whose names are not valid UTF-8.
- What to do with files with non-UTF-8 names that already exist on the disk?
Should e2fsck, reiserfsck etc. also force filenames to be UTF-8? Could such
files still be accessible somehow? Would readdir() return them?
- Is it still sane to limit filename length to a fixed number of bytes
instead of a fixed number of characters?
- Would one of NFC and NFD be forced (à la MacOS)?
Perhaps there are other problems as well...
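
To make a few of these points concrete, here is a minimal Python sketch (not
kernel code). The helper names is_valid_utf8 and describe and the sample name
"café.txt" are made up for illustration. It shows a Latin-1 name being
rejected by a strict UTF-8 check, and how the byte and character counts differ
between the NFC and NFD forms of the same visible name:

```python
import unicodedata

def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

def describe(name: str) -> tuple:
    """Return (number of characters, number of UTF-8 bytes) for a filename."""
    return len(name), len(name.encode("utf-8"))

# Hypothetical sample name; \u00e9 is a precomposed e with acute accent.
sample = "caf\u00e9.txt"

# The same name as a legacy zip archive or FTP mirror might deliver it.
latin1_name = sample.encode("latin-1")
utf8_name = sample.encode("utf-8")

# The same visible name in the two normalization forms.
nfc = unicodedata.normalize("NFC", sample)
nfd = unicodedata.normalize("NFD", sample)

if __name__ == "__main__":
    print("latin-1 bytes valid UTF-8?", is_valid_utf8(latin1_name))
    print("utf-8 bytes valid UTF-8?  ", is_valid_utf8(utf8_name))
    print("NFC (chars, bytes):", describe(nfc))
    print("NFD (chars, bytes):", describe(nfd))
    print("NFC == NFD?", nfc == nfd)
```

A byte-based length limit would thus allow fewer characters for an NFD name
than for the NFC spelling of the same name, and the two spellings would count
as different filenames unless one form is enforced.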
Once I also heard a fantastic idea from someone: this could (at least
temporarily, until everyone switches to UTF-8) be a per-filesystem mount
option instead of a global kernel variable or hardcoded into the kernel. This
way when you mount a filesystem, you could specify whether or not UTF-8 is
forced there.
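
Roughly, the per-mount idea could behave like the following Python sketch.
Everything here is hypothetical: the MOUNT_POLICY table, the mount points and
the function names are invented for illustration, no such mount option exists
that I know of, and paths are assumed to be absolute.

```python
# Hypothetical table: mount point -> whether UTF-8 names are enforced there.
MOUNT_POLICY = {
    "/": True,
    "/mnt/legacy": False,
}

def enforcing_mount(path: str) -> bool:
    """Return the policy of the longest mount point that contains path."""
    candidates = [
        m for m in MOUNT_POLICY
        if path == m or path.startswith(m.rstrip("/") + "/")
    ]
    # Assumes an absolute path, so "/" always matches.
    return MOUNT_POLICY[max(candidates, key=len)]

def may_create(directory: str, name: bytes) -> bool:
    """Accept any name on non-enforcing mounts, only valid UTF-8 elsewhere."""
    if not enforcing_mount(directory):
        return True
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

if __name__ == "__main__":
    legacy = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
    print(may_create("/mnt/legacy/old", legacy))  # mount without enforcement
    print(may_create("/home/user", legacy))       # mount with enforcement
```

Old data could then stay reachable on a filesystem mounted without the option
while everything else is already strict.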
--
Egmont
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/