Re: Linux and UTF8 filenames



Martin Kochanski, Monday, 16 Sep 2002, 12:53:16 +0100:
> 
> Linux, to me, is more of a puzzle. The kernel simply treats filenames as a sequence of bytes, so it will happily accept almost anything you throw at it. In particular, 52 EA 76 65 and 52 C3 AA 76 65 are both valid filenames. What I can't immediately work out is what the tools (such as 'ls') will do. Is it universally the case that the tools will assume that those byte-sequence filenames are in UTF8 (in which case the two examples come out as R?ve and Rêve)? Or do they assume a standard locale (perhaps yielding Rêve and RÃªve)? Or is this a switchable option that the user can set? In any case, how can a poor innocent server discover enough about the context in which it is running to know what filename it has to use so that a user who lists a file directory will see "Rêve" on his screen?
> 

It's determined by the user's locale setting, i.e. the value of the
environment variables LC_ALL, LC_CTYPE or LANG, checked in that order.
So if the user has set LANG to fr_FR@euro and has no setting for LC_ALL
or LC_CTYPE, "Rêve" will be displayed correctly if it's encoded in
ISO-8859-15 (or -1, since the two are almost the same). If that user
then sets LC_CTYPE to fr_FR.UTF-8, the file name has to be in UTF-8 or
it will not be displayed correctly. If none of the variables is set,
the C locale is assumed, which in practice means US-ASCII.
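
To make the lookup order concrete, here is a rough Python sketch (not
what 'ls' actually runs). The helper names effective_ctype_locale and
render are made up for illustration. I'm assuming that an empty
variable counts as unset, and the encodings are passed in by hand
rather than derived from the locale name.

```python
import os

def effective_ctype_locale(env=None):
    """Return the first non-empty value of LC_ALL, LC_CTYPE, LANG, else "C".

    Hypothetical helper: it only mirrors the lookup order described above.
    Assumption: an empty string is treated the same as an unset variable.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"

def render(name_bytes, encoding):
    """Decode raw filename bytes as a tool using `encoding` would see them.

    Undecodable bytes become U+FFFD here; a real tool may print '?' instead.
    """
    return name_bytes.decode(encoding, errors="replace")

# The two byte sequences from the original question.
latin_name = bytes.fromhex("52ea7665")    # "Rêve" in ISO-8859-1/-15
utf8_name = bytes.fromhex("52c3aa7665")   # "Rêve" in UTF-8

if __name__ == "__main__":
    print("effective locale:", effective_ctype_locale())
    for enc in ("iso-8859-15", "utf-8", "ascii"):
        print(enc, render(latin_name, enc), render(utf8_name, enc))
```

A server that wants the user to see "Rêve" would have to encode the
name in whatever character set the user's effective locale implies,
which it can only guess at if it does not run in the user's
environment.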

Jari
-- 
BOFH excuse #85:

Windows 95 undocumented "feature"
