intermediate summary (Re: filename encoding)
Hi,
Many messages have been posted.
I have summarized the ideas so far for further discussion.
There are a few possibilities:
---------------------------------------------------------------------------
   encoding for        parameter for    parameter for    encoding for
   physical media[1]   open(kernel)[4]  fopen(libc)[5]   the end user[6][7]
---------------------------------------------------------------------------
1. own encodings[2]    UTF-8            UTF-8            locale
2. own encodings       locale           locale           locale
3. locale[3]           locale           locale           locale
4. mixture of 2 and 3
---------------------------------------------------------------------------
Note [1]:
Generally speaking, filesystems have their own encodings.
ISO-2022 for ISO-9660 CD-ROM, UCS-2 for VFAT, and so on.
However, ext2 and so on don't have specified encodings (yet).
Note [2]:
This would be UTF-8 for ext2 and so on.
Note [3]:
The locale encoding is used only for ext2 and so on. Of course,
ISO-9660, VFAT, and so on keep their own encodings. Note that
'locale' means the LC_CTYPE locale.
Note [4]:
Conversion between 'encodings for physical media' and 'parameter
for open()' is the responsibility of the kernel (filesystem).
Note [5]:
Conversion between 'parameter for open()' and 'parameter for fopen()'
is the responsibility of libc. However, I think it is a bad idea for
open() and fopen() to take different encodings.
Note [6]:
Conversion between 'parameter for fopen()' and 'encodings for
the end user' is the responsibility of each individual application.
Note [7]:
End users have to use locale encoding. This is a must.
The 1st idea:
This is the 'individual programs do the conversion' idea.
[Edmund GRIMLEY EVANS <edmundo@xxxxxxxx>]
> I suggest it should be performed in individual programs, if at all
> (I'm not sure it's worth implementing).
With the 1st idea, conversion is needed twice: once in the kernel
and once in the application. However, the kernel's conversion
(physical media's encodings <--> UTF-8) is a 'fixed' conversion,
i.e., the kernel doesn't need to know the locale.
Converting twice is not the largest problem of this idea, though.
[Bram Moolenaar <Bram@xxxxxxxxxxxxx>]
> It's also a lot of work and hassle to incorporate the knowledge about file
> name conversion in every program that handles file names.
This would be the largest problem. If a program doesn't support
the conversion, users of non-UTF-8 locales would suffer.
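To make the two conversion steps of the 1st idea concrete, here is a
minimal sketch in Python. The function names are hypothetical and only
for illustration; the first function stands in for what the kernel
(filesystem) would do, the second for what each application would
have to do.

```python
def media_to_syscall(raw: bytes, media_encoding: str) -> bytes:
    """Kernel side (hypothetical): fixed conversion from the
    filesystem's own encoding to UTF-8. No locale is involved."""
    return raw.decode(media_encoding).encode("utf-8")


def syscall_to_user(name_utf8: bytes, locale_encoding: str) -> bytes:
    """Application side (hypothetical): conversion from UTF-8 to the
    user's LC_CTYPE encoding. Raises UnicodeEncodeError if the locale
    encoding cannot represent the name."""
    return name_utf8.decode("utf-8").encode(locale_encoding)


# A name stored as UCS-2 (little endian) on the media, shown to a
# user whose locale encoding is EUC-JP.
on_media = "日本語".encode("utf-16-le")
for_open = media_to_syscall(on_media, "utf-16-le")
for_user = syscall_to_user(for_open, "euc_jp")
```

An application that skips the second step hands UTF-8 bytes to a
non-UTF-8 terminal, which is the problem Bram points out above.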
The 2nd idea:
This is the 'kernel is responsible for all conversions' idea.
I like this idea the best. The problem is that the kernel has to
know the LC_CTYPE locale, and I don't know whether that is
technically possible.
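For reference, the LC_CTYPE locale is ordinarily derived from each
process's environment, which is why the kernel does not know it by
default. The sketch below follows the usual POSIX precedence
(LC_ALL, then LC_CTYPE, then LANG); the function names are
hypothetical, and a real program would call setlocale() instead of
parsing the variables itself.

```python
def ctype_locale_name(env: dict) -> str:
    """Return the locale name that governs LC_CTYPE for this
    environment, using the POSIX precedence order. Empty values
    are treated as unset."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def codeset_of(locale_name: str):
    """Extract the codeset part of 'language_TERRITORY.codeset@modifier',
    or None if the name carries no explicit codeset."""
    if "." not in locale_name:
        return None
    return locale_name.split(".", 1)[1].split("@", 1)[0]


# Two processes of the same user may see different encodings.
shell_a = {"LANG": "en_US.UTF-8", "LC_CTYPE": "ja_JP.eucJP"}
shell_b = {"LANG": "en_US.UTF-8"}
```

Since this is per-process state, the 2nd idea would need some way of
passing it to the kernel for each process.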
The 3rd idea:
This is the current situation. The problems with this idea are:
(1) some encodings may include the '/' code inside a character.
(2) users may want to use several locales at a time.
(3) files outside the /home directory can't have a locale.
(4) how about removable media?
Suggested solutions:
(1) just give up.
(2) no solutions yet.
(3) no solutions yet. However, I imagine non-ASCII encodings would
be prohibited.
(4) specify the encoding at mount time. (I think this is a broken
idea because it contradicts the underlying idea itself. I think
encodings should always be determined by LC_CTYPE if this 3rd
idea is adopted.)
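Problem (1) can be checked with a few lines of Python. The helper
name is hypothetical. It reports whether the encoded form of a name
contains the byte 0x2F ('/') even though the name itself has no
slash, which is the case the kernel would misread as a path
separator.

```python
def has_stray_slash(name: str, encoding: str) -> bool:
    """True if encoding `name` produces a 0x2F byte although the
    name contains no '/' character."""
    if "/" in name:
        return False  # a real separator, not a stray byte
    return b"/" in name.encode(encoding)


name = "く"  # HIRAGANA LETTER KU

# ISO-2022-JP encodes this character as the two bytes 0x24 0x2F
# between escape sequences, so a '/' byte appears.
in_iso2022 = has_stray_slash(name, "iso2022_jp")

# EUC-JP and UTF-8 use only bytes >= 0x80 for non-ASCII characters.
in_eucjp = has_stray_slash(name, "euc_jp")
in_utf8 = has_stray_slash(name, "utf-8")
```

Here only the ISO-2022-JP form contains a stray '/' byte; the EUC-JP
and UTF-8 forms do not.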
The 4th idea:
Ext2 and so on would have a flag that indicates whether the
filename encoding is UTF-8 or the locale encoding.
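A rough sketch of how such a flag might be consulted, in Python.
Nothing like this exists in ext2 today; the enum, the function name
and the idea of a per-filesystem value are all assumptions made for
illustration.

```python
from enum import Enum


class NameEncoding(Enum):
    """Hypothetical per-filesystem flag."""
    UTF8 = "utf-8"
    LOCALE = "locale"


def decode_name(raw: bytes, flag: NameEncoding, locale_encoding: str) -> str:
    """Interpret on-disk name bytes according to the flag: a fixed
    UTF-8 decoding, or whatever the current LC_CTYPE encoding is."""
    codec = "utf-8" if flag is NameEncoding.UTF8 else locale_encoding
    return raw.decode(codec)


raw = b"\xc3\xa9"  # the same two bytes stored on disk
as_utf8 = decode_name(raw, NameEncoding.UTF8, "latin-1")
as_locale = decode_name(raw, NameEncoding.LOCALE, "latin-1")
```

The same bytes give one character under the UTF-8 flag and two under
a Latin-1 locale, so the flag decides which reading is the intended
one.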
---
Tomohiro KUBOTA <kubota@xxxxxxxxxx>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/