
intermediate summary (Re: filename encoding)



Hi,

Many messages have been posted, so I have summarized the ideas
for further discussion.


There are a few possibilities:


---------------------------------------------------------------------
   encoding for      parameter for    parameter for   encoding for
   physical media[1] open(kernel)[4]  fopen(libc)[5]  the end user[6][7]
---------------------------------------------------------------------
1. own encodings[2]  UTF-8            UTF-8           locale
2. own encodings     locale           locale          locale
3. locale[3]         locale           locale          locale
4. mixture of 2 and 3
---------------------------------------------------------------------

Note [1]:
  Generally speaking, filesystems have their own encodings:
  ISO-2022 for ISO-9660 CD-ROM, UCS-2 for VFAT, and so on.
  However, ext2 and similar filesystems don't have a specified
  encoding (yet).

Note [2]:
  This would be UTF-8 for ext2 and similar filesystems.

Note [3]:
  Locale encodings apply only to ext2 and similar filesystems.  Of
  course ISO-9660, VFAT, and so on keep their own encodings.  Note
  that 'locale' means the LC_CTYPE locale.

Note [4]:
  Conversion between 'encoding for physical media' and 'parameter
  for open()' is the responsibility of the kernel (filesystem).

Note [5]:
  Conversion between 'parameter for open()' and 'parameter for fopen()'
  is the responsibility of libc.  However, I think it is a bad idea
  for open() and fopen() to take different encodings.

Note [6]:
  Conversion between 'parameter for fopen()' and 'encoding for
  the end user' is the responsibility of individual applications.

Note [7]:
  End users have to use the locale encoding.  This is a must.



The 1st idea:
This is the 'individual programs do the conversion' idea.
   [Edmund GRIMLEY EVANS <edmundo@xxxxxxxx>]
   > I suggest it should be performed in individual programs, if at all
   > (I'm not sure it's worth implementing).
Following the 1st idea requires conversion twice.  However, the
kernel's conversion (physical media's encoding <--> UTF-8) is a
'fixed' conversion, i.e., the kernel doesn't need to know the locale.
In any case, 'twice' is not the largest problem with this idea.
   [Bram Moolenaar <Bram@xxxxxxxxxxxxx>]
   > It's also a lot of work and hassle to incorporate the knowledge about file
   > name conversion in every program that handles file names.
This would be the largest problem.  If a program doesn't support
conversion, users of non-UTF-8 locales would suffer.
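
To make the two conversions of the 1st idea concrete, here is a
minimal sketch in Python.  The function names are hypothetical, and
UTF-16-LE is used only as a stand-in for a VFAT-style on-disk
encoding; nothing here is an actual kernel or libc interface.

```python
def kernel_to_utf8(on_disk: bytes, media_encoding: str) -> bytes:
    """'Fixed' conversion: on-disk name -> UTF-8. No locale is consulted."""
    return on_disk.decode(media_encoding).encode("utf-8")


def app_to_user(utf8_name: bytes, locale_encoding: str) -> bytes:
    """Application-side conversion: UTF-8 -> the user's locale encoding.

    Characters the locale encoding cannot represent are replaced by '?'.
    """
    return utf8_name.decode("utf-8").encode(locale_encoding, errors="replace")


def app_from_user(user_name: bytes, locale_encoding: str) -> bytes:
    """Application-side conversion: locale encoding -> UTF-8 for open()."""
    return user_name.decode(locale_encoding).encode("utf-8")


# Assumed example: a name stored as UTF-16-LE, shown to an EUC-JP user.
on_disk = "日本語.txt".encode("utf-16-le")
utf8_name = kernel_to_utf8(on_disk, "utf-16-le")
shown = app_to_user(utf8_name, "euc_jp")
round_trip = app_from_user(shown, "euc_jp")
```

Every program that handles file names would need something like
app_to_user() and app_from_user(), which is exactly the burden
Bram describes.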


The 2nd idea:
This is the 'kernel is responsible for all conversions' idea.
I like this idea the best.  The problem is that the kernel has
to know the LC_CTYPE locale, and I don't know whether this is
technically possible.
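
For reference, this is roughly how a user-space program learns its
LC_CTYPE encoding today.  It is a sketch using Python's standard
locale module on a Unix-like system; the function name is
hypothetical.  The lookup happens inside the process, from its
environment variables, which is why the kernel does not see it.

```python
import locale


def lc_ctype_codeset() -> str:
    """Return the character encoding of the process's LC_CTYPE locale."""
    try:
        # Adopt the locale named by LC_ALL / LC_CTYPE / LANG.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # keep whatever locale is currently in effect.
        pass
    return locale.nl_langinfo(locale.CODESET)


print(lc_ctype_codeset())
```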


The 3rd idea:
This is the current situation.  The problems with this idea are:
(1) some encodings may include the '/' code.
(2) users may want to use several locales at a time.
(3) files outside the /home directory can't have a locale.
(4) how about removable media?
Suggested solutions:
(1) just give up.
(2) no solutions yet.
(3) no solutions yet.  However, I imagine non-ASCII encodings would
    be prohibited.
(4) specify the encoding at mount time.  (I think this is a broken
    idea because it contradicts the underlying idea itself.  If this
    3rd idea is adopted, I think encodings should always be
    determined by LC_CTYPE.)
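
Problem (1) can be illustrated with a small check.  This is only a
sketch: the helper name is hypothetical, and ISO-2022-JP is chosen
merely as an encoding in which the byte 0x2F ('/') occurs inside a
non-ASCII character; I am not claiming it is in use as a locale
encoding.

```python
def has_slash_byte(name: str, encoding: str) -> bool:
    """True if the encoded file name contains the byte 0x2F ('/')."""
    return b"/" in name.encode(encoding)


# HIRAGANA LETTER KU is JIS X 0208 code 0x242F, so its ISO-2022-JP
# form contains the byte 0x2F, which the kernel treats as a path
# separator.
name = "く"
for enc in ("iso2022_jp", "euc_jp", "utf-8"):
    print(enc, name.encode(enc), has_slash_byte(name, enc))
```

EUC-JP and UTF-8 keep all bytes of a non-ASCII character at 0x80 or
above, so they cannot produce a stray '/'.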


The 4th idea:
Ext2 and similar filesystems would have a flag indicating whether
the encoding is UTF-8 or the locale encoding.
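
As a rough sketch of what such a flag would mean for whoever decodes
the names (the function and parameter names are hypothetical; no
such flag exists in ext2 as far as this summary knows):

```python
def decode_filename(raw: bytes, names_are_utf8: bool,
                    locale_encoding: str) -> str:
    """Decode an on-disk file name according to a per-filesystem flag."""
    if names_are_utf8:
        # Flag set: names are UTF-8 regardless of the user's locale.
        return raw.decode("utf-8")
    # Flag clear: names are in whatever LC_CTYPE encoding wrote them.
    return raw.decode(locale_encoding)


utf8_fs_name = decode_filename("あ".encode("utf-8"), True, "euc_jp")
locale_fs_name = decode_filename("あ".encode("euc_jp"), False, "euc_jp")
```

With the flag clear, the result is only correct if the reader's
locale encoding matches the writer's, which is the weakness of the
3rd idea carried over.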


---
Tomohiro KUBOTA <kubota@xxxxxxxxxx>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/