
Re: intermediate summary (Re: filename encoding)



On Fri, 2 Feb 2001, Tomohiro KUBOTA wrote:

>    encoding for      parameter for    parameter for   encoding for
>    physical media[1] open(kernel)[4]  fopen(libc)[5]  the end user[6][7]
> ---------------------------------------------------------------------
> 1. own encodings[2]  UTF-8            UTF-8           locale
> 2. own encodings     locale           locale          locale
> 3. locale[3]         locale           locale          locale
> 4. mixture of 2 and 3

I think 4. is the most compatible with the current world.

This is unfortunate for Haskell, probably Java, and other languages which
use Unicode wide characters internally. When names are physically stored
in UTF-8 (a sample ext2 installation in the future) or UCS-2 (VFAT), but
the locale is e.g. ISO-8859-x and thus the filesystem is mounted with
conversion to ISO-8859-x, handling filenames stored in these encodings
loses data because of the bad intermediate form (ISO-8859-x).
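
A minimal sketch of that data loss, in Python only for illustration. The
helper name via_locale and the sample filenames are made up for this
example, and errors="replace" merely stands in for whatever a real mount
conversion does with characters it cannot represent:

```python
def via_locale(name: str, locale_encoding: str = "iso8859_2") -> str:
    """Round-trip a Unicode filename through an 8-bit locale encoding.

    This imitates a Unicode-capable filesystem mounted with conversion
    to the locale encoding: the program only ever sees the converted
    bytes and has to decode them back to Unicode itself.
    """
    # Assumption: unrepresentable characters become '?'.
    as_bytes = name.encode(locale_encoding, errors="replace")
    return as_bytes.decode(locale_encoding)


polish = "żółw.txt"       # every character exists in ISO-8859-2
japanese = "日本語.txt"    # none of the first three characters do

print(via_locale(polish) == polish)      # the name survives
print(via_locale(japanese) == japanese)  # the name is damaged
```

The Polish name survives because ISO-8859-2 can represent it. The Japanese
name does not, even though the physical medium stored it without loss.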

Such an installation can be cured by using UTF-8 as the locale, but I'm not
sure if it's a workable choice in the near future, as most programs don't
work well in UTF-8 and I need to use ISO-8859-2 in mail, news etc. with
mailers and newsreaders which don't convert anything yet.

An alternative design would allow 1. as an additional option, preferably
using wchar_t instead of UTF-8. It would be a benefit for Haskell, and for
C programs which use wide characters internally. It would shift the burden
of filename conversion from Haskell's libraries to libc, kernel and system
configuration. It would not lose data except when unavoidable
(a filesystem using an 8-bit encoding physically). But 4. must be the
default, I think.

>   Conversion between 'parameter for open()' and 'parameter for fopen()'
>   is responsibility of libc.  However, I think it is a bad idea that
>   open() and fopen() take different encoding.

I agree.

>   Conversion between 'parameter for fopen()' and 'encodings for
>   the end user' is responsibility of individual application softwares.

This includes Haskell libraries shipped with Haskell implementations.
Actually it's open() in at least one implementation.

>   End users have to use locale encoding.  This is a must.

In C yes. In Haskell no, as in Java.

> The 1st idea:
> This is 'individual softwares do conversion' idea.

IMHO this is unacceptable: we can't require *every* program working with
filenames to implement conversions. They will just not do it and break on
non-ASCII filenames.

> The 2nd idea:
> This is 'kernel is responsible for all conversions' idea.
> I like this idea the best.  The problem is I don't know whether
> this is technically possible or not.  The problem is, kernel has
> to know LC_CTYPE locale.

I think that we must live with the fact that kernel-side encodings are
specified and implemented very differently from libc encodings. There are
modules for particular encodings and mount options telling which encoding
to use.

> The 3rd idea:
> This is the current situation.  The problem of this idea is
> (1) some encodings may include '/' code.

Encodings which use the '/' byte as part of other characters cannot be
used for filenames no matter what, period, because software, not only the
kernel, uses '/' when working with locale-encoded filenames.
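
To make the problem concrete, here is a small Python sketch. The helper
contains_slash_byte is hypothetical, and UTF-16 is used only as a handy
example of an encoding where the byte 0x2F can appear inside another
character:

```python
def contains_slash_byte(name: str, encoding: str) -> bool:
    """Return True if the encoded filename contains the byte 0x2F ('/')."""
    return b"/" in name.encode(encoding)


# U+012F LATIN SMALL LETTER I WITH OGONEK: no slash in the text itself.
name = "\u012f"

# In UTF-16LE the character is the byte pair 2F 01, so byte-oriented
# software would see a path separator in the middle of the name.
print(contains_slash_byte(name, "utf-16-le"))

# In UTF-8 it is C4 AF; bytes below 0x80 never occur inside a
# multibyte sequence, so no stray '/' can appear.
print(contains_slash_byte(name, "utf-8"))
```

Any code that splits paths at the byte 0x2F would cut the UTF-16 form of
this name in two, which is why such encodings are ruled out here.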

> (2) users may want to use several locales at a time.

I'm afraid it cannot be handled well in the current world, unless all
locales used on a given system have the same LC_CTYPE.
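
A short Python illustration of why differing LC_CTYPE settings clash. The
byte string is an invented example of a name as stored on disk, and the
two codecs stand in for two users' locales:

```python
# The same stored bytes, as two processes with different LC_CTYPE
# settings would interpret them.
raw_name = b"\xb1.txt"

as_latin2 = raw_name.decode("iso8859_2")  # user in an ISO-8859-2 locale
as_latin1 = raw_name.decode("iso8859_1")  # user in an ISO-8859-1 locale

print(as_latin2)  # a-ogonek followed by ".txt"
print(as_latin1)  # plus-minus sign followed by ".txt"
print(as_latin2 == as_latin1)
```

Neither reading is wrong from the point of view of the process doing it.
The bytes simply carry no record of which encoding they were written in.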

> (4) how about removable media?

fstab should be configured to provide filenames in the encoding chosen for
the given system configuration.

> (4) specify encoding when mount.  (I think this is broken idea
>     because this idea is against the mother idea itself.  I think
>     encodings should always be determined by LC_CTYPE, if this 3rd
>     idea would be taken.)

IMHO it cannot be done differently. The kernel does the encoding and it
does not know the LC_CTYPE of processes.

-- 
Marcin 'Qrczak' Kowalczyk

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/