Re: filename encoding (was: ISO-2022)
Andries Brouwer wrote:
> > No. A filename is just a sequence of bytes - no conversion required
> > or desirable.
>
> From the point of view of the kernel it's just a sequence of bytes ...
> From the point of view of the user the bytes form characters with a
> specific meaning. If you use the wrong character set, that meaning
> is lost.
>
> True. But the user uses the right locale, so there is no problem.
> At least no problem that we can do anything meaningful about.
What is "the right locale" if I have file names in several different
encodings? I don't think there is one. A user could run:
    ls /cdrom/dir /usr/something/dir
to list files from two different filesystems, possibly in different encodings.
This is a very normal way to check whether two directories contain the same
files (e.g., on a CD-ROM and a hard disk).
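To illustrate why no single locale works here, a minimal Python sketch, assuming the encoding of each directory is known in advance. The function name `same_names` and the sample names are made up for the example. Comparing the raw bytes reports a difference, while comparing after decoding each side with its own encoding does not.

```python
def same_names(names_a, enc_a, names_b, enc_b):
    """Compare two listings of raw filename bytes after decoding each
    with its own (assumed known) encoding."""
    decoded_a = {name.decode(enc_a) for name in names_a}
    decoded_b = {name.decode(enc_b) for name in names_b}
    return decoded_a == decoded_b


# Hypothetical listings: the same name stored in two different encodings.
cdrom = ["café.txt".encode("iso-8859-1")]
disk = ["café.txt".encode("utf-8")]

print(set(cdrom) == set(disk))                         # raw byte comparison
print(same_names(cdrom, "iso-8859-1", disk, "utf-8"))  # decoded comparison
```

The first comparison is what a tool sees when it treats names as plain bytes; the second only works because somebody supplied the two encodings.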
> Conversion is required to keep the meaning.
>
> No. Conversion is impossible.
> Filenames are not only for the user, they are also parsed by shells
> and operating systems. You will be unable to convert filenames
> and not introduce bugs much worse than the legibility problem.
If Unicode is used as the common, intermediate encoding, I'm sure conversion
will help to make it all work. The only condition is that the source encoding
is known; otherwise conversion is impossible.
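A rough sketch of what I mean by using Unicode as the intermediate step, in Python. The helper `convert_name` is hypothetical, not an existing tool; it only shows the two steps (bytes to Unicode, Unicode to bytes) and what happens when the assumed source encoding is wrong.

```python
def convert_name(raw: bytes, source: str, target: str = "utf-8") -> bytes:
    """Convert raw filename bytes from a known source encoding to a
    target encoding, going through Unicode in between."""
    text = raw.decode(source)    # bytes -> Unicode (needs the right encoding)
    return text.encode(target)   # Unicode -> bytes in the target encoding


# A hypothetical Russian file name as stored by a KOI8-R user.
raw = "файл".encode("koi8-r")

good = convert_name(raw, "koi8-r")      # encoding known: meaning is kept
bad = convert_name(raw, "iso-8859-1")   # wrong guess: no error, wrong result

print(good.decode("utf-8"))
print(good == bad)
```

Note that the wrong guess does not fail. ISO-8859-1 accepts every byte value, so the conversion silently produces a different name, which is why the encoding has to be known and not guessed.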
> No, because one single ext2 filesystem has both the files of this Dane
> and of these Russians. All are happy today, but as soon as you write
> somewhere that it contains filenames in KOI-8, the Dane will be very unhappy.
If the filesystem contains a mix of encodings, it should be marked as such.
In that case conversion shouldn't be used, since we wouldn't know what to
convert from or to.
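The rule above could look roughly like this in Python. This is a sketch under the assumption that each filesystem carries some encoding label; the `fs_encoding` parameter and the convention that `None` means "mixed or unknown" are invented for the example.

```python
from typing import Optional


def display_name(raw: bytes, fs_encoding: Optional[str]) -> str:
    """Return a printable form of a filename.

    fs_encoding is a hypothetical per-filesystem label; None stands for
    a filesystem marked as containing mixed or unknown encodings.
    """
    if fs_encoding is None:
        # No conversion: keep ASCII as-is and escape every other byte.
        return raw.decode("ascii", errors="backslashreplace")
    # Encoding is known, so the bytes can be turned into characters.
    return raw.decode(fs_encoding)


print(display_name(b"caf\xe9.txt", "iso-8859-1"))
print(display_name(b"caf\xe9.txt", None))
```

On the mixed filesystem the name is shown with the byte escaped instead of being converted with a guess.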
If a new filesystem is set up, where new users get their home directory, you
could make sure that only UTF-8 encoding is used. Marking the filesystem as
UTF-8-only would be a good idea. Then existing users could be moved to that
new filesystem one by one, converting the file names as needed (this probably
has to be done with some very intelligent script or manually). In the end you
have a UTF-8-only system.
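A very simple version of such a script might start like the Python sketch below. It assumes that every name which is not valid UTF-8 is in one legacy encoding (ISO-8859-1 here, purely as an assumption), and it only builds a list of proposed renames without touching anything. The names `to_utf8_name` and `plan_renames` are made up for the example.

```python
import os
from typing import List, Optional, Tuple


def to_utf8_name(raw: bytes, legacy: str = "iso-8859-1") -> Optional[bytes]:
    """Return the UTF-8 form of a legacy name, or None if the name is
    already valid UTF-8 (pure ASCII names fall in that group)."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError:
        return raw.decode(legacy).encode("utf-8")


def plan_renames(root: bytes, legacy: str = "iso-8859-1") -> List[Tuple[bytes, bytes]]:
    """Walk a tree using byte paths and list (old, new) rename pairs.

    Bottom-up order, so a directory is listed after its contents.
    """
    plan = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new = to_utf8_name(name, legacy)
            if new is not None:
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, new)))
    return plan
```

The check is only a heuristic: some ISO-8859-1 byte sequences happen to be valid UTF-8 too and would be skipped, and a user with names in a second legacy encoding would get wrong results. That is the part that needs the intelligence, or a human looking at the plan before anything is renamed.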
The problem here is the conversion process. Many system administrators will
postpone this as long as they possibly can. The flag that indicates a
filesystem is UTF-8 only will at least help the conversion process a bit and
make it possible to spread it over time.
For networked filesystems and removable media the UTF-8-only flag is
essential, since you can't convert the whole project/company/world at the same
time. Don't forget all the tapes with backups!
> If we are going to introduce UTF-8 for file names (which is mostly a good
> idea), there will be a conflict with ISO-8859 names currently used
> (especially in Europe). If this problem isn't solved properly,
> users will not convert to using UTF-8. That's why this problem
> needs to be tackled and discussed in this list.
>
> I maintain: (1) There is no problem. Or (2) In case you think that
> there is, it cannot be solved. And (3) filenames are the least of
> your worries. Yes I see filenames in interesting character sets.
> (Example: old DOS distributions sometimes have a sequence of filenames
> with mostly line drawing characters, so that a DIR command in that
> directory will show you some boxed text. But only in CP437 or so.)
> But the only really interesting part is the file contents.
> As soon as people use UTF-8 for that, the filenames will follow.
Not automatically. Don't ignore the problems users will run into, otherwise
the users will ignore you and we'll never get people to use UTF-8.
The problem with filenames must be solved before introducing yet another
encoding for them, even when it's a good one like UTF-8.
--
hundred-and-one symptoms of being an internet addict:
127. You bring your laptop and cellular phone to church.
/// Bram Moolenaar -- Bram@xxxxxxxxxxxxx -- http://www.moolenaar.net \\\
((( Creator of Vim - http://www.vim.org -- ftp://ftp.vim.org/pub/vim )))
\\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/