Re: Global LC_CTYPE and file names
From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>
>It is today not common practice that people use multiple LC_CTYPE values
>on the same system. People in Germany tend to use Latin-1 everywhere on
>their system, people in Russia tend to use KOI8-R everywhere, etc.
That is not true. The average "weighted" charset usage statistics for Russia are:
Windows-1251 40%
KOI8-R 35%
IBM-866 20%
X-MAC-CYRILLIC 4%
ISO_8859-5 1%
On the Web, Windows-1251 is used in 80% of cases.
In e-mail and NNTP, KOI8-R reaches 90%. And 99% of file
names on MS-DOS FAT / VFAT / SMB (Samba)
are stored in IBM866.
This is a fact, and nothing can be done to change it...
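
To make the consequence concrete, here is a minimal Python sketch (not part of the original measurements) that encodes one Cyrillic word in each of the five charsets listed above. The sample word and the helper name encode_everywhere are assumptions chosen for illustration; the codec names are the ones Python's standard library uses for these charsets.

```python
# Sketch: one Cyrillic word, five different byte representations.
# The sample word and the helper name are illustrative assumptions.
CHARSETS = {
    "Windows-1251": "cp1251",
    "KOI8-R": "koi8_r",
    "IBM-866": "cp866",
    "X-MAC-CYRILLIC": "mac_cyrillic",
    "ISO_8859-5": "iso8859_5",
}

def encode_everywhere(text):
    """Return {charset name: encoded bytes} for every charset in CHARSETS."""
    return {name: text.encode(codec) for name, codec in CHARSETS.items()}

word = "Файл"  # "File"
encoded = encode_everywhere(word)
for name, raw in encoded.items():
    print(f"{name:15} {raw.hex(' ')}")

# Bytes written as KOI8-R but read as Windows-1251 decode without error,
# yet the result is a different, unreadable string.
garbled = encoded["KOI8-R"].decode("cp1251")
print(garbled)
```

Each charset gives a different byte sequence for the same word, so a program that assumes the wrong one silently shows garbage instead of failing.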
The situation is the same for other Cyrillic languages:
Ukrainian (UA) uses KOI8-U, Windows-1251, and IBM-866;
likewise for CZ, BY, etc.
http://czyborra.com/charsets/cyrillic.html
And for Japanese: ISO_2022-JP, EUC, JIS, and Shift-JIS
are used simultaneously.
http://turnbull.sk.tsukuba.ac.jp/Tools/I18N/LJ-I18N.html
http://www.vsuccess.com/japanesecomputing.html
> If people use multiple LC_CTYPE values today on a single system,
> they are likely to get bad results occasionally, i.e. unreadable
> filenames, etc.
On modern Linux kernels we can use one charset for
filenames in system calls such as "open" and a different one for
the filenames actually stored on disk:
$ mount -t vfat -o umask=002,noexec,gid=100,codepage=866,iocharset=koi8-r \
    /dev/hdb1 /mnt
For FreeBSD, in /etc/fstab:
/dev/sd0s1 /dos/c msdos rw,-W=koi2dos,-L=ru_RU.KOI8-R 0 0
For Samba:
client code page = 866
character set = koi8-r
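
Conceptually, all three configurations above do the same thing: they convert filename bytes between the on-disk charset (CP866) and the charset applications see (KOI8-R). The real conversion happens inside the kernel or Samba; the Python sketch below only imitates the idea, and the function names disk_to_user and user_to_disk are hypothetical.

```python
# Conceptual sketch only: the real translation is done by the kernel or Samba.
DISK_CODEC = "cp866"   # plays the role of codepage=866
USER_CODEC = "koi8_r"  # plays the role of iocharset=koi8-r

def disk_to_user(raw_on_disk: bytes) -> bytes:
    """Convert filename bytes as stored on disk to what applications see."""
    return raw_on_disk.decode(DISK_CODEC).encode(USER_CODEC)

def user_to_disk(raw_from_app: bytes) -> bytes:
    """Convert filename bytes passed by an application to the on-disk form."""
    return raw_from_app.decode(USER_CODEC).encode(DISK_CODEC)

name = "письмо.txt"                    # "letter.txt"
from_app = name.encode(USER_CODEC)     # what a KOI8-R program would pass to open()
on_disk = user_to_disk(from_app)       # what would end up in the FAT directory
print(from_app.hex(" "))
print(on_disk.hex(" "))
```

A character that exists in one charset but not in the other would raise UnicodeEncodeError in this sketch; how a real filesystem driver handles that case is not covered here.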
>We want to encourage users to use only one encoding, because this is
>simple, robust, and technically sound.
Do not repeat the mistake made by RedHat! If you work with
a Russian locale, never use the short locale name LANG="ru".
You MUST use the full locale name, including the charset:
LANG="ru_RU.KOI8-R".
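
The difference between the two names can be shown with a small Python sketch that splits a locale name of the usual language_TERRITORY.CODESET@modifier form into its parts. The helper split_locale is hypothetical and does plain string parsing; it does not consult the system's locale database.

```python
def split_locale(name):
    """Split 'language_TERRITORY.CODESET@modifier'; missing parts become None."""
    base, _, modifier = name.partition("@")
    base, _, codeset = base.partition(".")
    language, _, territory = base.partition("_")
    return {
        "language": language,
        "territory": territory or None,
        "codeset": codeset or None,
        "modifier": modifier or None,
    }

for lang in ("ru", "ru_RU.KOI8-R"):
    parts = split_locale(lang)
    status = parts["codeset"] or "no charset given, left to system defaults"
    print(f"{lang:14} -> {status}")
```

With LANG="ru" the charset is simply absent from the name, so it has to come from somewhere else; the full name states it explicitly.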
It is quite a common situation that I read my e-mail in
KOI8-R, store files with Cyrillic names on a FAT floppy in
CP866, and have apache started with the "ru_RU.CP1251" locale.
>It just means that the POSIX mechanism is more
>flexible then what will be necessary in the long term.
It does just what is necessary. ;-)
P.S. If you can read Russian, see my page
"Locale AS IT IS"
http://www.sensi.org/~alec/locale
--
-=AV=-
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/