Re: Non-ASCII characters in file names
On Sun, Mar 18, 2007 at 08:41:48AM -0700, Ben Wiley Sittler wrote:
> awesome, and thank you! however, utf-8 filenames given on the command
> line still do not work... they get turned into iso-8859-1, which is
> then utf-8 encoded before saving (?!)
>
> here's my (partial) utf-8 workaround for emacs so far:
>
> (if (string-match "XEmacs\\|Lucid" emacs-version)
>     nil
>   (condition-case nil (eval
>     (if
>         (string-match "\\.\\(UTF\\|utf\\)-?8$"
>                       (or (getenv "LC_CTYPE")
>                           (or (getenv "LC_ALL")
>                               (or (getenv "LANG")
>                                   "C"))))
>         '(concat (set-terminal-coding-system 'utf-8)
>                  (set-keyboard-coding-system 'utf-8)
>                  (set-default-coding-systems 'utf-8)
>                  (setq file-name-coding-system 'utf-8)
>                  (set-language-environment "UTF-8"))))
>     ((error "Language environment not defined: \"UTF-8\"") nil)))
Here are all my relevant emacs settings. They work in emacs-21 and
later; however, emacs-21 seems to have trouble with UTF-8 file names
given on the command line, and I don’t know any way around that.
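; Force unix and utf-8
(setq inhibit-eol-conversion t)        ; no CRLF/CR translation of line ends
(prefer-coding-system 'utf-8)          ; try utf-8 first when detecting
(setq locale-coding-system 'utf-8)     ; system messages and locale strings
(set-terminal-coding-system 'utf-8)    ; terminal output
(set-keyboard-coding-system 'utf-8)    ; keyboard input
(set-selection-coding-system 'utf-8)   ; X selections (cut and paste)
(setq file-name-coding-system 'utf-8)  ; file names
(setq coding-system-for-read 'utf-8)   ; force decoding of everything read
(setq coding-system-for-write 'utf-8)  ; force encoding of everything written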
Note that the last two may be undesirable; they force ALL files to be
treated as UTF-8, skipping any detection. This allows me to edit files
which may have invalid sequences in them (like Kuhn’s decoder test
file) or which are a mix of binary data and UTF-8.
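If forcing every file globally is too blunt, one alternative is to bind
coding-system-for-read only around a single visit. The following is a
minimal sketch, not a tested recipe; the command name
my-find-file-as-utf-8 is hypothetical and not part of any package.

```elisp
;; Hypothetical helper: visit one file as UTF-8 without changing
;; the global detection behaviour for other files.
(defun my-find-file-as-utf-8 (file)
  "Visit FILE, decoding it as UTF-8 and skipping auto-detection."
  (interactive "fFind file as UTF-8: ")
  ;; The binding is only in effect while find-file reads the file.
  (let ((coding-system-for-read 'utf-8))
    (find-file file)))
```

If FILE is already being visited in a buffer, find-file just switches
to that buffer, so the binding has no effect in that case.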
I use the experimental unicode-2 branch of GNU emacs, and with it,
forcing UTF-8 does not corrupt non-UTF-8 files. The invalid sequences
are simply shown as octal byte codes and saved back to the file as
they were in the source. I cannot confirm that earlier versions of GNU
emacs will not corrupt files, however. XEmacs ALWAYS corrupts files
visited as UTF-8: it converts any Unicode character for which it has
no corresponding emacs-mule character into a replacement character.
That makes it entirely unsuitable for use with UTF-8 until that’s
fixed (it was still broken in the latest CVS as of a few months ago).
BTW looking for “UTF-8” in the locale string is a bad idea, since UTF-8
is not necessarily a “special” encoding but may be the “native”
encoding for the selected language, in which case the locale name
carries no “.UTF-8” suffix at all. nl_langinfo(CODESET) is the only
reliable way to determine the encoding, and I doubt emacs provides any
direct way of accessing it. :(
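
For what it is worth, some GNU Emacs builds appear to expose the
codeset through a locale-info function. The sketch below assumes that
function is present and guards for its absence with fboundp; the
predicate name my-locale-codeset-utf-8-p is hypothetical, and whether
locale-info exists in any given emacs-21 build is not something this
sketch can promise.

```elisp
;; Hypothetical predicate: ask the C library's codeset (where Emacs
;; exposes it) instead of pattern-matching LANG / LC_* strings.
(defun my-locale-codeset-utf-8-p ()
  "Return non-nil if the current locale's codeset is UTF-8."
  (let ((codeset (and (fboundp 'locale-info)   ; assumption: may be missing
                      (locale-info 'codeset)))
        (case-fold-search t))                  ; accept "utf8", "UTF-8", ...
    (and (stringp codeset)
         (string-match "\\`utf-?8\\'" codeset))))

;; Only switch the terminal, keyboard and file names when the locale
;; really is UTF-8.
(when (my-locale-codeset-utf-8-p)
  (set-terminal-coding-system 'utf-8)
  (set-keyboard-coding-system 'utf-8)
  (setq file-name-coding-system 'utf-8))
```

On a build without locale-info the predicate returns nil and the
settings are left alone, so falling back to the environment-variable
check would still be needed there.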
~Rich
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/