
Re: Forcing vim 6.0 to stay in UTF-8 mode in a UTF-8 locale



Followup to:  <E17gq5m-00087Z-00@xxxxxxxxxxxxxxxxxxxx>
By author:    Markus Kuhn <Markus.Kuhn@xxxxxxxxxxxx>
In newsgroup: linux.utf8
>
> I just noticed that when I work in a UTF-8 locale (LC_CTYPE=en_GB.UTF-8),
> that vim 6.0 normally opens a UTF-8 file such as
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/lyrics-ipa.txt
> 
> properly in UTF-8 mode, but it deactivates UTF-8 mode when you load
> instead a file that contains malformed sequences, such as
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
> 

vim needs to do something sensible to represent malformed sequences, so
that you can do lossless binary editing.

One way is to treat each byte of a malformed sequence as a character
(distinct from all real Unicode characters).  This is mostly a good
approach, except that it allows the user to construct a valid UTF-8
character out of malformed-sequence escapes: delete whatever separates
two escaped bytes and, on the next save and reload, they may decode as
one ordinary character.  This may or may not be a problem in any
particular application, but it needs to be taken into account, lest we
get another instance of the overlong sequence problem.
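
To make the escape idea and its pitfall concrete, here is a minimal
sketch in Python.  It is not what vim does; it assumes Python's
"surrogateescape" error handler (PEP 383) as a stand-in for the scheme,
where each malformed byte 0xNN becomes the lone surrogate U+DCNN.  The
names load() and save() are hypothetical helpers for illustration only.

```python
# Sketch only: Python's "surrogateescape" handler stands in for the
# per-byte escape scheme described above; load/save are made-up names.

def load(raw: bytes) -> str:
    """Decode UTF-8, mapping each malformed byte 0xNN to U+DCNN."""
    return raw.decode("utf-8", errors="surrogateescape")

def save(text: str) -> bytes:
    """Encode to UTF-8, turning each escape U+DCNN back into byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")

# Lossless binary editing: malformed bytes survive a load/save round trip.
raw = b"ok \xff\xc0\x80 end"
assert save(load(raw)) == raw

# The pitfall: 0xC3 and 0xA9 are each malformed here, because the
# ordinary character "x" sits between them.
buf = load(b"\xc3x\xa9")        # escape, "x", escape
edited = buf.replace("x", "")   # the user deletes the "x"
reloaded = load(save(edited))   # save, then open the file again

# The two escapes were written out as C3 A9, which is valid UTF-8 for
# U+00E9, so the reloaded buffer no longer matches what was edited.
assert reloaded == "\u00e9"
assert reloaded != edited
```

Whether that silent change matters depends on the application, but any
check that ran on the escaped form of the text did not see the
character that eventually came out.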

	-hpa
-- 
<hpa@xxxxxxxxxxxxx> at work, <hpa@xxxxxxxxx> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@xxxxxxxxx>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/