Re: Forcing vim 6.0 to stay in UTF-8 mode in a UTF-8 locale
On Mon, Aug 19, 2002 at 12:54:24PM -0700, H. Peter Anvin wrote:
> One way is to treat each byte of a malformed sequence as a character
> (different from all real Unicode characters). This is a mostly good
> approach, except that it allows the user to construct a valid UTF-8
> character out of malformed sequence escapes -- this may or may not be
> a problem in any particular application, but it needs to be taken into
> account, lest we get another instance of the overlong sequence
> problem.
That's what Vim does. Malformed sequences show up as <HEX>, which
functions as a single character.
If the editor is 8-bit-clean and you combine bytes that were parts of
invalid UTF-8 sequences so that they form a valid UTF-8 sequence, then
you have a valid UTF-8 sequence. If I combine 0xC2 with 0xA9, it had
better write those two bytes to disk, even though together they happen
to encode U+00A9; doing anything else isn't 8-bit-clean.
I tested this, and that's exactly what happens; pasting <A9> in front of
<C2> turns the pair into (C).
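For anyone who wants to poke at the same behaviour outside an editor, here
is a rough sketch in Python. It is only an analogy and not how Vim is
implemented: it assumes Python's "surrogateescape" error handler as the
stand-in for "each malformed byte is its own character", and the helper
names (decode_clean, encode_clean, show) are made up for illustration.

```python
# Analogy only: Python's "surrogateescape" handler maps each malformed byte
# to a lone surrogate (U+DC80..U+DCFF) and maps it back on encoding.
# This is not Vim's mechanism; the helper names are hypothetical.

def decode_clean(raw: bytes) -> str:
    """Decode UTF-8, keeping every malformed byte as its own character."""
    return raw.decode("utf-8", errors="surrogateescape")

def encode_clean(text: str) -> bytes:
    """Encode back to bytes, restoring escaped bytes unchanged."""
    return text.encode("utf-8", errors="surrogateescape")

def show(text: str) -> str:
    """Render escaped bytes as <HEX>, similar to the display described above."""
    return "".join(
        f"<{ord(ch) - 0xDC00:02X}>" if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )

lone_c2 = decode_clean(b"\xc2")  # lead byte with no continuation: malformed
lone_a9 = decode_clean(b"\xa9")  # continuation byte with no lead: malformed

# Put the two "characters" next to each other in the order C2, A9.
combined = encode_clean(lone_c2 + lone_a9)  # exactly the bytes C2 A9
reread = decode_clean(combined)             # now a single valid character

print(show(lone_c2), show(lone_a9))  # each shown as a <HEX> placeholder
print(combined.hex())                # the two raw bytes, unchanged
print(show(reread))                  # U+00A9 after re-reading
```

The round trip never alters a byte, which is the 8-bit-clean property; the
price is that two placeholders can merge into a real character once the
bytes are read again. In the opposite order (A9 then C2) the bytes stay
malformed and remain two placeholders.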
What could be done differently?
--
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/