Re: ISO 2022 and termcap ballast
Bram@moolenaar.net (Bram Moolenaar) wrote on 04.11.99 in <199911041035.LAA01216@moolenaar.net>:
> Markus Kuhn wrote:
>
> > Bram Moolenaar wrote on 1999-11-03 22:07 UTC:
> > > Read which standard? This can't be the only one. Why else would there
> > > be a termcap/terminfo database with so many entries?
> >
> > Almost all termcap/terminfo entries are useless today, because these
> > terminals do not exist any more. Almost everyone uses vt100, xterm,
> > linux, or a very closely related terminal definition, and all of these
> > are pretty much simple subsets of ISO 6429 and ISO 2022 with some
> > private extensions (usually, but unfortunately not always, using one of
> > the private extension sequences reserved by ISO 6429 and ISO 2022). The
> > Linux console is a particularly bad offender, where some people
> > blindly introduced private extension sequences with a syntax very far
> > off the ISO standards (most likely due to ignorance of the standard and
> > the ESC sequence syntax principles).
>
> My experience is that there are more violations of standards than correct
> implementations. That's the real world. We have to deal with it. We can't
> use the excuse that they should have used the standard and that's their
> problem. We must help the people that sit behind a computer and try to make
> the best of it.
>
> Anyway, I have concluded that using ISO 2022 ESC sequences to recognize
> UTF-8 files (or files in any other encoding) isn't useful for Vim.
True.
The most common use today for ISO 2022 escape sequences is in far eastern
computing, that is, Chinese, Japanese, and Korean. See, for example, the
MIME encoding standards for these languages:
    ISO-2022-KR                     RFC 1557
    ISO-2022-JP                     RFC 1468
    ISO-2022-JP-2                   RFC 1554
    ISO-2022-CN, ISO-2022-CN-EXT    RFC 1922
This seems to be a pretty typical use: you have just a few candidate
charsets and use ISO 2022 to switch between them.
Of course, UTF-8 is a nicer solution to this problem.
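
To illustrate the charset switching, here is a minimal sketch in Python. It
assumes the standard library's "iso2022_jp" codec as a stand-in for the
ISO-2022-JP encoding of RFC 1468; the sample string and variable names are
made up for the example. The stream stays 7-bit, and escape sequences mark
where it switches from ASCII to JIS X 0208 and back.

```python
# Sketch: ISO-2022-JP switches between a few charsets with escape sequences.
# Assumes Python's built-in "iso2022_jp" codec; the sample text is arbitrary.
ESC = b"\x1b"

text = "abc \u65e5\u672c\u8a9e xyz"      # ASCII, three kanji, ASCII
encoded = text.encode("iso2022_jp")

# ESC $ B designates JIS X 0208, ESC ( B switches back to ASCII.
to_kanji = ESC + b"$B"
to_ascii = ESC + b"(B"

print(encoded)
print("switch to JIS X 0208 at byte", encoded.find(to_kanji))
print("switch back to ASCII at byte", encoded.find(to_ascii))

# Every byte is 7-bit; the escapes alone tell a reader which charset is active.
seven_bit = all(b < 0x80 for b in encoded)
round_trip = encoded.decode("iso2022_jp")
```

The same text in UTF-8 needs no state at all, which is the point made above.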
> The discussion about whether a BOM can be used to recognize UTF-8 is still
> open. There are disadvantages and advantages; how these add up isn't clear
> to me yet.
Oh, of course the BOM (U+FEFF, UTF-8 EF BB BF) _can_ be used to
(unreliably) recognize UTF-8. Just as long as you don't have, say, an ISO
8859-1 text beginning with
    LATIN SMALL LETTER I WITH DIAERESIS
    RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    INVERTED QUESTION MARK
Note that these are from the set of 8859-1/UTF-8 misinterpretations Markus
posted.
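
A small sketch of that ambiguity in Python, using only the standard library.
The helper names (has_utf8_bom, is_valid_utf8) and the sample strings are
hypothetical, invented for this illustration. It shows that a BOM check alone
accepts an ISO 8859-1 text starting with those three characters, and that
validating the whole buffer as UTF-8 catches some such cases but not all.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the buffer starts with EF BB BF, the UTF-8 form of U+FEFF."""
    return data.startswith(codecs.BOM_UTF8)

def is_valid_utf8(data: bytes) -> bool:
    """True if the whole buffer decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# ISO 8859-1 text beginning with i-diaeresis, right guillemet, inverted "?".
latin1_bytes = "\u00ef\u00bb\u00bfQu\u00e9?".encode("latin-1")
# The same three characters followed by plain ASCII only.
latin1_ascii_tail = "\u00ef\u00bb\u00bfhello".encode("latin-1")

print(has_utf8_bom(latin1_bytes))        # BOM check is fooled
print(is_valid_utf8(latin1_bytes))       # lone E9 byte is not valid UTF-8
print(has_utf8_bom(latin1_ascii_tail))   # fooled again
print(is_valid_utf8(latin1_ascii_tail))  # and full validation cannot tell either
```

So the BOM is a hint, not proof: the second buffer is indistinguishable from
UTF-8 by any byte-level test.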
And of course, grep would probably handle these as normal text just like
it would ESC % G ...
MfG Kai
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/