
Re: ISO 2022



-----Original Message-----
From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>

>ISO 2022 was intended as a generic
>tell-me-the-encoding-of-the-following-bytes mechanism.

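 Note that ISO 2022 is essentially designed around 7-bit codes and is
not 8-bit clean. 'Large' charsets are divided into parts of 94 or 96
characters each. The code points 0x80..0x9F are reserved for C1
control characters and may not be used to encode graphic characters;
the single shifts SS2 (0x8E) and SS3 (0x8F), used for switching between
parts, live there. The other switching functions sit elsewhere: SI and
SO are C0 controls (0x0F and 0x0E), and the remaining locking shifts
(LS2, LS3, LS1R, LS2R, LS3R) are escape sequences.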
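
 As a rough illustration of the 8-bit code layout described above, here
is a minimal Python sketch that names the area a byte value falls into.
The function name `region` is my own choice for this example and does
not come from the standard.

```python
def region(byte):
    """Name the ISO 2022 8-bit code area that a byte value belongs to."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value: %r" % (byte,))
    if byte < 0x20:
        return "C0"   # control characters, including SO (0x0E) and SI (0x0F)
    if byte < 0x80:
        return "GL"   # left-hand graphic area
    if byte < 0xA0:
        return "C1"   # control characters, including SS2 (0x8E) and SS3 (0x8F)
    return "GR"       # right-hand graphic area


for value in (0x0E, 0x41, 0x8E, 0xA0):
    print("0x%02X -> %s" % (value, region(value)))
```

 Any byte reported as C1 is one that an ISO 2022 system treats as a
control function rather than as a printable character.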

 At the same time, there are 8-bit charsets that ignore ISO 2022
completely and use the code points 0x80..0x9F for graphic characters,
for example all MS-DOS code pages (such as charset="IBM866", aka CP866).

 Try to `cat` Russian CP866-encoded text in xterm or on another
ISO 2022-enabled system. You will see a lot of fun things...
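
 To see the clash without a terminal, a small Python sketch can list
which bytes of a CP866-encoded string land in 0x80..0x9F. It relies on
Python's built-in "cp866" codec; the helper name `c1_bytes` and the
sample word are only assumptions made for this example.

```python
C1_RANGE = range(0x80, 0xA0)   # 0x80..0x9F inclusive


def c1_bytes(text, encoding="cp866"):
    """Return the encoded bytes of `text` that fall into 0x80..0x9F."""
    return [b for b in text.encode(encoding) if b in C1_RANGE]


sample = "ПРИВЕТ"   # uppercase Cyrillic sample word
encoded = sample.encode("cp866")
hits = c1_bytes(sample)

print(" ".join("%02X" % b for b in encoded))
print("%d of %d bytes are in 0x80..0x9F" % (len(hits), len(encoded)))
```

 In CP866 the uppercase Cyrillic letters occupy exactly this range, so
the first letter of the sample is encoded as 0x8F, the same value that
ISO 2022 assigns to SS3. Lowercase letters sit at 0xA0 and above and
are not affected in the same way.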


P.S. The charset="CP866" to Unicode mapping:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP866.TXT


--
-=AV=-

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/