Re: ISO 2022
"Alexander Voropay" <a.voropay@globalone.ru> wrote:
> From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>
>
> >ISO 2022 was intended as a generic
> >tell-me-the-encoding-of-the-following-bytes mechanism.
>
> It is necessary to note that ISO 2022 is essentially focused
> on 7-bit channels and is not 8-bit clean.
>
Not at all. ISO 2022 allows both 7-bit and 8-bit octets. If an 8-bit
channel is not available, ISO 2022 allows all the same features and
functions on a 7-bit channel, at some additional expense in transmission
overhead. In the 8-bit environment, the 8th bit acts as a "single shift"
between GL and GR.
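Here is a minimal Python sketch of the 7-bit fallback described above. It assumes an 8-bit stream with G0 in GL and a 94-character G1 in GR. It also assumes G1 is already designated, so the designation escape sequences a real ISO 2022 stream would carry are omitted. GR bytes become GL bytes bracketed by the locking shifts SO/SI, and C1 controls (0x80..0x9F) become the two-byte ESC Fe form. The function name `eight_to_seven` is hypothetical.

```python
SO = 0x0E   # Shift Out: locking shift, invokes G1 into GL
SI = 0x0F   # Shift In: locking shift, invokes G0 into GL
ESC = 0x1B


def eight_to_seven(data: bytes) -> bytes:
    """Re-express an 8-bit stream (G0 in GL, a 94-character G1 in GR)
    on a 7-bit channel. Hypothetical helper: assumes G1 is already
    designated and that the input contains no SO/SI of its own."""
    out = bytearray()
    shifted = False  # True while G1 is invoked into GL
    for b in data:
        if 0xA1 <= b <= 0xFE:          # GR graphic -> same position in GL
            if not shifted:
                out.append(SO)
                shifted = True
            out.append(b & 0x7F)
        elif 0x80 <= b <= 0x9F:        # C1 control -> ESC Fe (Fe = 0x40..0x5F)
            out += bytes([ESC, b - 0x40])
        elif b in (0xA0, 0xFF):
            raise ValueError(f"byte {b:#04x} has no 94-character-set equivalent")
        elif b in (SO, SI):
            raise ValueError("input already contains SO/SI")
        else:                          # C0 control, SPACE, DEL, or G0 graphic
            if shifted:
                out.append(SI)
                shifted = False
            out.append(b)
    if shifted:
        out.append(SI)                 # leave the channel in the G0 state
    return bytes(out)
```

For example, `eight_to_seven(b"A\xc1\xc2B")` yields `b"A\x0eAB\x0fB"`. That is two bytes longer than the input, which is the kind of transmission overhead mentioned above.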
> The 'large' charsets are divided into
> 96-character parts. Some code points in the interval 0x80..0x9f
> are forbidden for character encoding, because they are used for switching
> between parts: si, so, ss1, ss2, lsl1, lsl2, lsl3, lsl4, etc.
>
> At the same time there are 8-bit charsets which ignore ISO 2022
> completely and use the code points 0x80..0x9f, for example all MS-DOS
> code pages (charset="IBM866" aka CP866, for example).
>
Correct. These private character sets have no business leaving the
computers where they are used. They violate every international standard.
It is a crime, in my opinion, that the IETF registers them for MIME.
Only national or international standard character sets (i.e. those in
the ISO International Register) should appear on the wire.
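This short Python sketch shows the clash concretely. It lists which characters of a sample string encode to bytes in 0x80..0x9f, the range an ISO 2022 terminal reserves for C1 controls. It uses Python's built-in `cp866` and `iso8859_5` codecs. The helper name `c1_collisions` and the sample word are illustrative only.

```python
from typing import List, Tuple


def c1_collisions(text: str, encoding: str = "cp866") -> List[Tuple[str, int]]:
    """Return (character, byte) pairs whose encoded byte falls in the
    C1 control range 0x80..0x9F. Hypothetical helper for illustration."""
    hits = []
    for ch in text:
        for b in ch.encode(encoding):
            if 0x80 <= b <= 0x9F:
                hits.append((ch, b))
    return hits


# In CP866 the capitals А..Я occupy 0x80..0x9F, so "П" becomes 0x8F,
# which an ISO 2022 terminal would read as the C1 control SS3.
print(c1_collisions("Привет"))               # [('П', 143)]
print(c1_collisions("Привет", "iso8859_5"))  # []
```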
> Try to `cat` Russian CP866-encoded text in xterm or on another
> ISO 2022-enabled system. You will see a lot of fun things...
>
But if you were using Kermit as your communications program inside your
xterm window (and your xterm font was ISO 8859-5 Latin/Cyrillic), and
you told Kermit to:

  set terminal bytesize 8
  set terminal character-set cp866 cyrillic-iso

then you could indeed 'cat' Russian CP866-encoded text on the remote
computer and see the same Russian text on the local one.
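The translation Kermit performs here can be approximated in a few lines of Python. This sketch is not Kermit's code. It uses Python's standard `cp866` and `iso8859_5` codecs and assumes that characters with no ISO 8859-5 equivalent, such as CP866 box-drawing characters, may be replaced by '?'. The function name is hypothetical.

```python
def cp866_to_iso8859_5(data: bytes) -> bytes:
    """Hypothetical helper: decode CP866 bytes to Unicode, then re-encode
    them as ISO 8859-5. Characters with no ISO 8859-5 equivalent
    (e.g. box-drawing) become '?'."""
    return data.decode("cp866").encode("iso8859_5", errors="replace")


remote = "Привет".encode("cp866")          # what 'cat' sends from the remote host
local = cp866_to_iso8859_5(remote)         # what the ISO 8859-5 xterm receives
assert local == "Привет".encode("iso8859_5")
assert not any(0x80 <= b <= 0x9F for b in local)  # nothing lands in the C1 range
```

Going through Unicode keeps the mapping table-driven in both directions. The lossy `errors="replace"` choice mirrors the fact that ISO 8859-5 simply has fewer graphic characters than CP866.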
I suppose you could also cat it directly if xterm had a Cyrillic font
with cp866 encoding. I don't know, maybe it does.
- Frank
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/