
Re: Use of UTF-8 under Perl and Unix



> > Then why use the very restricted Unicode BOMs, which can only signal the
> > various Unicode encodings, but nothing else?  ISO 2022 provides ESC
> > sequences that you can place at the start of a file to signal EVERY
> > encoding in the ECMA registry. Several hundred different ASCII
> > extensions have registered ISO 2022 codes to announce them. If you want
> > to have a stateful encoding with all its ugliness, then better say so by
> > admitting that what you really want is ISO 2022. ISO 2022 is in no way
> > worse than BOMs. It has exactly the same problems.
> 
> I don't know ISO 2022.  The term "ESC sequences" worries me.  Does this mean
> it is not a single Unicode character, but a sequence of Unicode characters?
> How many programs would interpret this as being part of the actual text,
> instead of ignoring it?  That would be bad.  You would in fact have created a
> new file type, which causes more trouble than it solves.  Hopefully I'm wrong
> here.

ISO 2022 is the ISO standard for designating character-set information
in a byte stream.  Use of ISO 2022 would allow a file to contain not
just UTF-8 but any other internationally registered character set.
ISO 2022 is what is used by ANSI X3.64 based terminals such as the DEC
VT line, SCO ANSI, Linux Console, ... to control character-set
display.

When using ISO 2022, a UTF-8 byte stream would be prefaced by

  <ESC> % G

To return to ISO 2022 mode, the byte stream would be followed by

  <ESC> % @
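
In terms of raw bytes, the two sequences above are 1B 25 47 (ESC % G)
and 1B 25 40 (ESC % @).  As a rough illustration only, here is a small
Python sketch of how a program might wrap a UTF-8 byte stream in them
and strip them again.  The helper names wrap_utf8 and unwrap_utf8 are
made up for this example and do not come from any standard or library,
and the sketch assumes the whole stream is a single UTF-8 segment.

```python
# Hypothetical helpers (names invented for this sketch) showing the
# ISO 2022 announcers for UTF-8 as raw bytes.

ESC = b"\x1b"
UTF8_ON = ESC + b"%G"   # <ESC> % G : switch to UTF-8
UTF8_OFF = ESC + b"%@"  # <ESC> % @ : return to ISO 2022 mode


def wrap_utf8(text: str) -> bytes:
    """Encode text as UTF-8 and bracket it with the two sequences."""
    return UTF8_ON + text.encode("utf-8") + UTF8_OFF


def unwrap_utf8(data: bytes) -> str:
    """Strip the bracketing sequences and decode the UTF-8 payload.

    Assumption: the stream is exactly one UTF-8 segment; anything
    else is rejected rather than guessed at.
    """
    if not (data.startswith(UTF8_ON) and data.endswith(UTF8_OFF)):
        raise ValueError("stream is not bracketed by ESC % G ... ESC % @")
    payload = data[len(UTF8_ON):len(data) - len(UTF8_OFF)]
    return payload.decode("utf-8")


if __name__ == "__main__":
    framed = wrap_utf8("Grüße")
    print(framed)
    print(unwrap_utf8(framed))
```

A program that does not know about ISO 2022 would see the six extra
bytes as part of the text, which is exactly the concern raised in the
quoted message.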


    Jeffrey Altman * Sr.Software Designer * Kermit-95 for Win32 and OS/2
                 The Kermit Project * Columbia University
              612 West 115th St #716 * New York, NY * 10025
  http://www.kermit-project.org/k95.html * kermit-support@kermit-project.org


-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/