
Re: UTF-8 tty mode



Yann Dirson wrote:
: Andries.Brouwer@cwi.nl writes:
:  > No doubt there are all kinds of obscure circumstances
:  > that can make this fail. Let me construct an example.
:  > Suppose I use mapscrn and load a screen map.
:  > The bytes \357\200\240 are fed to translate[] and produce
:  > unicode values that are looked up by conv_uni_to_pc().
:  > I can make sure that some of these return -1 or -2.
:  > Then the current position will not be updated and
:  > the routine will incorrectly conclude UTF8.
: 
: Well, cursor has to be in column 2 for "utf8" to be reported.  -1 is
: reported for any column other than 2 and 4.
: 
: Ah ah, you mean when one of those chars map to a control char or a
: zero-width space, do you ?  Yes, this seems to be a problem, although
: I don't think zero-width spaces are much used in 8bit charsets.  A
: more secure way can be, when in column 2, to lookup the char in column
: 1 and check against what we're looking for, although then it may fail
: again because of a particular charset...
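
The column test described in the quoted text can be sketched roughly as
follows. This is only an illustration, not the code under discussion: the
function names and the "utf8"/"8bit" labels are made up here, and it is
assumed that the probe starts in column 1, that column 4 means the three
bytes were shown as three 8-bit characters, and that the terminal answers
a cursor position request with the usual ESC [ row ; col R report.

```python
import re

def column_from_cpr(reply):
    """Extract the column from a cursor position report (ESC [ row ; col R).

    Returns None if the reply does not have that shape.
    """
    m = re.fullmatch(r"\x1b\[(\d+);(\d+)R", reply)
    return int(m.group(2)) if m else None

def classify_column(column):
    """Interpret the cursor column seen after printing the 3-byte probe.

    Column 2: the three bytes were drawn as one character (UTF-8 mode).
    Column 4: each byte was drawn as its own character (assumed 8-bit mode).
    Anything else: undecided, reported as -1.
    """
    if column == 2:
        return "utf8"
    if column == 4:
        return "8bit"
    return -1
```

A screen map that sends one of the bytes to a zero-width or control
character moves the cursor by fewer columns, which is exactly the case
where such a test gives a wrong or undecided answer.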

Now I partly understand that byte sequence. Leaving aside the question of
whether it is ugly (for a transitional period, any heuristic approach that
helps may be acceptable), I'm wondering about some details: What is the
purpose of the ^X and ^Z? What is the ESC D for? Which terminals support
such a sequence? (There doesn't seem to be a corresponding termcap entry.)
And why should we discuss the availability of the characters used, rather
than using characters that are sure to be available?
My sequence would be:

\302\247D
I.e., a two-byte UTF-8 sequence that consists of two valid Latin-1 
code bytes.
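
A small sketch of why such a sequence is convenient, assuming the two
bytes meant are 0xC2 0xA7 (U+00A7 SECTION SIGN in UTF-8, the characters
Â and § when read as Latin-1). The helper is_latin1_graphic is a
hypothetical name introduced only for this illustration; it treats
0x20-0x7E and 0xA0-0xFF as the printable ranges of ISO 8859-1.

```python
def is_latin1_graphic(byte):
    """True if the byte is a printable ISO 8859-1 code (not C0/C1 control)."""
    return 0x20 <= byte <= 0x7E or 0xA0 <= byte <= 0xFF

# Assumed two-byte probe: one character in UTF-8, two in Latin-1.
probe = b"\xc2\xa7"
utf8_width = len(probe.decode("utf-8"))        # one character
latin1_width = len(probe.decode("latin-1"))    # two characters
probe_all_graphic = all(is_latin1_graphic(b) for b in probe)

# The three-byte sequence from the quoted text, for comparison.
# Its middle byte 0x80 falls in the C1 control range of Latin-1.
older = b"\xef\x80\xa0"
older_all_graphic = all(is_latin1_graphic(b) for b in older)
```

With this probe a cursor that advances by one column suggests UTF-8
mode and one that advances by two suggests an 8-bit mode, without
relying on how a control byte happens to be rendered.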

Thomas Wolff
towo@computer.org
http://www.inf.fu-berlin.de/~wolff/mined-utf.html
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/