
Re: less-344 with UTF-8



Mark Nudelman wrote on 1999-10-27 20:02 UTC:
> I've prepared a release of less which includes Robert Brady's changes to
> support UTF-8.  Before I make it widely available, I thought some of you
> on this list with expertise in UTF-8 might like to try it out and let me
> know if you see any problems.  You can get it from
> 	http://www.flash.net/~marknu/less/less-344.tar.gz

I just installed it, and it seems to be working very nicely.

The only strange effect that I noticed occurred when viewing the
examples/UTF-8-test.txt file in

  http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.html ,

which contains lots of interesting illegal UTF-8 sequences. In this
file, less and xterm occasionally disagree on how many cells an illegal
UTF-8 sequence occupies. This shows up especially if you make the xterm
window 78 cells wide and scroll upwards by pressing "k". Look in
particular at what happens when test section 3.1.9 scrolls down from the
top of the screen, and compare that with how it looks when you reach it
from the beginning of the file by pressing space.

But xterm-compatible behaviour in the presence of illegal UTF-8
sequences (which we are normally not supposed to have in files anyway)
is a rather advanced geek feature, and fixing it should probably not
delay the release of the first UTF-8 aware less version.

The ultimate solution is probably that less should not depend on how the
terminal interprets illegal UTF-8 sequences, but that it should display
every byte that is not part of a well-formed UTF-8 sequence as an
inverse hex pair.
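
To illustrate the idea, here is a minimal sketch in Python. It is not
how less is implemented; the names render, _hex_pairs and "inverse_hex"
are made up for this example. It assumes a terminal that understands
the ANSI SGR 7/27 reverse-video escapes, and it relies on Python's own
strict UTF-8 decoder to decide what counts as well-formed. Every byte
that the decoder rejects comes out as exactly two hex digits in reverse
video, so the pager always knows that such a byte occupies two cells,
whatever the terminal would have done with the raw byte.

```python
import codecs

REVERSE_ON = "\x1b[7m"    # ANSI SGR 7: reverse video on (assumed terminal support)
REVERSE_OFF = "\x1b[27m"  # ANSI SGR 27: reverse video off


def _hex_pairs(exc):
    """Error handler: show each rejected byte as an inverse hex pair."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # exc.start..exc.end delimits the bytes the decoder could not accept.
    bad = exc.object[exc.start:exc.end]
    shown = "".join(f"{REVERSE_ON}{b:02X}{REVERSE_OFF}" for b in bad)
    # Resume decoding right after the rejected bytes.
    return shown, exc.end


# Hypothetical handler name, registered only for this sketch.
codecs.register_error("inverse_hex", _hex_pairs)


def render(data: bytes) -> str:
    """Decode well-formed UTF-8 and mark every other byte as inverse hex."""
    return data.decode("utf-8", errors="inverse_hex")


if __name__ == "__main__":
    # 0xC0 0x80 is an overlong encoding of NUL and therefore malformed.
    print(render(b"ok \xc0\x80 and a truncated sequence: \xe2\x82"))
```

Well-formed text passes through unchanged, so only the malformed bytes
are visually marked.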

Good work!

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/