Re: w3m with UTF-8 support available
(I was down with the flu for a few days, sorry.)
Edmund GRIMLEY EVANS <edmundo@xxxxxxxx> wrote:
> I hope to look at the source myself, but can you tell us more about
> UTF-8 as display encoding? How does it compare with Lynx?
It works.
w3m was developed from the beginning to deal with the peculiarities
of East Asian character sets. It doesn't use curses or any other
third party screen handling package.
It clearly *is* work in progress. The i18n patch, which is now
available in a third version based on w3m-0.1.6, has yet to be
integrated into w3m proper, and I don't know what the plans for
this are.
w3m-i18n works nicely for viewing pages encoded in, or using a mix
of, characters from ISO 8859-{1,2}, KOI8-R, and related character
sets on a UTF-8 display. As a minor nuisance, the document encoding
still has to be chosen manually at this time. Double-width and
combining characters probably aren't handled for UTF-8 yet, since
xterm's support for them hasn't stabilized.
One somewhat serious limitation is that w3m-i18n currently uses an
ISO 2022 scheme internally and can't handle the full Unicode
repertoire. For example, if you use it as a pager (its second function,
besides being a text-mode web browser) to look at Markus'
UTF-8-demo.txt, the result is badly mangled.
w3m was (and still is) written primarily by Japanese folks. In
particular the i18n part could probably profit substantially from
more extensive European participation.
--
Christian "naddy" Weisgerber naddy@xxxxxxxxxxxxxxxxxxxx
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/