Re: UTF-8 in lynx
On Wed, 27 Oct 1999, Edmund GRIMLEY EVANS wrote:
> Klaus Weide <kweide@enteract.com>:
>
> > > It may be a while till there's a generally available one, but I didn't
> > > have to change slang very much to make it understand UTF-8 and store
> > > characters as 24 bits. John Davis says version 2.0 of slang will use
> > > 32 bits for each character ...
> >
> > I guess changing the internal representation (in slang or a curses
> > implementation) isn't that big of a problem; the problems will be
> > in the interfacing between that and the application: how does the
> > app put curses "into UTF-8 mode" or out of it, and how does the library
> > know what the client expects. And how does the client know that the
> > library supports all this in the first place.
>
> Yes, indeed. I changed slang on Saturday morning. Since then I've been
> thinking about the interfacing issues, without solving them ...
(I am not sure what the context of these changes is. If it's more
or less a personal modification to both mutt and slang, you can of
course do whatever you like; if you are trying to come up with an
extended slang library for public consumption, that's a drastically
different matter.)
> > One answer to the question "how to put curses into UTF-8 mode" is to
> > use/set/change the locale info. I am just not convinced this would be
> > sufficient, convenient, or portable enough (much beyond Linux
> > distributions) for widespread adoption; other mechanisms may be
> > needed (in addition),
> > extending the interface of ncurses (etc.) in some way.
>
> Listening to the people on linux-utf8, it seems that the locale is the
> right way to do it. But mutt doesn't really use the locale; it lets
> the user set the character set in a configuration file, or by hand
> during a session. I think lynx works like that, too.
Yes, as far as character encoding and conversion are concerned.
Lynx tries to provide transcoding (or transliteration) for all N x N
pairs of charsets (the "European" ones, at least), there is more than one
kind of current/default character set (the display character set, and
-assume_local_charset for files), and there are many possible "charset"
parameters in messages. All this doesn't seem a nice fit with what the
locale/libc stuff provides (or will provide).
(It could well be that I just don't understand the latter well enough and
it will all fall into place, with iconv and so on.)
> So there's a contradiction between the way those programs work and the
> way the library is supposed to work.
>
> At the moment I don't feel confident using mbtowc, etc at all in a
> program that is to be portable, so what I might do is make slang
> decide whether to be in UTF-8 mode from the environment variables
> LC_CTYPE, etc., and make mutt spit out a warning whenever the value of
> its "charset" variable (set by the user) disagrees with what the
> environment variables suggest; both slang and mutt would do the
> character conversion themselves, without the C library support. So the
> user doesn't need an up-to-date C library or a working locale. Does
> that sound reasonable?
I don't know :)
But it seems that, for now, relying on the C library for most of this
stuff may be cleaner, but it greatly reduces portability.
E.g. I'd like UTF-8 display and transcoding to work in Lynx even on some
Martian platform with nothing resembling glibc. So we have to roll our
own (string transformations, character counting, workarounds for curses
problems, etc.).
Klaus
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/