
UTF-8 tin



Hello, tin hackers, and utf-8 people. :)

I have put together a small patch against tin which adds partial UTF-8
support. It needs a UTF-8 terminal at the moment, and depends upon a recent
CVS version of libunicode (see http://developer.gnome.org/tools/cvs.html
to get it). Actually, it only uses iconv from that, so it would be
easy to port to glibc 2.1 or other systems with iconv(3). The dependency
upon a UTF-8 terminal should be trivial to fix, though.
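
For anyone who has not used iconv(3), the conversion step amounts to
roughly the following. This is only a sketch and not the code from the
patch: the name to_utf8 and its buffer handling are made up for
illustration, and it assumes the whole input fits in one output buffer.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of tin: convert the NUL-terminated
 * string `in` from charset `from` to UTF-8, writing into `out`.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the charset is unknown, the input is invalid, or `out` is
 * too small.
 */
long to_utf8(const char *from, const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *) in;    /* iconv(3) wants a non-const char ** */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;      /* keep one byte for the NUL */

    cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t) -1)
        return -1;

    /* Convert the input, then flush any pending shift state. */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t) -1 ||
        iconv(cd, NULL, NULL, &outp, &outleft) == (size_t) -1) {
        iconv_close(cd);
        return -1;
    }

    iconv_close(cd);
    *outp = '\0';
    return (long) (outp - out);
}
```

Calling to_utf8("ISO-8859-1", line, buf, sizeof buf) on each body line is
the general idea; a real version would want to cope with E2BIG and
EILSEQ rather than just giving up.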

Features are:

  * Will correctly display UTF-8 articles.
  * Will correctly display articles in other character sets that
    iconv knows about, if they are Content-Transfer-Encoding: 8bit.

    Right now, if articles are tagged as being in US-ASCII or
    ISO-8859-1 (or are untagged), it assumes they are in Windows-1252. This
    is due to the vast proliferation of broken Windows news clients. I am
    not sure whether this behaviour is desirable.
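
The Windows-1252 rule above boils down to something like the sketch
below. The function name effective_charset is hypothetical and the
patch may do this differently. The reason it helps is that bytes
0x80-0x9F are control characters in ISO-8859-1 but printable characters
(curly quotes and so on) in Windows-1252.

```c
#include <stddef.h>
#include <strings.h>    /* strcasecmp (POSIX) */

/*
 * Hypothetical helper, not part of tin: pick the charset name to hand
 * to iconv_open() given the charset the article claims to be in.
 * `tagged` may be NULL or empty when the article has no charset tag.
 */
const char *effective_charset(const char *tagged)
{
    /* Untagged articles are assumed to be Windows-1252. */
    if (tagged == NULL || *tagged == '\0')
        return "WINDOWS-1252";

    /* So are articles claiming US-ASCII or ISO-8859-1. */
    if (strcasecmp(tagged, "US-ASCII") == 0 ||
        strcasecmp(tagged, "ISO-8859-1") == 0)
        return "WINDOWS-1252";

    /* Anything else is passed through unchanged. */
    return tagged;
}
```

Charset names are compared case-insensitively, since MIME charset names
are not case-sensitive.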

Right now, however:

  * there is no support for decoding multibyte character sets.
 
    I can't see how to do this without rewriting mm_decode. Ideas?

  * base64 encoded articles aren't sent through the charset converter,
  * and I was observing some odd behaviour with quoted-printable articles.

  * finally, there is no support for converting from raw 8bit characters 
    in the header to UTF-8. I am uncertain of how to do this. Have you any
    thoughts?

  * oh, and it doesn't convert character sets for quoted text, etc.

   (anything I forgot?)

It can be obtained from here :

  http://www.ecs.soton.ac.uk/~rwb197/tin-utf.tar.gz

Obviously this diff is in an unsuitable state to go into tin-devel right
now, but if it were finished, and preserved the existing behaviour on
systems without iconv, etc., would something like it be OK to go into tin?

-- 
Robert

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/