
charset handling by mail program



This isn't really about UTF-8, but I'm guessing people here are
experienced with working in a multi-charset environment and can
perhaps help me.

I'm thinking of making mutt (an MUA - www.mutt.org) handle the charset
of an attachment as follows.

There are two configuration variables, local_charsets and
send_charsets, each holding a colon-separated list of charsets.

When a file has been attached, but before the message is actually
sent, the MIME parts are listed. Each text part in that list has two
charsets associated with it: the original charset (the one the file
is in) and the target charset (the one it will be sent in).

When a file is attached, its original charset is set to the first
charset in local_charsets that makes sense, i.e. the first one from
which iconv can convert the file without an EILSEQ error. There is one
exception: a file containing bytes in the range \x80-\x9f will not be
accepted as iso-8859-X, since in ISO 8859 those positions are C1
control codes rather than printable characters. Perhaps we need some
other exceptions here too?
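A minimal sketch of this step, assuming Python's strict codecs as a stand-in for iconv(3), with a UnicodeDecodeError playing the part of EILSEQ. The function name is hypothetical and this is not mutt code; returning None when nothing fits is my assumption, since the proposal leaves that case open.

```python
from typing import List, Optional


# Hypothetical helper, not part of mutt. Python's strict codecs stand in
# for iconv(3): a UnicodeDecodeError plays the role of EILSEQ.
def choose_original_charset(data: bytes, local_charsets: List[str]) -> Optional[str]:
    """Return the first local charset that can decode `data`, or None."""
    for charset in local_charsets:
        # The proposed exception: bytes 0x80-0x9f are C1 control codes in
        # ISO 8859, so their presence rules out every iso-8859-X charset.
        if charset.lower().startswith("iso-8859-") and any(0x80 <= b <= 0x9F for b in data):
            continue
        try:
            data.decode(charset)  # strict: fails on any malformed sequence
        except UnicodeDecodeError:
            continue
        return charset
    return None  # assumption: the proposal does not say what happens here


local = ["utf-8", "iso-8859-1"]
print(choose_original_charset("café".encode("utf-8"), local))  # utf-8
print(choose_original_charset(b"caf\xe9", local))              # iso-8859-1
print(choose_original_charset(b"\x93hi\x94", local))           # None
```

The last case is a typical Windows-1252 file with "smart quotes": it is not valid UTF-8, and the 0x93/0x94 bytes rule out iso-8859-1.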

The target charset is then set to the first charset in send_charsets
into which the file can be converted reversibly, i.e. without losing
any characters. Failing that, the original charset is used as the
target charset.
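A sketch of the target-charset rule, again with Python's codecs standing in for iconv. Here "reversible" is taken to mean that the decoded text survives an encode/decode round trip unchanged; both function names are hypothetical.

```python
from typing import List


def is_reversible(text: str, charset: str) -> bool:
    """True if `text` can be encoded in `charset` and decoded back unchanged."""
    try:
        return text.encode(charset).decode(charset) == text
    except UnicodeError:  # covers UnicodeEncodeError and UnicodeDecodeError
        return False


# Hypothetical helper: `text` is the attachment already decoded from its
# original charset.
def choose_target_charset(text: str, original: str, send_charsets: List[str]) -> str:
    for charset in send_charsets:
        if is_reversible(text, charset):
            return charset
    return original  # fallback from the proposal


send = ["us-ascii", "iso-8859-1", "iso-8859-3", "utf-8"]
for text in ["hello", "café", "Ħamrun", "5 €"]:
    print(text, choose_target_charset(text, "utf-8", send))
# hello us-ascii / café iso-8859-1 / Ħamrun iso-8859-3 / 5 € utf-8
```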

Now there are two commands available to the user before he or she
sends the message:

(edit_charset) Sets the original charset (it is an error if the file
is malformed with respect to that charset) and causes the target
charset to be recomputed.

(change_charset) Sets the target charset. A warning will be issued if
the conversion is not reversible.
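The two commands could be modelled roughly like this. It is a hypothetical Python sketch: TextPart and the function signatures are invented for illustration, raising ValueError stands in for whatever error mutt would report, and warnings.warn stands in for its warning.

```python
import warnings
from dataclasses import dataclass
from typing import List


@dataclass
class TextPart:
    """Hypothetical model of one text part in the attachment list."""
    data: bytes     # raw bytes of the attached file
    original: str   # charset the file is taken to be in
    target: str     # charset it will be sent in


def _reversible(text: str, charset: str) -> bool:
    try:
        return text.encode(charset).decode(charset) == text
    except UnicodeError:
        return False


def edit_charset(part: TextPart, charset: str, send_charsets: List[str]) -> None:
    """Set the original charset, then recompute the target charset."""
    try:
        text = part.data.decode(charset)
    except UnicodeDecodeError as exc:
        # The file is malformed with respect to `charset`: report an error
        # and leave the part unchanged.
        raise ValueError(f"file is not valid {charset}") from exc
    part.original = charset
    # Same rule as on attach: first reversible send charset, else original.
    part.target = next((cs for cs in send_charsets if _reversible(text, cs)), charset)


def change_charset(part: TextPart, charset: str) -> None:
    """Set the target charset, warning if the conversion loses information."""
    text = part.data.decode(part.original)
    part.target = charset
    if not _reversible(text, charset):
        warnings.warn(f"conversion from {part.original} to {charset} is not reversible")
```

Note that change_charset still applies the user's choice after warning, which matches "a warning will be issued" rather than refusing the change.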

I might configure mutt thus:

set local_charsets="utf-8:iso-8859-1"
set send_charsets="us-ascii:iso-8859-1:iso-8859-3:utf-8"

(Most people would have us-ascii at the start of send_charsets.)
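To make the example configuration concrete, here is a hypothetical trace of what it would pick for a few sample files. It is a Python sketch, not mutt code: Python's strict codecs stand in for iconv(3), and parse_charsets and pick_charsets are made-up names that follow the rules described above.

```python
from typing import List, Optional, Tuple

# Values copied from the example configuration above.
LOCAL_CHARSETS = "utf-8:iso-8859-1"
SEND_CHARSETS = "us-ascii:iso-8859-1:iso-8859-3:utf-8"


def parse_charsets(value: str) -> List[str]:
    """Split a colon-separated setting into a list of charset names."""
    return [cs for cs in value.split(":") if cs]


def pick_charsets(data: bytes, local: List[str],
                  send: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical: (original, target) for one attachment, per the proposal."""
    for original in local:
        if original.lower().startswith("iso-8859-") and any(0x80 <= b <= 0x9F for b in data):
            continue
        try:
            text = data.decode(original)
        except UnicodeDecodeError:
            continue
        for target in send:
            try:
                if text.encode(target).decode(target) == text:
                    return original, target
            except UnicodeError:
                pass
        return original, original
    return None, None


local, send = parse_charsets(LOCAL_CHARSETS), parse_charsets(SEND_CHARSETS)
print(pick_charsets(b"hello", local, send))                  # ('utf-8', 'us-ascii')
print(pick_charsets("café".encode("utf-8"), local, send))    # ('utf-8', 'iso-8859-1')
print(pick_charsets(b"caf\xe9", local, send))                # ('iso-8859-1', 'iso-8859-1')
print(pick_charsets("Ħamrun".encode("utf-8"), local, send))  # ('utf-8', 'iso-8859-3')
print(pick_charsets("5 €".encode("utf-8"), local, send))     # ('utf-8', 'utf-8')
```

Without us-ascii at the front, pick_charsets(b"hello", local, send[1:]) gives ('utf-8', 'iso-8859-1'), so a plain ASCII file would be labelled iso-8859-1.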

Can anyone see a fundamental problem with this approach or suggest any
improvements?

(The same send_charsets would be used to decide how to encode the
headers (RFC 2047, etc.), but I don't intend to allow any user
intervention there.)
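For the headers, a minimal sketch of the same rule, using Python's email.header.Header to produce RFC 2047 encoded-words. Picking the first send charset that can represent the value, the function name, and the UTF-8 fallback are my assumptions, not something the proposal specifies.

```python
from email.header import Header, decode_header, make_header
from typing import List


def encode_header_value(value: str, send_charsets: List[str]) -> str:
    """Hypothetical: RFC 2047-encode `value` in the first send charset that fits."""
    for charset in send_charsets:
        try:
            value.encode(charset)
        except UnicodeEncodeError:
            continue
        # Header.encode() leaves us-ascii text as-is and emits
        # =?charset?q?...?= or =?charset?b?...?= encoded-words otherwise.
        return Header(value, charset).encode()
    # Assumption: fall back to UTF-8 when no listed charset can hold the value.
    return Header(value, "utf-8").encode()


send = ["us-ascii", "iso-8859-1", "iso-8859-3", "utf-8"]
for subject in ["hello", "Café au lait", "Ħamrun", "5 €"]:
    encoded = encode_header_value(subject, send)
    # Decoding the encoded-words again should give back the original text.
    print(encoded, "->", str(make_header(decode_header(encoded))))
```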

Edmund
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/