
Re: charset handling by mail program



Edmund GRIMLEY EVANS writes:

> There are two configuration variables, local_charsets and
> send_charsets, both equal to a list of charsets.
> ...
> When a file is attached, the original charset is set equal to the
> first local_charset that makes sense, i.e. can be converted by iconv
> without an EILSEQ. Except that a file containing characters in the
> range \x80-\x9f will not be accepted as iso-8859-X, and perhaps we
> need some other exceptions here?
> 
> The target charset is then set to the first send_charset into which
> the file can be converted in a reversible fashion. Failing that, the
> original charset is used as the target charset.
> ...
> set local_charsets="utf-8:iso-8859-1"
> set send_charsets="us-ascii:iso-8859-1:iso-8859-3:utf-8"

Pretty convincing.
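
For concreteness, the quoted rule for picking the target charset could be
sketched roughly as below. This is only an illustration in Python, not
code from any mailer: the function name choose_target_charset and its
parameters are made up here, and "reversible" is taken to mean that
encoding and then decoding gives back the same text.

```python
def choose_target_charset(text, send_charsets, original_charset):
    """Pick the first send charset that can represent the text losslessly.

    Hypothetical helper: the name and signature are illustrative only.
    """
    for name in send_charsets:
        try:
            encoded = text.encode(name)  # strict: fails on unrepresentable chars
        except UnicodeEncodeError:
            continue
        # Round-trip check: the conversion must be reversible.
        if encoded.decode(name) == text:
            return name
    # Failing that, the original charset is used as the target charset.
    return original_charset


send_charsets = "us-ascii:iso-8859-1:iso-8859-3:utf-8".split(":")
print(choose_target_charset("caf\u00e9", send_charsets, "utf-8"))
```

With the send_charsets value from the quoted example, plain ASCII text
stays us-ascii, and text is only sent as utf-8 when none of the earlier
charsets can hold it.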

Maybe the local_charsets could, if not set, default to the locale's
character set (nl_langinfo(CODESET) or equivalent).
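
A minimal sketch of that default, in Python for brevity (a C mailer would
call nl_langinfo(CODESET) directly after setlocale). The function name
default_local_charsets is hypothetical, the colon-separated format is
taken from the quoted example, and locale.nl_langinfo is only available
on Unix-like systems.

```python
import locale


def default_local_charsets(configured=None):
    """Return the configured charset list, or fall back to the locale's charset.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if configured:
        # Colon-separated list, as in the quoted configuration example.
        return [name.strip() for name in configured.split(":") if name.strip()]
    try:
        # Adopt the user's locale so that CODESET reflects it.
        # Note: this changes the process-wide LC_CTYPE setting.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # unsupported locale in the environment: keep the current one
    return [locale.nl_langinfo(locale.CODESET)]


print(default_local_charsets())
print(default_local_charsets("utf-8:iso-8859-1"))
```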

I don't think local_charsets should be an ordered list, because
many people would get the order wrong, and also because for a given file,
it's not a question of probability. Thus, I suggest trying all encodings
in local_charsets without priorities and, in case of ambiguity, asking the
user for confirmation after presenting, for every possible original
charset, the attempted conversion from that charset to UTF-8. (Only the
non-ASCII lines, since the ASCII lines will not help the user choose.)
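
Roughly what I have in mind, as a Python sketch rather than real mailer
code. The function names are invented for illustration, Python's codecs
stand in for iconv, and the rejection of bytes \x80-\x9f for iso-8859-X
is the exception from the quoted proposal. An unknown charset name would
raise LookupError here; a real implementation would have to handle that.

```python
def possible_charsets(data, local_charsets):
    """Return every charset in which the bytes decode cleanly.

    The order of the result carries no priority. Hypothetical helper.
    """
    has_c1_bytes = any(0x80 <= byte <= 0x9F for byte in data)
    found = []
    for name in local_charsets:
        # Exception from the proposal: bytes \x80-\x9f rule out iso-8859-X.
        if has_c1_bytes and name.lower().startswith("iso-8859-"):
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:  # the counterpart of iconv's EILSEQ
            continue
        found.append(name)
    return found


def non_ascii_lines(data, charset):
    """Decode the file and keep only the lines containing non-ASCII characters."""
    text = data.decode(charset)
    return [line for line in text.split("\n") if not line.isascii()]


def previews(data, local_charsets):
    """Map each possible charset to the lines the user should compare."""
    return {name: non_ascii_lines(data, name)
            for name in possible_charsets(data, local_charsets)}


sample = "caf\u00e9\nplain\n".encode("utf-8")
for charset, lines in previews(sample, ["iso-8859-1", "utf-8"]).items():
    print(charset, lines)
```

If previews() returns more than one entry, the mailer would show them and
let the user pick; with exactly one entry no question is needed.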

> Can anyone see a fundamental problem with this approach or suggest any
> improvements?

There is one more issue, with bidirectional text for Semitic languages:
RFC 1556 defines additional MIME charset names (such as ISO-8859-8-i and
ISO-8859-8-e), used for representing the kind of bidi ordering (visual,
implicit, explicit). In UTF-8, we have to obey Unicode Technical Report #9.
Does anyone know how a mailer should behave when it does its
ISO-8859-8 <--> UTF-8 conversion?

Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/