Re: what shall we do about iconv?



Bruno Haible <haible@xxxxxxx>:

> b. Often, if EILSEQ or EINVAL occurs, the entire conversion is aborted, and
>    it does not matter how many non-reversible character conversions were
>    already made. So all the application has to protect against is E2BIG,
>    and it can do so by doing the conversion into a temporary buffer first.

And how big must the temporary buffer be?

The worst case I can think of for (output length)/(input length) is
translating a single space into iso-2022-jp where the output stream is
in the wrong state and has to be switched first. Then input (20)
corresponds to output (1b 28 42 20): a ratio of 4.
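
A rough way to reproduce that case, assuming Python's iso2022_jp codec
behaves like a typical iconv implementation here (it is a stand-in, not
iconv itself). The kana is only there to put the stream in the wrong
state:

```python
import codecs

# Stateful encoder: it remembers which character set the output stream is in.
encoder = codecs.getincrementalencoder("iso2022_jp")()

# Encoding a kana first leaves the output stream in the JIS X 0208 state.
prefix = encoder.encode("あ")

# The space must now be preceded by the escape back to ASCII (ESC ( B).
space_bytes = encoder.encode(" ")

# One input byte (20) against the bytes actually emitted for it.
iso2022_ratio = len(space_bytes) / len(b" ")
print(space_bytes.hex(" "), iso2022_ratio)
```

The same space encoded from a fresh encoder needs no escape sequence, so
the ratio of 4 only shows up mid-stream.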

The worst case I can think of for a reasonably long input is
translating characters such as € from cp1252 into utf-8: a ratio of 3.
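
A quick check of that figure, again using Python's codecs as a stand-in
for iconv; the input length of 1000 is an arbitrary choice for
"reasonably long":

```python
# In cp1252 the euro sign is the single byte 0x80; in UTF-8 it takes three bytes.
cp1252_input = b"\x80" * 1000

# Convert cp1252 -> UTF-8 by way of Python's internal string type.
utf8_output = cp1252_input.decode("cp1252").encode("utf-8")

euro_ratio = len(utf8_output) / len(cp1252_input)
print(len(cp1252_input), len(utf8_output), euro_ratio)
```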

If we include encodings such as an array of wchar_t, then it's easy to
get a ratio of 4 for any length of input, and if we allow arrays of
64-bit integers, then the ratio might be 8, but neither of these cases
is relevant for e-mail.
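
To illustrate the fixed-width cases, here is a small sketch. It assumes
a 4-byte wchar_t holding UCS-4 (modelled with utf-32-le, which writes no
byte-order mark) and a purely hypothetical array of 64-bit integers
built with struct:

```python
import struct

text = "plain ASCII text"
ascii_bytes = text.encode("ascii")

# Stand-in for an array of 4-byte wchar_t: one 32-bit unit per character.
wide32 = text.encode("utf-32-le")

# Hypothetical array of 64-bit integers: one little-endian 8-byte unit per character.
wide64 = b"".join(struct.pack("<Q", ord(ch)) for ch in text)

ratio32 = len(wide32) / len(ascii_bytes)
ratio64 = len(wide64) / len(ascii_bytes)
print(ratio32, ratio64)
```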

I think someone once suggested making the output buffer MB_LEN_MAX
times the input buffer. But this doesn't seem right. There's no reason
why the set of encodings supported by iconv should be limited by the
set of available locales, is there? MB_LEN_MAX might quite reasonably
be 1, 3 or 6, as far as I can tell, but it looks as though a sensible
ratio of (output buffer)/(input buffer) might be 4.
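
As a sketch of what that sizing rule would look like in practice: the
helper names temp_buffer_size and fits are made up for illustration, the
ratio of 4 is the guess from above rather than a guarantee, and Python's
codecs again stand in for iconv:

```python
RATIO = 4  # the ratio suggested above; an assumption, not a proven bound


def temp_buffer_size(input_len: int, ratio: int = RATIO) -> int:
    """Size of the temporary output buffer for an input of input_len bytes."""
    return input_len * ratio


def fits(data: bytes, src: str, dst: str) -> bool:
    """Convert data from src to dst and check it fits in the temporary buffer."""
    converted = data.decode(src).encode(dst)
    return len(converted) <= temp_buffer_size(len(data))


# A few one-shot conversions, each starting from the initial shift state.
samples = [
    (b"\x80" * 64, "cp1252", "utf-8"),
    (b" ", "ascii", "iso2022_jp"),
    (b"hello", "ascii", "utf-32-le"),
]
results = [fits(data, src, dst) for data, src, dst in samples]
print(results)
```

This only exercises whole-buffer conversions; it says nothing about
encodings outside the sample list.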

Edmund
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/