
Re: what shall we do about iconv?



Edmund GRIMLEY EVANS writes:

> >    So all the application has to protect against is E2BIG,
> >    and it can do so by doing the conversion into a temporary buffer first.
> 
> And how big must the temporary buffer be?
> 
> The worst case I can think of for (output length)/(input length) is
> translating a single space into iso-2022-jp where the output stream is
> in the wrong state and has to be switched first. Then input (20)
> corresponds to output (1b 28 42 20): a ratio of 4.

I don't think you can come up with a definitive worst-case bound. For
libiconv, the worst-case ratio is around 12: converting "‰" to
iso-2022-cn requires transliterating it to 4 ASCII characters, and the
output state must be switched first. glibc's iconv will also support
transliteration in the future.

> I think someone once suggested making the output buffer MB_LEN_MAX
> times the input buffer. But this doesn't seem right. There's no reason
> why the set of encodings supported by iconv should be limited by the
> set of available locales, is there?

Right.

Therefore I can see two approaches: either determine the correct
buffer length in a first pass (see libiconv-1.3/extras/iconv_string.c),
or be prepared to realloc the buffer when iconv() fails with E2BIG
(similar to what you do in order to retrieve the current directory
using the getcwd() function).
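
The second approach could look roughly like the sketch below. It is
only an illustration, not the code from iconv_string.c: the function
name convert_alloc and the initial size guess are my own assumptions,
and it only handles target encodings for which a single trailing NUL
byte makes sense.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions): convert
   inlen bytes at `in` from encoding `from` to encoding `to`.
   Returns a malloc'ed buffer with one trailing NUL byte, or NULL on
   failure. If outlen is non-NULL, it receives the converted length. */
char *convert_alloc(const char *to, const char *from,
                    const char *in, size_t inlen, size_t *outlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = inlen + 16;          /* a first guess, not a bound */
    char *buf = malloc(cap + 1);      /* +1 for the trailing NUL */
    if (buf == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;           /* iconv() wants char **, but only reads */
    size_t inleft = inlen;
    char *outp = buf;
    size_t outleft = cap;
    int flushing = 0;                 /* 0: converting, 1: emitting reset sequence */

    for (;;) {
        size_t r = flushing
            ? iconv(cd, NULL, NULL, &outp, &outleft)
            : iconv(cd, &inp, &inleft, &outp, &outleft);
        if (r != (size_t)-1) {
            if (flushing)
                break;                /* everything has been written */
            flushing = 1;             /* input consumed; reset the output state */
            continue;
        }
        if (errno != E2BIG) {         /* EILSEQ or EINVAL: give up */
            free(buf);
            iconv_close(cd);
            return NULL;
        }
        /* Output buffer too small: double it and retry where we stopped. */
        size_t used = (size_t)(outp - buf);
        cap *= 2;
        char *tmp = realloc(buf, cap + 1);
        if (tmp == NULL) {
            free(buf);
            iconv_close(cd);
            return NULL;
        }
        buf = tmp;
        outp = buf + used;
        outleft = cap - used;
    }

    *outp = '\0';
    if (outlen != NULL)
        *outlen = (size_t)(outp - buf);
    iconv_close(cd);
    return buf;
}
```

A call such as convert_alloc("UTF-8", "ISO-8859-1", s, strlen(s), &n)
returns a buffer the caller must free(). Because iconv() advances the
input and output pointers even when it fails with E2BIG, the loop can
resume after the realloc without converting anything twice.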

Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/