Re: How to detect the encoding of a string?
Abel Cheung wrote:
> > (because there are very few
> > meaningful strings which look like UTF-8 but aren't).
>
> Yes, that's rare, though real-world cases have happened before,
> especially with multibyte characters. Here is a sample:
>
> http://qa.mandrakesoft.com/show_bug.cgi?id=3935
Yes. It's a heuristic, and heuristics are always buggy. The programmer has
to weigh the benefit for the many users for whom it "just works" against
the problems it will cause for a few. In this case, when the heuristic
doesn't work, the result will be a garbled filename, just a different
kind of garbage than if no heuristic had been applied.
Bruno
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/