Re: Encoding conversions
Carl W. Brown <cbrown@xxxxxxxxxxx>:
> If they validate UTF-8 (xiua_ValidateStr) it will check each character to be
> a valid UTF-8 initial character followed by the proper number of
> continuation characters if any. It will make sure that it is not a
> surrogate character nor a reversed BOM nor exceed the Unicode 3.1 character
> range.
Note also that overlong forms must be rejected: "\xe0\x84\x80", for
example, is illegal even though its bits decode to U+0100, because
U+0100 should be represented only by the shortest form, "\xc4\x80".
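
To make the arithmetic concrete, here is a rough sketch in Python. The helper name raw_decode is hypothetical (it is not part of xIUA or any other library); it applies the UTF-8 bit layout to a single sequence and deliberately performs no validity checks, so both byte strings yield the same value:

```python
def raw_decode(seq: bytes) -> int:
    """Decode ONE UTF-8-style sequence by bit layout only (hypothetical helper).

    No validity checks are made on purpose, so overlong forms are accepted.
    """
    lead = seq[0]
    if lead < 0x80:                 # 0xxxxxxx: ASCII, one byte
        return lead
    if lead >> 5 == 0b110:          # 110xxxxx: 5 payload bits in the lead byte
        cp = lead & 0x1F
    elif lead >> 4 == 0b1110:       # 1110xxxx: 4 payload bits
        cp = lead & 0x0F
    elif lead >> 3 == 0b11110:      # 11110xxx: 3 payload bits
        cp = lead & 0x07
    else:
        raise ValueError("not a UTF-8 lead byte")
    for b in seq[1:]:               # each continuation byte adds 6 payload bits
        cp = (cp << 6) | (b & 0x3F)
    return cp


print(hex(raw_decode(b"\xc4\x80")))      # shortest form of U+0100
print(hex(raw_decode(b"\xe0\x84\x80")))  # overlong form, same bits
```

Since both calls give the same code point, a validator has to compare the decoded value against the minimum for the sequence length (0x80 for two bytes, 0x800 for three, 0x10000 for four) to catch the second form.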
Perhaps you want to exclude U+FFFF, too, since like U+FFFE (the
reversed BOM) it is a noncharacter.
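
Putting the checks together, a validator along these lines might look like the following sketch in Python. This is not the xiua_ValidateStr implementation, which I have not seen; the name validate_utf8 and the exclude_ffff flag are made up for illustration, and the upper limit of U+10FFFF is my assumption about what "the Unicode 3.1 character range" means.

```python
def validate_utf8(data: bytes, exclude_ffff: bool = True) -> bool:
    """Return True if data is strictly valid UTF-8 (illustrative sketch only)."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                      # ASCII, nothing more to check
            i += 1
            continue
        # need = number of continuation bytes, minimum = smallest code point
        # that is allowed to use a sequence of this length
        if lead >> 5 == 0b110:
            need, cp, minimum = 1, lead & 0x1F, 0x80
        elif lead >> 4 == 0b1110:
            need, cp, minimum = 2, lead & 0x0F, 0x800
        elif lead >> 3 == 0b11110:
            need, cp, minimum = 3, lead & 0x07, 0x10000
        else:
            return False                     # stray continuation or bad lead byte
        if i + need >= n:
            return False                     # sequence is truncated
        for b in data[i + 1:i + 1 + need]:
            if b & 0xC0 != 0x80:
                return False                 # not a 10xxxxxx continuation byte
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            return False                     # overlong, e.g. "\xe0\x84\x80"
        if cp > 0x10FFFF:
            return False                     # beyond the assumed Unicode range
        if 0xD800 <= cp <= 0xDFFF:
            return False                     # surrogate
        if cp == 0xFFFE or (exclude_ffff and cp == 0xFFFF):
            return False                     # reversed BOM, optionally U+FFFF
        i += 1 + need
    return True


print(validate_utf8(b"\xc4\x80"))       # shortest form of U+0100
print(validate_utf8(b"\xe0\x84\x80"))   # overlong form of U+0100
```

The exclude_ffff flag defaults to on only because of the suggestion above; whether U+FFFF should be rejected is a policy choice for the caller.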
Edmund
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/