Full BMP test file available
The new file
http://www.cl.cam.ac.uk/~mgk25/ucs/full-bmp.txt
contains the UTF-8 sequences of all code positions in the
ISO 10646-1 Basic Multilingual Plane, except for the C0 and C1 control
character areas. This corresponds to all codes in the range U+0020 -
U+007E and U+00A0 - U+FFFF.
NOTE: The ranges U+D800 - U+DFFF (surrogates) and U+FFFE - U+FFFF are
not supposed to appear in valid UTF-8 files, and UTF-8 decoders are
allowed to treat them as malformed sequences.
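As a rough illustration of how one particular decoder handles these
code positions, the following sketch probes Python 3's built-in strict
UTF-8 decoder. The helper name strict_decoder_accepts is hypothetical,
and other decoders may legitimately behave differently: this one
rejects encoded surrogates but lets U+FFFE through.

```python
def strict_decoder_accepts(data):
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


surrogate = b"\xed\xa0\x80"      # three-byte form of U+D800
noncharacter = b"\xef\xbf\xbe"   # three-byte form of U+FFFE

print("U+D800 accepted:", strict_decoder_accepts(surrogate))
print("U+FFFE accepted:", strict_decoder_accepts(noncharacter))
```

A test suite built on the full file should therefore expect decoder
differences in exactly these two ranges.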
It has the same form as the *.repertoire-utf8 files that come with the
ucs-fonts package. Feel free to use it in your test suites.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/