
Re: utf-8 encoding scheme



  "H. Peter Anvin" <hpa@xxxxxxxxx> writes:

> The alternate spelling
> 
> 	11000001 10001011
> 
> ... is not the character K <U+004B> but INVALID SEQUENCE.  One
> possible thing to do in a decoder is to emit U+FFFD REPLACEMENT
> CHARACTER on encountering illegal sequences.

Is there any consensus whether to use one or two U+FFFD characters in
such situations? For example, what do Perl, Tcl and Java do here?
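
To make the two options concrete, here is a minimal Python sketch, not a
statement about what Perl, Tcl or Java do. It assumes Python 3's built-in
"replace" error handler as one data point, and the helper names
naive_payload and decode_one_per_sequence are hypothetical, invented for
this illustration. The hand-written decoder only understands 1- and 2-byte
sequences, which is enough for the example quoted above.

```python
# The two bytes from the quoted example: 11000001 10001011
OVERLONG_K = bytes([0b11000001, 0b10001011])


def naive_payload(data: bytes) -> int:
    """Payload a lenient 2-byte decoder would extract (hypothetical helper)."""
    return ((data[0] & 0x1F) << 6) | (data[1] & 0x3F)


def decode_one_per_sequence(data: bytes) -> str:
    """Hypothetical policy: one U+FFFD for a whole overlong 2-byte sequence.

    Only 1- and 2-byte sequences are recognised; every other byte is
    replaced individually.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
        elif (0xC0 <= b <= 0xDF and i + 1 < len(data)
              and 0x80 <= data[i + 1] <= 0xBF):
            # Lead byte plus continuation byte: consume both, then reject
            # the pair if it encodes something that fits in one byte.
            cp = ((b & 0x1F) << 6) | (data[i + 1] & 0x3F)
            out.append(chr(cp) if cp >= 0x80 else "\uFFFD")
            i += 2
        else:
            # Stray continuation byte or unsupported lead byte.
            out.append("\uFFFD")
            i += 1
    return "".join(out)


# Policy A: the overlong pair collapses into a single U+FFFD.
one = decode_one_per_sequence(OVERLONG_K)

# Policy B: Python 3's own decoder treats 0xC1 as an invalid start byte
# and then the leftover 0x8B as another one, giving two U+FFFD.
two = OVERLONG_K.decode("utf-8", errors="replace")

print(hex(naive_payload(OVERLONG_K)), len(one), len(two))
```

The first policy treats the overlong pair as one ill-formed unit; the
second resynchronises after every rejected byte, so the count of U+FFFD
characters depends on which policy a decoder picks.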
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/