Re: xterm utf8controls
Markus Kuhn writes:
> If all UTF-8 decoders are safe decoders, this will considerably
> simplify the handling of UTF-8 in security critical environments.
Another benefit of safe (unambiguous) UTF-8 encoding is the following:
When, in a C program, you write
    wint_t c = getwc(f);
    ungetwc(c, f);
    long pos = ftell(f);
    ...
    fseek(f, pos, SEEK_SET);
the file is normally positioned at the initial position before the `getwc'
call, i.e. calling getwc from that position will again return the same `c'.
But this is true only if a safe (unambiguous) UTF-8 encoding is used.
(Suppose a non-safe decoder accepts the overlong sequence 0xC1 0x81 as 'A'.
The first getwc call then reads 2 bytes, but ungetwc pushes back only the
1-byte shortest form, so `pos' will point inside a multi-byte character...)
The bottom line is that non-safe UTF-8 decoders have the potential to
introduce subtle bugs in connection with `fseek'.
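
The byte-count mismatch can be sketched in isolation. This is only a
simulation under stated assumptions, not the behaviour of any particular
libc: decode_lenient, decode_safe, shortest_len and pos_after_unget are
hypothetical helpers, they handle 1- and 2-byte sequences only, and the
model assumes that pushing a character back rewinds the position by the
length of its shortest encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical non-safe decoder: handles 1- and 2-byte sequences only and
   does NOT reject overlong forms. Returns bytes consumed, or 0 on error. */
static size_t decode_lenient(const unsigned char *s, size_t n, unsigned *cp)
{
    if (n >= 1 && s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if (n >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned)(s[0] & 0x1F) << 6) | (unsigned)(s[1] & 0x3F);
        return 2;
    }
    return 0;
}

/* Hypothetical safe decoder: as above, but a 2-byte sequence that encodes
   a code point below 0x80 (an overlong form) is rejected. */
static size_t decode_safe(const unsigned char *s, size_t n, unsigned *cp)
{
    size_t len = decode_lenient(s, n, cp);
    if (len == 2 && *cp < 0x80)
        return 0;
    return len;
}

/* Length of the shortest UTF-8 encoding, for code points below 0x800. */
static size_t shortest_len(unsigned cp)
{
    assert(cp < 0x800);
    return cp < 0x80 ? 1 : 2;
}

/* Simulated offset after "read one character at offset 0, then push it
   back", ASSUMING push-back rewinds by the shortest-form length.
   Returns -1 if the chosen decoder rejects the input. */
static long pos_after_unget(const unsigned char *s, size_t n, int safe)
{
    unsigned cp;
    size_t used = safe ? decode_safe(s, n, &cp) : decode_lenient(s, n, &cp);
    if (used == 0)
        return -1;
    return (long)used - (long)shortest_len(cp);
}
```

In this model, the overlong input { 0xC1, 0x81 } leaves the simulated
position at 1 with the lenient decoder, i.e. inside the sequence, while
the safe decoder rejects it outright. A well-formed 2-byte character such
as { 0xC3, 0xA9 } returns to offset 0 with either decoder.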
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/