
Re: xterm utf8controls



Markus Kuhn writes:

> If all UTF-8 decoders are safe decoders, this will considerably
> simplify the handling of UTF-8 in security critical environments.

Another benefit of safe (unambiguous) UTF-8 encoding is the following:

When, in a C program, you write

           wint_t c = getwc(f);
           ungetwc(c,f);
           long pos = ftell(f);
           ...
           fseek(f,pos,SEEK_SET);

the file is normally positioned at the initial position before the `getwc'
call, i.e. calling getwc from that position will again return the same `c'.

But this is true only if a safe (unambiguous) UTF-8 decoding is used.
(A non-safe decoder accepts overlong sequences. If the first getwc call
reads 2 bytes for a character whose shortest encoding is 1 byte, and
ungetwc pushes back only 1 byte, then `pos' will point inside a
multi-byte sequence.)
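
To make the byte-count mismatch concrete, here is a small sketch. It is
not taken from any libc; the helper names (utf8_shortest_len,
decode2_lenient, decode2_safe) are made up for illustration, and only
two-byte sequences are handled. The lenient decoder accepts the overlong
pair 0xC1 0x81 as U+0041, although 'A' re-encodes as a single byte; the
safe decoder rejects that pair.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the shortest UTF-8 encoding of cp (0 if out of range). */
static size_t utf8_shortest_len(unsigned long cp)
{
    if (cp < 0x80UL)     return 1;
    if (cp < 0x800UL)    return 2;
    if (cp < 0x10000UL)  return 3;
    if (cp < 0x110000UL) return 4;
    return 0;
}

/* Hypothetical non-safe decoder for a two-byte sequence.
   It checks only the bit patterns, so overlong forms are accepted.
   Returns the number of bytes consumed (2), or 0 if malformed. */
static size_t decode2_lenient(const unsigned char *s, unsigned long *cp)
{
    assert(s != NULL && cp != NULL);
    if ((s[0] & 0xE0) != 0xC0 || (s[1] & 0xC0) != 0x80)
        return 0;
    *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (unsigned long)(s[1] & 0x3F);
    return 2;
}

/* Hypothetical safe decoder: additionally rejects any sequence that is
   longer than the shortest encoding of the decoded code point.
   *cp is written only when the sequence is accepted. */
static size_t decode2_safe(const unsigned char *s, unsigned long *cp)
{
    unsigned long tmp = 0;
    size_t n = decode2_lenient(s, &tmp);
    if (n == 0 || utf8_shortest_len(tmp) != n)
        return 0;
    *cp = tmp;
    return n;
}
```

With the lenient decoder, 2 bytes are consumed but a push-back that
re-encodes the character yields only 1 byte, which is exactly the
situation where `pos' can end up in the middle of a sequence.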

The bottom line is that non-safe UTF-8 decoders have the potential to
introduce subtle bugs in connection with `fseek'.

                     Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/