Re: xterm utf8controls



Bruno wrote:
: Markus Kuhn writes:
: 
: > If all UTF-8 decoders are safe decoders, this will considerably
: > simplify the handling of UTF-8 in security critical environments.
: 
: Another benefit of safe (unambiguous) UTF-8 encoding is the following:
: 
: When, in a C program, you write
: 
:            wint_t c = getwc(f);
:            ungetwc(c,f);
:            long pos = ftell(f);
:            ...
:            fseek(f,pos,SEEK_SET);
: 
: the file is normally positioned at the initial position before the `getwc'
: call, i.e. calling getwc from that position will again return the same `c'.
: 
: But this is true only if a safe (unambiguous) UTF-8 encoding is used.
: (Because if the first getwc call reads 2 bytes, but ungetwc pushes back
: only 1 byte, then `pos' will point inside a multi-byte character...)
Yes, but isn't unget rather a hack anyway, one that serious programs
should perhaps avoid?
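
To make the ambiguity concrete, here is a minimal sketch in C. It assumes
that "safe" means rejecting overlong forms, i.e. sequences longer than the
shortest encoding of the character. The names utf8_canonical_len and
utf8_decode are hypothetical helpers for illustration, not part of any
libc, and the sketch checks only for overlong forms (not surrogates or
out-of-range values).

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes in the shortest (canonical) UTF-8 form of cp. */
int utf8_canonical_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Hypothetical helper: decode one sequence from s (n bytes available).
 * Returns the number of bytes consumed, or 0 if the input is malformed.
 * With strict != 0, overlong forms are rejected as malformed too. */
int utf8_decode(const unsigned char *s, size_t n, int strict, uint32_t *out)
{
    uint32_t cp;
    int len, i;

    if (n == 0) return 0;

    /* The lead byte determines the claimed sequence length. */
    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else return 0;                  /* stray continuation or invalid lead */

    if ((size_t)len > n) return 0;  /* truncated sequence */

    /* Each continuation byte must look like 10xxxxxx. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* A safe decoder accepts only the shortest form of each character. */
    if (strict && len != utf8_canonical_len(cp)) return 0;

    *out = cp;
    return len;
}
```

Given the bytes 0xC0 0xAF, a call with strict == 0 consumes 2 bytes and
yields '/', whose canonical length is 1. That is the mismatch between
bytes read and bytes pushed back described above. With strict != 0 the
sequence is rejected, so the byte count is always determined by the
character.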

: The bottom line is that non-safe UTF-8 decoders have the potential to
: introduce subtle bugs in relation with `fseek'.
I would mark unget deprecated/obsolete.

Thomas
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/