Re: xterm utf8controls
Bruno wrote:
: Markus Kuhn writes:
:
: > If all UTF-8 decoders are safe decoders, this will considerably
: > simplify the handling of UTF-8 in security critical environments.
:
: Another benefit of safe (unambiguous) UTF-8 encoding is the following:
:
: When, in a C program, you write
:
: wint_t c = getwc(f);
: ungetwc(c,f);
: long pos = ftell(f);
: ...
: fseek(f,pos,SEEK_SET);
:
: the file is normally positioned at the initial position before the `getwc'
: call, i.e. calling getwc from that position will again return the same `c'.
:
: But this is true only if a safe (unambiguous) UTF-8 encoding is used.
: (Because if the first getwc call reads 2 bytes, but ungetwc pushes back
: only 1 byte, then `pos' will point inside a multi-byte character...)
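To make "safe (unambiguous)" concrete: a safe decoder rejects overlong
forms, so every character has exactly one byte sequence and the number of
bytes read always equals the number of bytes a push-back accounts for.
What follows is only a rough sketch, not code from any library discussed
in this thread. The name `utf8_decode_strict' and its return convention
are made up for illustration, and it assumes the present-day four-byte
limit of UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only).
 * Decodes one character from s[0..n-1] into *out.
 * Returns the number of bytes consumed, or 0 if the input is truncated,
 * malformed, or not encoded in its shortest form. */
size_t utf8_decode_strict(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t cp;

    assert(s != NULL && out != NULL);
    if (n == 0) return 0;

    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;          /* stray continuation or invalid lead byte */

    if (n < len) return 0;  /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min_cp[len]) return 0;              /* overlong: the unsafe case */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate range */
    if (cp > 0x10FFFF) return 0;                 /* beyond the code space */

    *out = cp;
    return len;
}
```

With this check the two-byte sequence C1 81, an overlong spelling of 'A',
is refused instead of being decoded to a character whose proper encoding
is one byte long.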
Yes, but isn't unget quite a hack anyway, one that should perhaps not be
used in serious programs at all?
: The bottom line is that non-safe UTF-8 decoders have the potential to
: introduce subtle bugs in connection with `fseek'.
I would mark unget deprecated/obsolete.
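One way to do without the push-back, as a rough sketch only: remember the
position before reading and seek back to it afterwards. Since the position
is taken before `getwc' runs, it does not matter how many bytes the decoder
consumed. The name `peek_wc' is invented here, and the sketch assumes a
seekable stream and a stateless encoding.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (illustration only).
 * Returns the next wide character of f without consuming it,
 * or WEOF at end of file or on error. */
wint_t peek_wc(FILE *f)
{
    long pos;
    wint_t c;

    assert(f != NULL);

    /* Record the position BEFORE reading, so it cannot end up inside a
     * multi-byte character whatever the decoder accepts. */
    pos = ftell(f);
    if (pos < 0)
        return WEOF;        /* stream is not seekable */

    c = getwc(f);

    /* Seek back instead of calling ungetwc.  For an encoding with shift
     * states, fgetpos/fsetpos would be the safer pair, because an fpos_t
     * can also carry the conversion state. */
    if (fseek(f, pos, SEEK_SET) != 0)
        return WEOF;

    return c;
}
```

Calling peek_wc(f) twice in a row gives the same character both times, and
a following getwc(f) then returns it once more and moves on.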
Thomas
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/