Re: c++ strings and UTF-8 (other charsets)
On Tue, Feb 27, 2007 at 07:49:17PM -0500, Daniel B. wrote:
> Marcel Ruff wrote:
> >
> ....
> > As UTF-8 may not contain '\0' ...
>
> Yes it can.
No, I think he just meant to say "a string of non-NUL _characters_ may
not contain a 0 _byte_". The NUL character is not valid "text" or a
valid part of a "string" in the POSIX sense of "text" or the C/POSIX
sense of "string".
> Are you thinking of Java's _modified_ version of UTF-8
> (http://en.wikipedia.org/wiki/UTF-8#Java)?
Ugh, disgusting...
BTW, note that ill-advised programs that allow NUL characters in text
where they do not belong often lead to vulnerabilities, like the
Firefox vuln just a few days ago.
Rich
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/