Re: c++ strings and UTF-8 (other charsets)
Keld Jørn Simonsen wrote:
>
> On Tue, Feb 27, 2007 at 07:49:17PM -0500, Daniel B. wrote:
> > Marcel Ruff wrote:
> > >
> > ...
> > > As UTF-8 may not contain '\0' ...
> >
> > Yes it can.
>
> yes, it can, but then it represent the character NULL.
> And strings in C/C++ are not supposed to contain the NULL character.
True, C strings can't contain a null byte other than the terminating
byte, so, since they can't contain a(ny other) null byte, they can't
represent the character NUL/NULL (in ASCII or standard UTF-8 encoding).
However, make sure you don't neglect to handle the fact that a
UTF-8 input stream (just like an ASCII input stream) can contain a
null byte (representing a NULL character).
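
To illustrate the point, here is a minimal sketch, assuming the input
arrives as a buffer with a known length. The names count_nul_bytes,
from_buffer and check_example are made up for this example; only
std::string and std::strlen are standard. It shows that a std::string
built from a pointer plus a length keeps an embedded null byte, while
anything that treats the same data as a C string stops at it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: count the null bytes in a buffer of known length.
std::size_t count_nul_bytes(const char* data, std::size_t len) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == '\0') {
            ++n;
        }
    }
    return n;
}

// Hypothetical helper: build a std::string from pointer + length, so that
// embedded null bytes are copied instead of ending the string.
std::string from_buffer(const char* data, std::size_t len) {
    return std::string(data, len);
}

// Walks through the difference between length-based and C-string handling.
bool check_example() {
    // Three bytes of input: 'a', a null byte, 'b' (plus the literal's
    // implicit terminator, which is not counted as input here).
    const char raw[] = "a\0b";
    const std::size_t len = sizeof(raw) - 1;  // 3

    const std::string full = from_buffer(raw, len);  // keeps all 3 bytes
    const std::string cut(raw);                      // stops at the first null byte

    return full.size() == 3
        && cut.size() == 1
        && std::strlen(full.c_str()) == 1  // the C view is truncated again
        && count_nul_bytes(raw, len) == 1;
}
```

So if the data may contain null bytes, the length has to be carried
alongside the bytes all the way through; passing the result of c_str()
to a C-string function loses everything after the first null byte.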
(I don't know whether this mailing list deals with files in general,
which could contain null-byte representations of NULL characters,
or only with restricted strings, e.g., strings used to name files,
which are defined to never contain a NULL character.)
Daniel
--
Daniel Barclay
dsb@xxxxxxxxx
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/