
RE: Unicode 3.0.1 fixes UTF-8 spec security problem




> -----Original Message-----
> From: H. Peter Anvin [mailto:hpa@xxxxxxxxx]
> Sent: Saturday, January 06, 2001 2:05 AM
> To: linux-utf8@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: Unicode 3.0.1 fixes UTF-8 spec security problem
> 
> 
> Followup to:  <C110A2268F8DD111AA1A00805F85E58D0115A8DC@ntgbg1>
> By author:    Karlsson Kent - keka <keka@xxxxx>
> In newsgroup: linux.utf8
> > > 
> > > Yes, it really is.  Anyone knows why they adopted this half-measure
> > > (it fixes 90% of the problem, but it would be nice if they had avoided
> > > this additional wart.)
> > 
> > Yes, but there are just too many "UCS-2 only" implementations deployed.
> > They too may (soon) be faced with UTF-16 data, but will not special treat
> > the "surrogate" range. There is no particular security issue for the
> > non-BMP (non-ASCII really) characters, so leaving the already deployed
> > "UCS-2 only" implementations still Unicode conformant is unproblematical
> > (from a security point of view), while requireling their update (to make
> > them conformant) would  have been problematical (from a Unicode Consortium
> > point of view).
> 
> Ummm... YES there is such a security issue: there are security issues
> caused by allowing a single string to be encoded in multiple different
> ways.  In fact, a whole slew of security holes in especially
> Microsoft-based web software (servers and clients) have been caused
> just by this -- Microsoft OS's being more vulnerable to this since
> unlike Unix they have lots of redundant spellings.

Please read my message again!  As far as I know, no security issue that
has surfaced so far involves non-ASCII characters.  In particular, none
of them can (yet) involve any supplementary characters (non-BMP
characters), since none have been allocated yet.  Even once some are
allocated, I don't see it as likely that anyone will use supplementary
characters to spell commands, or use them as "magic" characters (like
e.g. /) in some way.  In the unlikely event that that happens, the
"irregular" case may some day be made "illegal" too.  The "fix" made to
the Unicode conformance rules makes the "multiple encoding" of BMP
characters in UTF-8 illegal.  It does so without making nearly every
currently deployed implementation of Unicode non-conformant; only the
few places where there may be security issues are affected.
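
To make the distinction concrete, here is a small sketch in Python.  It
is only an illustration, not text from the standard: the constant and
function names are made up for this example, and Python's strict UTF-8
decoder is stricter than the Unicode 3.0.1 rules, since it rejects both
the non-shortest form and the "irregular" surrogate form.  The lenient
decoder shows how a "UCS-2 only" style implementation would end up
reading the irregular six-byte spelling of U+10000.

```python
# Illustrative byte strings (names are hypothetical, chosen for this sketch).
SLASH = b"/"                                    # U+002F in shortest form
OVERLONG_SLASH = b"\xc0\xaf"                    # non-shortest 2-byte spelling of U+002F
SHORTEST_U10000 = b"\xf0\x90\x80\x80"           # U+10000 as one 4-byte sequence
IRREGULAR_U10000 = b"\xed\xa0\x80\xed\xb0\x80"  # D800 and DC00, each as 3 bytes


def decodes_strictly(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_irregular(data: bytes) -> str:
    """Lenient decode: let surrogates through, then pair them via UTF-16."""
    units = data.decode("utf-8", "surrogatepass")
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    for name, data in [
        ("shortest /", SLASH),
        ("overlong /", OVERLONG_SLASH),
        ("shortest U+10000", SHORTEST_U10000),
        ("irregular U+10000", IRREGULAR_U10000),
    ]:
        print(name, "accepted by strict decoder:", decodes_strictly(data))
    # Both spellings name the same supplementary character when read leniently.
    print(decode_irregular(IRREGULAR_U10000) == decode_irregular(SHORTEST_U10000))
```

The overlong "/" is the kind of multiple spelling that can slip past a
check for a "magic" character; the irregular form only ever yields a
supplementary character, which is why it was left alone for now.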

		Kind regards
		/kent k
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/