Re: High-Speed UTF-8 to UTF-16 Conversion
On Sat, Mar 17, 2007 at 06:25:59PM +0600, Christopher Fynn wrote:
> Colin Paul Adams wrote:
>
> >>>>>>"Rich" == Rich Felker <dalias@xxxxxxxxxx> writes:
> >
> > Rich> Indeed, this was what I was thinking of. Thanks for
> > Rich> clarifying. BTW, any idea WHY they brought the UTF-16
> > Rich> nonsense to DOM/DHTML/etc.?
>
> >I don't know for certain, but I can speculate well, I think.
>
> >DOM was a micros**t invention (and how it shows!). NT was UCS-2
> >(effectively).
>
> AFAIK Unicode was originally only planned to be a 16-bit encoding.
> The Unicode Consortium and ISO 10646 then agreed to synchronize the
> two standards - though originally Unicode was only going to be a 16-bit
> subset of the UCS. A little after that Unicode decided to support UCS
> characters beyond plane 0.
>
> Anyway at the time NT was being designed (late eighties) Unicode was
> supposed to be limited to < 65536 characters and UTF-8 hadn't been
> thought of, so 16-bits probably seemed like a good idea.
While this is probably true, it's also beside the point. I wasn't
asking why Windows used UCS-2, but why JavaScript remained stuck on
the 16-bit idea even after the character set expanded -- since JS is a
pretty high-level language and the size of types is largely irrelevant,
redefining characters to be 32-bit integers shouldn't have broken
anything.
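
To make the 16-bit point concrete, here is a minimal sketch (TypeScript,
but the string behaviour is plain JavaScript) of what a script sees for a
character outside plane 0. The helper name decodeSurrogatePair is my own,
purely for illustration, and U+1D11E is just an arbitrary non-BMP example:

```typescript
// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
function decodeSurrogatePair(high: number, low: number): number {
  return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
}

// U+1D11E MUSICAL SYMBOL G CLEF, written as its two UTF-16 code units.
const clef: string = "\uD834\uDD1E";

// length counts 16-bit code units, not characters.
const unitCount: number = clef.length;

// charCodeAt hands back each surrogate half separately.
const high: number = clef.charCodeAt(0);
const low: number = clef.charCodeAt(1);

// The script has to reassemble the actual character value itself.
const codePoint: number = decodeSurrogatePair(high, low);
```

So a single character reports a length of 2, and indexing gives you
surrogate halves rather than the character.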
Rich
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/