Re: current idea
On Sun, 4 Nov 2001, George W Gerrity wrote:
> >A 22bit integer is used for that purpose.
>
> That comment in another letter re-enforced my belief that the lisp
> engine was the trouble. I ASSUMED that the lisp atom was a 32-bit
> word, and that the missing ten bits were taken up with tags, etc. The
> point is, however, that maybe 22 bits is OK for this round, but what
> do you do in a year or two when the higher planes get more populated,
> and someone wants to use emacs for some quick and dirty editing of a
> scholarly work on cuneiform, say?
I suspect the above comment is based on unfamiliarity with the
Unicode/UCS codespace architecture. UCS/Unicode have agreed not to define
any characters above U-0EFFFF. That plus the 2^17 private use characters
U-0F0000 .. U-10FFFF is the space that UTF-16 can handle and that fits
into 20.09 bits, so a 22-bit architecture will not overflow with future
Unicode extensions, as all future planes that Unicode could ever add are
already covered by the Emacs encoding, including Cuneiform, Tengwar and
Klingon. A 22-bit architecture even leaves roughly three million
non-Unicode codepoints (2^22 minus the 17 * 2^16 codepoints of the
Unicode codespace) free for proprietary non-standard GNU extensions. I think we
are safe at least until the SETI folk make a breakthrough and discover
extraterrestrial civilizations with even more complicated scripts than
Han.
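
For anyone who wants to check the arithmetic, here is a minimal Python
sketch. The constant and function names are my own, chosen for
illustration, and are not taken from the Emacs sources; the only
assumptions are the codespace limits quoted above and a 22-bit character
integer.

```python
import math

# Assumed limits of the UCS/Unicode codespace, as described in the text.
MAX_CODEPOINT = 0x10FFFF      # highest code point reachable via UTF-16
PRIVATE_USE_FIRST = 0xF0000   # first code point of the two private-use planes
CHAR_BITS = 22                # assumed width of the character integer

# Total number of code points: 17 planes of 2^16 each.
codespace_size = MAX_CODEPOINT + 1

# Size of the private-use range U-0F0000 .. U-10FFFF.
private_use_size = MAX_CODEPOINT - PRIVATE_USE_FIRST + 1

# Bits needed to number every code point in the codespace.
bits_needed = math.log2(codespace_size)

# Values of a CHAR_BITS-wide integer left over beyond the codespace.
spare = 2 ** CHAR_BITS - codespace_size


def fits(codepoint: int, bits: int = CHAR_BITS) -> bool:
    """Return True if codepoint is representable in an unsigned `bits`-bit integer."""
    return 0 <= codepoint < 2 ** bits


if __name__ == "__main__":
    print(f"codespace size:   {codespace_size}")
    print(f"private use size: {private_use_size} (2^17 = {2 ** 17})")
    print(f"bits needed:      {bits_needed:.2f}")
    print(f"spare values:     {spare}")
    print(f"U-10FFFF fits in {CHAR_BITS} bits: {fits(MAX_CODEPOINT)}")
```

Changing CHAR_BITS shows how much headroom other integer widths would
leave above the same codespace.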
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/