Re: help on utf-16
On Fri, 5 Jul 2002 17:22:57 -0700
"Arun v" <arunv@xxxxxxxxxxxxx> wrote:
> Hi
>
> I'm a newbie to this Unicode world.
> I'm developing an ECMAScript interpreter according to the ECMA-262 standard, which
> states that the input given to the interpreter
> will be in the UTF-16 transformation format (normalised to Unicode
> Normalisation Form C).
>
> I have a C program on Linux (which acts as the scanner for the interpreter); now
> I want to make it aware of UTF-16 encoded
> input. (I need not do any transformation or normalisation, only make my
> program understand the UTF-16 encoding.)
Do the identifiers and variable names of ECMAScript itself need to be in
Unicode, or just the strings? If it's just the strings, then just decode
each one in your tokenizer (or I suppose you could build up each string in
your state machine). Then you will need basic string operators for UTF-16.
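A rough sketch of what that decoding step and one basic string operator might look like in C. The names utf16_decode and utf16_length are made up for illustration, and the sketch assumes the scanner already has the input as an array of 16-bit code units in host byte order (byte-order handling is left out). Replacing unpaired surrogates with U+FFFD is one possible policy, not something the standard requires of a scanner.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one code point from the UTF-16 code units
 * s[0..n-1]. Stores the code point in *cp and returns the number of code
 * units consumed (1 or 2), or 0 if n is 0.
 * Assumption: an unpaired surrogate is replaced with U+FFFD.
 */
size_t utf16_decode(const uint16_t *s, size_t n, uint32_t *cp)
{
    if (n == 0)
        return 0;

    uint16_t u = s[0];

    if (u >= 0xD800 && u <= 0xDBFF) {
        /* High surrogate: must be followed by a low surrogate. */
        if (n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
            *cp = 0x10000
                + (((uint32_t)(u - 0xD800) << 10)
                   | (uint32_t)(s[1] - 0xDC00));
            return 2;
        }
        *cp = 0xFFFD;   /* high surrogate with no partner */
        return 1;
    }

    if (u >= 0xDC00 && u <= 0xDFFF) {
        *cp = 0xFFFD;   /* low surrogate with no preceding high surrogate */
        return 1;
    }

    *cp = u;            /* ordinary BMP character: one code unit */
    return 1;
}

/*
 * Hypothetical string operator built on the decoder: count code points
 * (not code units) in s[0..n-1].
 */
size_t utf16_length(const uint16_t *s, size_t n)
{
    size_t count = 0;
    size_t i = 0;
    uint32_t cp;

    while (i < n) {
        i += utf16_decode(s + i, n - i, &cp);
        count++;
    }
    return count;
}
```

A tokenizer could call utf16_decode in its main loop and feed each resulting code point to the state machine. Other operators (compare, search, copy) can be written the same way, or can work directly on code units when surrogate pairs do not matter.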
> P.N.: Please also suggest some good online resources on Unicode and UTF-16.
See this FAQ. Its focus is UTF-8 on Unix, but UTF-16 uses the same
principle, and it will be a good jumping-off point.
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Mike
--
http://www.eskimo.com/~miallen/c/jus.c
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/