Re: UTF-8 application
Jeu George wrote on 2000-05-31 09:16 UTC:
> I would like to know if the input we give in can be of the type
> char * string= " hello"
> this is how u give input to a normal ascii string
The C data type char * can be used to refer to both single-byte and
multi-byte strings. You can place into a char * string anything from
ASCII and ISO 8859-15 to Shift_JIS and UTF-8.
A program that reads a UTF-8 string from stdin and writes it to stdout
looks roughly like
while ((c = getchar()) != EOF) putchar(c);
Programs of this simplicity can remain fully ignorant of the difference
between UTF-8 and ISO 8859.
> also u could say
> char a = 102 ;
> so a C program will print the charater corresponding to the ascii value
> when u print the variable a
>
> suppose u want to print enter the code for a utf-8 char how do u do it.
On more modern systems (e.g., glibc 2.2),
putwchar(0x20AC);
or
wprintf(L"\u20AC");
will output the Euro sign (U+20AC). The byte sequence this produces
depends on whether you have selected ASCII, UTF-8, ISO 8859-15 or
CP1252 in your locale. In ASCII you will get a transliteration such as
"EUR", while in the others you will get the appropriate 3-byte or
1-byte sequence.
Read
http://www.cl.cam.ac.uk/~mgk25/unicode.html
as well as chapters 24 and 25 of
http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-C-FDIS.1999-04.pdf
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/