
Re: UTF-8 application



Jeu George wrote on 2000-05-31 09:16 UTC:
> I would like to know if the input we give in can be of the type 
>  char * string= " hello"
>  this is how u give input to a normal ascii string

The C data type char * can be used to refer to both single-byte and
multi-byte strings. A char * string can hold text in any encoding, from
ASCII and ISO 8859-15 to Shift_JIS and UTF-8.

A program that reads a UTF-8 string from stdin and writes it to stdout
looks roughly like

  int c;
  while ((c = getchar()) != EOF) putchar(c);

Programs of this simplicity can remain fully ignorant of the difference
between UTF-8 and ISO 8859.
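
As a minimal sketch of that idea, the loop can be wrapped in a small
function that copies bytes from one stream to another. The name
copy_bytes and the byte count it returns are illustrative assumptions,
not part of any standard API. The function never inspects the bytes, so
it treats UTF-8, ISO 8859 and ASCII input identically.

```c
#include <stdio.h>

/* Hypothetical helper: copy every byte from in to out unchanged.
 * Returns the number of bytes copied. No decoding takes place, so
 * multi-byte UTF-8 sequences pass through intact. */
long copy_bytes(FILE *in, FILE *out)
{
    int c;          /* int, not char, so EOF stays distinct from byte 0xFF */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            break;  /* stop on a write error */
        count++;
    }
    return count;
}
```

Calling copy_bytes(stdin, stdout) from main gives the filter described
above.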

> also u  could say 
>  char a = 102 ;
>  so a  C program will print the charater corresponding to the ascii value
> when u print the variable a
> 
> suppose u want to print enter the code for a utf-8 char how do u do it.

On more modern systems (e.g., glibc 2.2),

  putwchar(0x20AC);

or

  wprintf(L"\u20AC");

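will output the Euro sign (U+20AC), provided the program has called
setlocale(LC_ALL, "") at startup so that the locale from the environment
is in effect. What byte sequence this produces will depend on whether
you have selected ASCII, UTF-8, ISO 8859-15 or CP1252 in your locale. In
ASCII you will get, say, "EUR" as a transliteration, while in the others
you will get the appropriate 3-byte (UTF-8) or 1-byte (ISO 8859-15,
CP1252) sequence.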
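
To see which bytes a given wide character turns into under the current
locale, something like the following sketch may help. It uses the
standard wcrtomb() function. The helper names encode_in_locale and
show_bytes are made up for illustration, and the code assumes that
wchar_t holds Unicode code points, as it does on glibc.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert one wide character to the multibyte
 * encoding of the current locale. buf must hold at least MB_LEN_MAX
 * bytes. Returns the number of bytes written, or (size_t)-1 if the
 * character cannot be represented in that encoding. */
size_t encode_in_locale(wchar_t wc, char *buf)
{
    mbstate_t state;

    memset(&state, 0, sizeof state);  /* start in the initial shift state */
    return wcrtomb(buf, wc, &state);
}

/* Hypothetical helper: print the bytes for wc in hexadecimal. */
void show_bytes(wchar_t wc)
{
    char buf[MB_LEN_MAX];
    size_t n = encode_in_locale(wc, buf);

    if (n == (size_t)-1) {
        puts("not representable in this locale");
        return;
    }
    for (size_t i = 0; i < n; i++)
        printf("%02X ", (unsigned char)buf[i]);
    putchar('\n');
}
```

A main function would call setlocale(LC_ALL, "") first and then
show_bytes(0x20AC). Under a UTF-8 locale the three bytes shown should be
E2 82 AC. Without the setlocale call the program stays in the default
"C" locale, whatever the environment says.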

Read

  http://www.cl.cam.ac.uk/~mgk25/unicode.html

as well as chapters 24 and 25 of

  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-C-FDIS.1999-04.pdf

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/