
Re: multibyte encodings other than UTF-8



From: Edmund GRIMLEY EVANS <edmundo@rano.demon.co.uk>


>In fact a list of the multibyte encodings would be nice. I just know
>that there are several for each of Chinese, Japanese, Korean, and the
>only one I ever looked at in detail had number of octets = width on
>display, so you can handle it just like any 8-bit character set.


 You can look at some live multibyte HTML pages. :-)

 Each page shows the word 'character' three times: first in Japanese
(the first two characters), then in English, and then in Russian
('si:mvol', i.e. 'symbol'). (The JIS X 0208 character set also
defines Russian (Cyrillic) and Greek letters.)
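To see this in practice, here is a minimal sketch using Python 3's standard euc_jp codec (EUC-JP is built on JIS X 0208). It assumes the Japanese word on the pages is 文字 (moji); the Greek sample and the helper name euc_jp_bytes_per_char are hypothetical additions for illustration. Kanji, Cyrillic and Greek letters from JIS X 0208 all take two bytes each, while ASCII takes one.

```python
# Minimal sketch (Python 3): byte length of each character in EUC-JP.
# EUC-JP is built on JIS X 0208, which also defines Cyrillic and Greek.
SAMPLES = {
    "Japanese": "文字",    # assumption: the word shown on the pages (moji)
    "English": "character",
    "Russian": "символ",   # 'si:mvol'
    "Greek": "αβγ",        # hypothetical extra sample, not from the pages
}

def euc_jp_bytes_per_char(text):
    """Return the EUC-JP byte length of each character in text."""
    return [len(ch.encode("euc_jp")) for ch in text]

for name, word in SAMPLES.items():
    # Print only ASCII so the output works on any terminal encoding.
    print(name, euc_jp_bytes_per_char(word))
```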

Shift-JIS (CP-932) :
http://www.sensi.org/~alec/lang/japan/code/ch-sjis.html
JIS (ISO-2022-JP) :
http://www.sensi.org/~alec/lang/japan/code/ch-jis.html
EUC-JP :
http://www.sensi.org/~alec/lang/japan/code/ch-euc.html
UTF-8 :
http://www.sensi.org/~alec/lang/japan/code/ch-utf8.html
NCR:
http://www.sensi.org/~alec/lang/japan/code/ch-ncr.html
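A rough way to compare the encodings above without visiting the pages: the sketch below encodes one word with Python 3's standard codecs and prints the bytes in hex. It assumes the word is 文字; the names CODECS and encode_all are made up for this example. Python treats shift_jis and cp932 as separate codecs, since CP-932 is Microsoft's extension of Shift-JIS; they agree on these two kanji.

```python
# Minimal sketch (Python 3.8+ for bytes.hex(sep)): one word in several
# multibyte encodings, using only the standard codec names.
WORD = "文字"  # assumption: the Japanese word for 'character' on the pages

CODECS = {          # hypothetical mapping: label -> Python codec name
    "Shift-JIS": "shift_jis",
    "CP-932": "cp932",
    "ISO-2022-JP": "iso2022_jp",
    "EUC-JP": "euc_jp",
    "UTF-8": "utf-8",
}

def encode_all(text):
    """Map each encoding label to the hex string of text encoded in it."""
    return {name: text.encode(codec).hex(" ") for name, codec in CODECS.items()}

for name, hexbytes in encode_all(WORD).items():
    print(f"{name:12} {hexbytes}")
```

The ISO-2022-JP output starts with the escape sequence ESC $ B and switches back to ASCII with ESC ( B, which is why it is longer than the other encodings for the same text.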

 The NCR (Numeric Character Reference: a Unicode code point written
in decimal as &#NNNN;) is also a kind of multibyte encoding, for
HTML 4.0. :-)
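NCRs can be generated and decoded with Python 3's standard library alone, as in this sketch; to_ncr is a hypothetical helper name. The xmlcharrefreplace error handler writes each character that the target encoding (ASCII here) cannot represent as a decimal &#NNNN; reference, and html.unescape turns the references back into characters.

```python
import html

def to_ncr(text):
    """Replace each non-ASCII character with a decimal NCR (&#NNNN;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

original = "文字 character символ"
encoded = to_ncr(original)
print(encoded)  # &#25991;&#23383; character &#1089;&#1080;&#1084;&#1074;&#1086;&#1083;
print(html.unescape(encoded) == original)  # True
```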

--
-=AV=-

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/