Re: multibyte encodings other than UTF-8
From: Edmund GRIMLEY EVANS <edmundo@rano.demon.co.uk>
>In fact a list of the multibyte encodings would be nice. I just know
>that there are several for each of Chinese, Japanese, Korean, and the
>only one I ever looked at in detail had number of octets = width on
>display, so you can handle it just like any 8-bit character set.
You can look at some live multibyte HTML pages. :-)
The first two characters are the word 'character' in Japanese. Then comes
the word 'character' in English, and then the word 'character'
('si:mvol', symbol) in Russian.
(The JIS X 0208 character set also defines the Russian and Greek
letters.)
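
To illustrate the point about JIS X 0208, here is a minimal sketch, assuming Python 3 and its standard "shift_jis" and "euc_jp" codecs. The helper name octets_per_char and the Greek sample are hypothetical, and the Cyrillic spelling of 'si:mvol' is an assumption. It only shows that each Russian or Greek letter comes out as a two-octet code in these encodings.

```python
# Cyrillic and Greek letters are part of JIS X 0208, so the Japanese
# encodings built on it can represent them, two octets per letter.
russian = "символ"  # assumed spelling of 'si:mvol' from the sample pages
greek = "αβγ"       # hypothetical sample, not taken from the pages


def octets_per_char(text, codec):
    """Return the encoded length in octets of each character in text."""
    return [len(ch.encode(codec)) for ch in text]


for codec in ("shift_jis", "euc_jp"):
    print(codec, octets_per_char(russian, codec), octets_per_char(greek, codec))
```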
Shift-JIS (CP-932):
http://www.sensi.org/~alec/lang/japan/code/ch-sjis.html
JIS (ISO-2022-JP):
http://www.sensi.org/~alec/lang/japan/code/ch-jis.html
EUC-JP:
http://www.sensi.org/~alec/lang/japan/code/ch-euc.html
UTF-8:
http://www.sensi.org/~alec/lang/japan/code/ch-utf8.html
NCR:
http://www.sensi.org/~alec/lang/japan/code/ch-ncr.html
NCR (Numeric Character Reference -- the decimal representation
of Unicode code points) is also a kind of multibyte encoding for
HTML 4.0 :-) .
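
The five representations listed above can be compared side by side with a short sketch, assuming Python 3 and its standard codecs. The sample string is an assumption: "文字" is only a guess at the two Japanese characters on the pages, and the names CODECS, encode_all and decode_ncr are hypothetical. The NCR form is produced with the "xmlcharrefreplace" error handler, which writes every non-ASCII character as a decimal reference.

```python
import html

# Assumed sample: "文字" is a guess at the Japanese word on the pages,
# followed by the English and Russian words described in the message.
sample = "文字 character символ"

# Display label -> Python codec name.
CODECS = {
    "Shift-JIS": "shift_jis",
    "ISO-2022-JP": "iso2022_jp",
    "EUC-JP": "euc_jp",
    "UTF-8": "utf-8",
}


def encode_all(text):
    """Encode text in each multibyte encoding, plus the NCR form."""
    forms = {label: text.encode(codec) for label, codec in CODECS.items()}
    # Non-ASCII characters become decimal references such as &#25991;
    forms["NCR"] = text.encode("ascii", errors="xmlcharrefreplace")
    return forms


def decode_ncr(data):
    """Turn an ASCII byte string with numeric references back into text."""
    return html.unescape(data.decode("ascii"))


for label, data in encode_all(sample).items():
    print(f"{label:12} {len(data):3} octets  {data!r}")
```

The octet counts differ for the same text: ISO-2022-JP spends extra octets on escape sequences when switching character sets, and the NCR form is the longest because every non-ASCII character is spelled out in ASCII digits.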
--
-=AV=-
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/