C-Kermit + Unicode
C-Kermit (if you have never heard of it) is a cross-platform,
transport-independent, scriptable communications program written
in C from the Kermit Project at Columbia University:
http://www.columbia.edu/kermit/
C-Kermit 7.0 Beta.10 for UNIX (including Linux), plus VMS, Plan 9,
and AOS/VS was released a few days ago:
http://www.columbia.edu/kermit/ck70.html
The main addition since Beta.09 (and hopefully the last major
addition before the final 7.0 release) is Unicode support.
Kermit protocol and software have included character-set
translation capabilities since the 1980s, allowing conversion of
text among the many "traditional" character sets like the ISO 8859
Latin Alphabets, PC code pages, IBM mainframe EBCDIC code pages,
ISO 646 national character sets, KOI sets, JIS sets, and assorted
proprietary sets (DEC, DG, Apple, NeXT, etc.). C-Kermit 7.0 adds
Unicode to the list:
. UCS-2 and UTF-8 are now supported as transfer character sets.
(Transfer character sets are the small number of international
standard character sets allowed "on the wire" in Kermit file
transfer; each Kermit file-transfer partner converts between
its local encoding and the transfer encoding. UCS-2 and UTF-8
are two different representations of Unicode / ISO 10646.)
(You might ask why UCS-2 is allowed as a transfer character set --
why not stick with UTF-8? It's because CJK text is more compact
in UCS-2: two bytes per character versus three in UTF-8.)
. UCS-2 and UTF-8 are now supported as file character sets.
Incoming text can be stored in either UTF-8 or UCS-2, and
UCS-2 or UTF-8 text can be sent with conversion to any
appropriate transfer character set (including conversion of
UCS-2 to UTF-8 or vice-versa). UCS-2 BOMs are handled as
they should be, so "wrong-ended" UCS-2 files are still
interpreted and sent correctly. Incoming files, when stored
as UCS-2, are given the appropriate BOM (unless you specify
otherwise).
. C-Kermit's TRANSLATE command can be used to convert
traditional files to UCS-2 or UTF-8 (and, to the degree
possible, vice versa) on the local computer, as well as
between UCS-2 and UTF-8.
. C-Kermit can conduct UTF-8 terminal sessions, even when its
local character set is not Unicode. (It is also programmed
to do the reverse -- i.e. make connections from a UTF-8
console or window to a non-Unicode host -- but I haven't been
able to test this. In theory, you should be able to use
C-Kermit in a UTF-8 xterm window to make a connection to
(say) a Latin-1 host, and have C-Kermit take care of all
the conversion back & forth.)
. C-Kermit's TRANSMIT command can perform "ASCII" (nonprotocol)
uploads of text files, converting them to UTF-8 on the fly.
It can also upload UTF-8 or UCS-2 text, converting it to some
other character set.
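
As a rough illustration of the BOM handling and the UCS-2 / UTF-8
conversion described in the list above, here is a minimal Python sketch.
It is not C-Kermit code. The function names decode_ucs2 and encode_ucs2
are hypothetical, and it assumes Python's utf-16-be / utf-16-le codecs as
stand-ins for UCS-2 (they coincide for characters in plane 0).

```python
import codecs

def decode_ucs2(data: bytes, default_order: str = "big") -> str:
    """Decode UCS-2 bytes, honoring a leading BOM if there is one."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to the byte order the caller assumes.
    codec = "utf-16-be" if default_order == "big" else "utf-16-le"
    return data.decode(codec)

def encode_ucs2(text: str, bom: bool = True) -> bytes:
    """Encode as big-endian UCS-2, with a BOM unless told otherwise."""
    body = text.encode("utf-16-be")
    return (codecs.BOM_UTF16_BE + body) if bom else body

cjk = "日本語"
big = encode_ucs2(cjk)
little = codecs.BOM_UTF16_LE + cjk.encode("utf-16-le")  # "wrong-ended" copy

# Both byte orders decode to the same text once the BOM is honored.
same = decode_ucs2(big) == decode_ucs2(little) == cjk

# Size of the same CJK text in each encoding (BOM not counted).
utf8_size = len(cjk.encode("utf-8"))
ucs2_size = len(encode_ucs2(cjk, bom=False))
print(same, utf8_size, ucs2_size)
```

The last line also shows the compactness argument for CJK: the three
ideographs take 9 bytes in UTF-8 and 6 in UCS-2.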
(Obviously whenever translating from Unicode to a smaller set,
Unicode characters that are not in the smaller set are lost, just
like when converting from, say, Latin-1 to German ISO 646.)
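
The loss is easy to see in a small Python sketch. This is only an
illustration: the helper name to_smaller_set is hypothetical, and the "?"
substitution is Python's replacement policy, not necessarily what
C-Kermit puts in place of an untranslatable character.

```python
def to_smaller_set(text: str, codec: str = "latin-1") -> str:
    """Round-trip text through a smaller character set to see what survives."""
    return text.encode(codec, errors="replace").decode(codec)

original = "Grüße 日本"
survived = to_smaller_set(original)

# Characters of the original that have no counterpart in the result.
lost = [ch for ch in original if ch not in survived]
print(survived, lost)
```

Here the German letters survive the trip through Latin-1, while the two
CJK characters have no Latin-1 equivalent and are replaced.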
C-Kermit 7.0 handles Unicode at ISO 10646 "Level 1" (roughly
equivalent to Unicode Normalization Form C), meaning there is no
particular support for combining characters (nor, for that matter,
for nonzero planes). My initial thought was that the cost of a
database lookup and potential recursive canonical (de)composition
per character is a rather high price to pay in a telecommunications
application for a feature (character composition) that is not used
in Plan 9 and probably not in Linux either -- but I could be wrong!
(For example, some Windows NT applications might
perform canonical decompositions when storing Unicode textual data
-- I don't know -- which would cause problems when transferring
these files to platforms that support Unicode but not composed
characters, unless the transfer agent also converted to Normalization
Form C.)
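
To make the concern concrete, here is a Python sketch of what canonical
composition would buy a transfer agent. It does not reflect anything
C-Kermit 7.0 does (as noted above, it has no such support). The helper
to_latin1 is hypothetical, and Python's unicodedata module stands in for
the database lookup mentioned earlier.

```python
import unicodedata

def to_latin1(text: str, normalize: bool) -> bytes:
    """Convert to Latin-1, optionally composing to Normalization Form C first."""
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return text.encode("latin-1", errors="replace")

# "resume" with acute accents, stored in decomposed form:
# each accented letter is a plain "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "re\u0301sume\u0301"

without_nfc = to_latin1(decomposed, normalize=False)
with_nfc = to_latin1(decomposed, normalize=True)
print(without_nfc, with_nfc)
```

Without composition, the combining accents have no Latin-1 equivalent and
are lost, leaving a bare "e" followed by a substitute character. With
composition, each pair becomes the single precomposed letter U+00E9, which
Latin-1 can represent.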
The Web page lists all the other new features since the previous
release, 6.0, in September 1996. Beta.10 has already been built
successfully on more than 130 different platforms (prebuilt
binaries are available and are listed at the end of the Web page;
if you can build others, please let me know). Until a new edition
of the C-Kermit manual is published, the new features of version
7.0 are documented in the (plain text) ckermit2.txt file; Section
6.6 describes the new Unicode features:
ftp://kermit.columbia.edu/kermit/test/text/ckermit2.txt
Comments and questions, especially on the new Unicode features,
are welcome.
Frank da Cruz
The Kermit Project
Columbia University
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/