
Unicode control characters



All this talk of zero-width spaces, paragraph separators and the like
has got me worried. I suppose I had always assumed that all Unicode
characters, apart from 0..0x1f and 0x7f..0x9f, would be printable,
because things like paragraphs would be indicated with XML mark-up or
some other means outside the character set. How wrong I was.

So, what are the main things I have to worry about for my MUA (mutt)
and curses-compatible library (slang)? Which Unicode control
characters ought I to interpret, and how can I recognise the others
(so that I can ignore them or convert them to hex or whatever)?
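
One possible way to recognise the others is to go by Unicode general
category rather than by code point range. The following is only a rough
sketch, in Python rather than C for brevity, and it assumes that the
categories Cc, Cf, Zl, Zp, Cs, Co and Cn are the ones worth treating as
non-printing; that choice is my assumption, not an established rule. The
names NON_PRINTING, is_non_printing and escape are made up for
illustration.

```python
import unicodedata

# Assumption: these general categories are treated as "not printable as-is".
#   Cc = control (C0 and C1), Cf = format (zero-width space, bidi marks, ...)
#   Zl = line separator, Zp = paragraph separator
#   Cs = surrogate, Co = private use, Cn = unassigned
NON_PRINTING = {"Cc", "Cf", "Zl", "Zp", "Cs", "Co", "Cn"}

# Controls that a terminal application would normally interpret itself.
PASS_THROUGH = {"\n", "\t"}


def is_non_printing(ch):
    """Return True if ch falls in one of the assumed non-printing categories."""
    return unicodedata.category(ch) in NON_PRINTING


def escape(text):
    """Replace non-printing characters with a visible <U+XXXX> marker."""
    out = []
    for ch in text:
        if ch in PASS_THROUGH or not is_non_printing(ch):
            out.append(ch)
        else:
            out.append("<U+%04X>" % ord(ch))
    return "".join(out)


if __name__ == "__main__":
    sample = "zero\u200bwidth, para\u2029sep, C1 \x80 here\n"
    print(escape(sample), end="")
```

Whether a given character should be interpreted, ignored or shown in hex
is still a policy decision for the application; the sketch only shows how
the characters might be told apart.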

By the way, I noticed that the vertical line down the right-hand side
of Markus's UTF-8-test.txt seems to assume that U-00000080 has a wcwidth
of 1. My xterm displays it as an empty box, one cell wide, but my MUA
converts it to hex, and Emacs in UTF-8 mode won't allow it at all.

Edmund
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/