On Wed, 2002-03-27 at 20:42, Peter Zelezny wrote:
> On Wed, 27 Mar 2002 23:28:21 -0500
> "butterbrain" <bbrain@phateds.nu> wrote:
>
> > IRC does not define a character set. the command names all happen to be
> > strings of printable ascii, which are of course also UTF-8 strings, since
> > UTF-8 has printable ascii as a subset. there are restrictions on labels such
> > as nicknames and channel names, but all of the other text such as PRIVMSG's
> > (excepting CTCP), as well as MOTD's and TOPIC's have no restriction other
> > than NUL and CRLF. better to just pass on the native UTF-8 strings than to
> > strip them of non-ascii characters. this way you still are compatible with
> > ascii users, as well as anyone else using UTF-8. of course, not many people
> > use UTF-8 natively, so providing conversion tables for character sets other
> > than ascii may be useful still.
>
> Well UTF8 is not being used on IRC. I wasn't intending on stripping the
> non-ascii characters, but converting them to latin-1. This is the only
> way to make it work correctly with current clients. An option to send/recv
> in utf8 might be an idea, but I'd leave that off by default.

ISO 8859-1 is the most common encoding on IRC, but it is not the only one. I suggest a facility for associating encodings (UTF-8, ISO 8859-1, etc.) with a server, a server/channel pair, or a server/nick!user@host combination. (Regular expressions are a must for servers, channels, and nick!user@host masks, so please use them!)

For instance, if irc.example.com requests that all clients use UTF-8, one would set the server pattern 'irc\.example\.com' to always use UTF-8. If '.*\.undernet\.org' generally uses ISO 8859-1 but #example-UTF-using-channel uses UTF-8, one would specify that '#example-UTF-using-channel' on server '.*\.undernet\.org' should use UTF-8.
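To make the matching idea concrete, here is a minimal Python sketch of how such associations might be resolved. It is not part of any existing client. The names `EncodingRule` and `EncodingRules`, the ISO 8859-1 default, and the precedence rule are all hypothetical. The precedence assumption is that a channel or nick!user@host rule overrides a plain server rule, and that among equally specific rules the most recently added one wins. Patterns are matched case-insensitively with `re.fullmatch`. This only approximates IRC's own case-folding rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EncodingRule:
    server: str               # regex matched against the server hostname
    target: Optional[str]     # regex for a channel or nick!user@host; None = whole server
    encoding: str             # Python codec name, e.g. "utf-8" or "iso-8859-1"
    temporary: bool = False   # hypothetical flag: rule applies but is never saved to disk


class EncodingRules:
    """Hypothetical registry; scripts could add/remove rules at runtime."""

    def __init__(self, default: str = "iso-8859-1"):
        self.default = default
        self.rules: List[EncodingRule] = []

    def add(self, rule: EncodingRule) -> None:
        self.rules.append(rule)

    def remove(self, server: str, target: Optional[str] = None) -> None:
        # Drop every rule registered with exactly this server/target pattern pair.
        self.rules = [r for r in self.rules
                      if not (r.server == server and r.target == target)]

    def lookup(self, server: str, target: Optional[str] = None) -> str:
        best = None  # (specificity, rule)
        for rule in self.rules:
            if not re.fullmatch(rule.server, server, re.IGNORECASE):
                continue
            if rule.target is None:
                score = 0  # server-wide rule
            elif target is not None and re.fullmatch(rule.target, target, re.IGNORECASE):
                score = 1  # channel or nick!user@host rule is more specific
            else:
                continue
            # ">=" lets a later rule of equal specificity override an earlier one.
            if best is None or score >= best[0]:
                best = (score, rule)
        return best[1].encoding if best else self.default

    def persistent(self) -> List[EncodingRule]:
        # Only these would be written to the config file.
        return [r for r in self.rules if not r.temporary]


# The examples from the message, expressed as rules.
rules = EncodingRules()
rules.add(EncodingRule(r"irc\.example\.com", None, "utf-8"))
rules.add(EncodingRule(r".*\.undernet\.org", None, "iso-8859-1"))
rules.add(EncodingRule(r".*\.undernet\.org", r"#example-UTF-using-channel", "utf-8"))
rules.add(EncodingRule(r".*\.undernet\.org", r"UTFuser!utf@example\.com", "utf-8"))
```

A client would then decode an incoming line with something like `raw.decode(rules.lookup(server, target), errors="replace")`. A temporary rule added by a script (`temporary=True`) affects lookups immediately, but `persistent()` leaves it out when the configuration is saved.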
Also, if the user 'UTFuser!utf@example.com' uses UTF-8 in private messages and notices, one would specify that 'UTFuser!utf@example\.com' on server '.*\.undernet\.org' should use UTF-8.

It should be possible for scripts and plugins to add or remove these associations programmatically. They should also be able to add temporary associations that take effect but are not saved to disk. That way, some sort of protocol could be implemented for auto-detecting the proper encoding for a given context: for instance, something in a channel's topic, or a message sent by the server upon connecting.

I hope this will prove feasible, because in my opinion it would solve the encoding problem in the best possible way.

Regards,
Alex.

-- 
PGP Public Key: http://aoi.dyndns.org/~alex/pgp-public-key
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS d- s:++ a18 C++(++++)>$ UL+++(++++) P--- L+++>++++ E---- W+(+++) N- o-- K+ w--- !O M(+) V-- PS+++ PE-- Y+ PGP+(+++) t* 5-- X-- R tv b- DI D+++ G e h! !r y
------END GEEK CODE BLOCK------