
Re: xchat-20020417 and utf8



On Fri, 19 Apr 2002 11:44:15 +0800
xlonestar <lone@sina.com> wrote:

>  >Sure is, because GTK+ expects nothing but UTF-8 in its widgets.
> 
> That means everything (nickname, channel name, etc.) is stored in UTF-8
> and should be converted back before sending out. The sess->server->p*
> functions want locale-dependent strings, but all other functions now expect
> UTF-8 (gunichar*), so conversion is needed between them. Am I right?

*shrug* :)

We could convert only parts of the message, for example (receiving):

TOPIC #CHANNEL :new topic

you would convert only "new topic" to UTF-8, so it can be displayed in the
GtkEntry. But I thought it would be easier to just convert the whole line.
Since ASCII is a subset of UTF-8, the IRC protocol parser will still
understand "TOPIC" in UTF-8 form. The problem with converting just the
"text" part is that you'd have to insert converters in 100-odd places.
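
To illustrate the "convert the whole lot" idea, here is a rough sketch in
Python (not xchat's actual C code; the helper name irc_line_to_utf8 and the
sample topic are made up for illustration). It assumes the incoming line is
ISO-8859-1:

```python
def irc_line_to_utf8(raw: bytes, assumed_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode one whole raw IRC line as UTF-8 (hypothetical helper)."""
    return raw.decode(assumed_encoding).encode("utf-8")


# 0xE9 is e-acute in ISO-8859-1; everything else on the line is plain ASCII.
raw = b"TOPIC #CHANNEL :caf\xe9 topic"
line = irc_line_to_utf8(raw)

# The ASCII parts are byte-for-byte unchanged, so a byte-oriented parser
# still recognises the command and channel after the conversion.
command, channel, text = line.split(b" ", 2)
```

Only the non-ASCII byte changes (0xE9 becomes the two bytes 0xC3 0xA9), which
is why the parser does not need to know a conversion happened.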


>  >Do you mean g_locale_to_utf8? I tried it, but umlauts and other letters
>  >just don't come out right. For example, my locale is reported as
>  >"ANSI_X3.4-1968" by glib, which simply doesn't work on irc. I don't think
>  >ANY sequence is invalid in ISO-8859-1 because it uses the full 0-255
>  >range, and every letter is one byte.
> 
> If your current locale is supported by glibc/iconv, and the input sequences
> are valid in that locale, but they cannot be converted, that's really
> strange... g_convert* is just a wrapper around iconv, I think.

I don't think the input sequence is valid in ANSI_X3.4-1968 (anything above
127 doesn't seem to be). I wouldn't be the only one affected. Can you imagine
the hundreds of people complaining that they can't see letters like äéô (which
are usually encoded in ISO-8859-1 on IRC)?
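
A small Python sketch of the difference (Python's "ascii" codec stands in
here for the ANSI_X3.4-1968 locale charset, which is an assumption about how
the two correspond in practice; the variable names are illustrative):

```python
# Three bytes above 127, as an ISO-8859-1 client would send them.
raw = b"\xe4\xe9\xf4"

# ANSI_X3.4-1968 is plain 7-bit ASCII, so any byte above 127 is rejected.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# ISO-8859-1 maps every byte 0-255 to exactly one character,
# so decoding can never fail and always round-trips.
latin1_text = raw.decode("iso-8859-1")
all_bytes = bytes(range(256))
roundtrip_ok = all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes
```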


>  >My plan was to make an option like:
>  >
>  >Assume IRC Text is:
>  > 1) ISO-8859-1 (default)
>  > 2) Current locale
>  > 3) UTF-8 (would be good if this became a de facto standard)
>  >
> 
> Maybe a "Character Coding" submenu just like Mozilla's one? ;-)

I was hoping to avoid that, but it's heading that way :(
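
For what it's worth, the three-way option could look roughly like this. It is
a Python sketch only, not xchat code; the constant names, the function name
decode_irc_text and the choice to replace undecodable bytes instead of
failing are all assumptions for illustration:

```python
import locale

# The three choices from the proposed "Assume IRC Text is:" option.
ISO_8859_1 = "iso-8859-1"   # 1) the proposed default
CURRENT_LOCALE = "locale"   # 2) whatever the user's locale says
UTF_8 = "utf-8"             # 3) UTF-8


def decode_irc_text(raw: bytes, assume: str = ISO_8859_1) -> str:
    """Decode incoming IRC bytes according to the chosen assumption."""
    if assume == CURRENT_LOCALE:
        codec = locale.getpreferredencoding(False)
    else:
        codec = assume
    # Assumption: show a replacement character rather than dropping the line.
    return raw.decode(codec, errors="replace")
```

With the default, decode_irc_text(b"caf\xe9") gives "café"; the same bytes
read as UTF-8 end in a replacement character, which is the kind of breakage
the option is meant to let users fix.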


> Why can't xchat be locale/encoding-independent? ISO-8859-* are just a few
> of the hundreds of locales supported by glibc.

It could, if everyone agrees to send utf-8 on irc. It's a bit hard to convince
all irc client authors to do this overnight.


> If there really are IRC servers using UTF-8 as their official encoding, we
> can export LC_ALL=en_US.utf8, or any_LOCALE.utf8, and the system will do the
> rest. Even xchat 1.8.x can deal with this.

Well, the encoding is really server-independent. The users can send anything
they like (or anything their client sends). A server can only encourage people
to use UTF-8. What does mIRC use, for example?

Then there's the problem of nicknames and channel names. The IRC RFC only
allows a limited set of characters there (pretty much only ASCII).
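
As a rough illustration for nicknames, here is a Python check based on the
nickname grammar as I read it in RFC 1459 (a letter, followed by letters,
digits, or one of - [ ] \ ` ^ { }). The names NICK_RE and is_rfc1459_nick are
made up, and real servers may accept more characters than this:

```python
import re

# Assumed from the RFC 1459 grammar: first a letter, then letters, digits,
# or the "special" characters - [ ] \ ` ^ { }. All of them are ASCII.
NICK_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-\[\]\\`^{}]*")


def is_rfc1459_nick(nick: str) -> bool:
    """Return True if the nickname fits the assumed RFC 1459 grammar."""
    return NICK_RE.fullmatch(nick) is not None
```

So a nick containing an umlaut is rejected no matter which encoding the
message text uses.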

-- 
Peter Zelezny. <zed@xchat.org>
--
XChat-discuss: mailing list for XChat users
Archive:       http://mail.nl.linux.org/xchat-discuss/