Re: libunicode
Seer <seer26@xxxxxxxxxxxx> writes:
> looking at glib 2.0.4 from gnome ftp, the criticisms are still
> valid: it doesnt check for overcoded sequences (the UTF8_GET macro
> (notice that the UNICODE_VALID macro seems essentially useless))
Iterating through Unicode strings is _very_ time critical for
code that uses UTF-8 as its internal representation;
so we check validity at the boundaries - g_utf8_validate().
(Also see g_utf8_get_char_validated() for the few times you
need to do it a character at a time, e.g. when GIOChannel
is reading streaming UTF-8.)
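To illustrate what "checking at the boundary" involves, here is a minimal standalone sketch of a whole-buffer validator that also rejects overlong ("overcoded") sequences. It is not GLib's g_utf8_validate(); the name sketch_utf8_validate() is hypothetical, and it assumes the 4-byte UTF-8 definition of RFC 3629 (no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GLib.
 * Returns 1 if the len bytes at s are well-formed UTF-8, 0 otherwise. */
int sketch_utf8_validate(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len) {
        unsigned char c = s[i];
        size_t n;      /* continuation bytes expected after the lead byte */
        size_t k;
        uint32_t ch;   /* code point being decoded */
        uint32_t min;  /* smallest code point legal for this length */

        if (c < 0x80)                { n = 0; ch = c;        min = 0;       }
        else if ((c & 0xE0) == 0xC0) { n = 1; ch = c & 0x1F; min = 0x80;    }
        else if ((c & 0xF0) == 0xE0) { n = 2; ch = c & 0x0F; min = 0x800;   }
        else if ((c & 0xF8) == 0xF0) { n = 3; ch = c & 0x07; min = 0x10000; }
        else return 0;               /* stray continuation or bad lead byte */

        if (len - i - 1 < n)
            return 0;                /* sequence truncated by end of buffer */

        for (k = 1; k <= n; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;            /* not a continuation byte */
            ch = (ch << 6) | (cc & 0x3F);
        }

        if (ch < min)
            return 0;                /* overlong encoding */
        if (ch >= 0xD800 && ch <= 0xDFFF)
            return 0;                /* UTF-16 surrogate */
        if (ch > 0x10FFFF)
            return 0;                /* beyond the Unicode range */

        i += n + 1;
    }
    return 1;
}
```

Once a buffer has passed a check like this on the way in, inner loops can decode it without re-checking every byte.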
> g_utf8_get_char seems to have lost something in the conversion,
> it no longer gives back an updated pointer, which would be useful
> for iterating over a utf-8 string.
It was decided that
  while (*p) {
    gunichar ch = g_utf8_get_char (p);
    /* do something with ch */
    p = g_utf8_next_char (p);
  }
was the most convenient API; g_utf8_next_char() is a macro
that does a table lookup so there isn't much efficiency lost.
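As a rough illustration of the table-lookup idea, the sketch below advances a pointer by indexing a 256-entry table with the lead byte. The names sketch_skip, SKETCH_NEXT_CHAR and sketch_utf8_strlen are hypothetical, and the table contents are an assumption (lengths 1 to 4, with invalid lead bytes mapped to 1), not a copy of GLib's table. It is only safe on a NUL-terminated string that has already been validated.

```c
#include <assert.h>
#include <stddef.h>

/* Sequence length in bytes, indexed by the first byte of the sequence.
 * Continuation bytes (0x80-0xBF) and invalid lead bytes (0xF8-0xFF)
 * map to 1 so that the pointer always advances. */
static const unsigned char sketch_skip[256] = {
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x00 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x10 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x20 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x30 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x40 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x50 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x60 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x70 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x80 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0x90 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0xA0 */
    1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, /* 0xB0 */
    2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, /* 0xC0 */
    2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, /* 0xD0 */
    3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3, /* 0xE0 */
    4,4,4,4,4,4,4,4,1,1,1,1,1,1,1,1  /* 0xF0 */
};

/* One table lookup and one addition per character. */
#define SKETCH_NEXT_CHAR(p) ((p) + sketch_skip[*(const unsigned char *)(p)])

/* Count the characters (not bytes) in a validated, NUL-terminated string. */
size_t sketch_utf8_strlen(const char *p)
{
    size_t n = 0;

    assert(p != NULL);

    while (*p) {
        p = SKETCH_NEXT_CHAR(p);
        n++;
    }
    return n;
}
```

The loop has the same shape as the one above: test *p, use the character, step to the next one.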
> also, its range of error return codes dont give the api user
> enough information to layout a good error message.
It's not clear to me that telling the user _how_ their utf-8
string is invalid is particularly useful. Remember, it is
a stretch for most users to even understand what UTF-8 is.
(an invalid UTF-8 string is almost certainly a local-encoded
string in practice)
> And, it has no support for string visible width computation, which
> could easily be added.
Well, GLib is used mostly in GUI apps where wcwidth() isn't
useful. But it certainly could be added. (bugzilla.gnome.org
is the place for RFE's.)
> lastly, gunichar is unsigned. I was thinking it could be signed
> for several reasons, such as using negatives for error codes...
The way we do errors in GLib is GError (see the
reference docs) which allows for very nice explicit string
error messages.
g_utf8_get_char() and g_utf8_get_char_validated() return
(gunichar)-1 and (gunichar)-2 as error codes; I guess you
lose the '< 0' simple check, but it just isn't that common
to need to check the returns here.
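A small standalone sketch of how sentinel values in an unsigned type can still be checked cheaply. All names here (SKETCH_ERR_INVALID, SKETCH_ERR_PARTIAL, sketch_is_error, sketch_get_char_validated) are hypothetical, and the meaning assigned to each code (-1 for a malformed sequence, -2 for one cut off by the end of the buffer) is an assumption of this sketch rather than a statement of GLib's exact behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two largest uint32_t values; no valid code point comes near them. */
#define SKETCH_ERR_INVALID ((uint32_t)-1)
#define SKETCH_ERR_PARTIAL ((uint32_t)-2)

/* Replaces the "< 0" test that a signed return type would allow. */
int sketch_is_error(uint32_t ch)
{
    return ch >= SKETCH_ERR_PARTIAL;
}

/* Decode one character from at most max_len bytes at p. */
uint32_t sketch_get_char_validated(const unsigned char *p, size_t max_len)
{
    uint32_t ch, min;
    size_t n, k;

    assert(p != NULL);

    if (max_len == 0)
        return SKETCH_ERR_PARTIAL;

    if (p[0] < 0x80)
        return p[0];
    else if ((p[0] & 0xE0) == 0xC0) { n = 2; ch = p[0] & 0x1F; min = 0x80;    }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; ch = p[0] & 0x0F; min = 0x800;   }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; ch = p[0] & 0x07; min = 0x10000; }
    else
        return SKETCH_ERR_INVALID;   /* bad lead byte */

    for (k = 1; k < n; k++) {
        if (k == max_len)
            return SKETCH_ERR_PARTIAL;   /* ran out of input mid-sequence */
        if ((p[k] & 0xC0) != 0x80)
            return SKETCH_ERR_INVALID;   /* not a continuation byte */
        ch = (ch << 6) | (p[k] & 0x3F);
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (ch < min || ch > 0x10FFFF || (ch >= 0xD800 && ch <= 0xDFFF))
        return SKETCH_ERR_INVALID;

    return ch;
}
```

A streaming reader can treat the "partial" code as "wait for more bytes" and the "invalid" code as a real error.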
> these functions are in such a fundamental part of the gnome
> codebase, that it might be a good idea to iron them out,
> but i dont imagine any changes being accepted before the big
> release.
Before? The big release (of GTK+-2.0 and GLib-2.0) was
3 months ago... :-)
Regards,
Owen
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/