
Re: To Do List for 2001



Tue, 16 Jan 2001 15:12:11 +0000 (GMT), Robert Brady <robert@xxxxxxxxxx> writes:

> Looking further at this, it seems that it assumes wcwidth()==1.
> 
> Does the original author of the patch have any plans or ideas on how this
> could be fixed?
> 
> From a look at the code, I'd say that this patch was done in a very
> low-pain way, with changes in only a few places, whereas a more complete
> patch would entail lots more damage...

I am the author. Indeed this was a quick and dirty patch, in which all
processing is done on UTF-8 byte sequences instead of on wide
characters.

It avoided adding a layer of conversion at every interaction with the
outside world; it avoided changing the representation of text and
having to decide where to store wide characters and where to store
bytes; it allowed an optional UTF-8 mode to coexist with the old 8-bit
behavior, so nothing broke for those who don't use it; and it avoided
the loss of information caused by converting back and forth (in UTF-8
mode you can tab-complete a filename that is not in UTF-8: it will not
display correctly, but it will execute).

OTOH it makes the code much less maintainable, harder to adapt to
character properties like wcwidth, and harder to extend with more
sophisticated input methods; it needs care in every place which
decomposes strings into characters; and it does not work for any
non-trivial multibyte encoding other than UTF-8.
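Two of these problems can be made concrete. When text is kept as UTF-8 bytes, even stepping back one character means skipping continuation bytes (bit pattern 10xxxxxx). Separately, a character's display width is not its byte or character count: wcwidth() reports 0 columns for combining marks and 2 for most CJK ideographs, so code that assumes wcwidth()==1 misplaces the cursor. The C sketch below is hypothetical and not taken from the patch or from bash. The names utf8_prev, utf8_charcount, toy_wcwidth and toy_wcswidth are illustrative. toy_wcwidth() is a deliberately simplified stand-in that covers only two ranges so the example does not depend on the current locale; real code would call wcwidth() after setlocale().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the byte offset where the character that
 * ends just before byte offset `pos` begins. UTF-8 continuation bytes
 * have the bit pattern 10xxxxxx and must be skipped. */
size_t utf8_prev(const unsigned char *s, size_t pos)
{
    if (pos == 0)
        return 0;
    pos--;
    while (pos > 0 && (s[pos] & 0xC0) == 0x80)
        pos--;
    return pos;
}

/* Count characters (not bytes): every byte that is not a continuation
 * byte starts a new character. */
size_t utf8_charcount(const unsigned char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical, heavily simplified stand-in for wcwidth(): only two
 * ranges are special-cased, everything else counts as one column.
 * (The real wcwidth() also returns -1 for non-printable characters.) */
int toy_wcwidth(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F)   /* combining diacritical marks */
        return 0;
    if (cp >= 0x4E00 && cp <= 0x9FFF)   /* CJK unified ideographs */
        return 2;
    return 1;
}

/* Total number of terminal columns occupied by a sequence of code points. */
int toy_wcswidth(const uint32_t *cps, size_t n)
{
    int cols = 0;
    for (size_t i = 0; i < n; i++)
        cols += toy_wcwidth(cps[i]);
    return cols;
}
```

For example, "e" followed by U+0301 COMBINING ACUTE ACCENT and the ideographs U+4E2D U+6587 takes 9 bytes in UTF-8, is 4 characters long, and occupies 5 columns. A byte-oriented implementation has to get all three numbers right in every place that moves the cursor or redraws the line.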

IMHO the right way is to store text mostly in wchar_t and do
appropriate conversions for I/O. But history, for example, should
probably be kept in bytes, to avoid corruption when the user starts the
shell with a locale which does not match old history entries.
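The history problem can be sketched as follows. An entry saved under an ISO-8859-2 locale may contain a byte such as 0xBF ("ż" in that encoding), which is not valid UTF-8, so converting it to wide characters under a UTF-8 locale either fails or loses information. The strict decoder below is a hypothetical illustration (utf8_decode is not an existing API; real code would use mbrtowc() in the user's locale). It reports failure instead of guessing, which lets the caller fall back to the stored bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strict UTF-8 decoder. Returns the number of code points
 * written to `out`, or -1 if `s` is not valid UTF-8 or `out` is too small.
 * Rejects stray continuation bytes, truncated sequences, overlong forms,
 * UTF-16 surrogates and values above U+10FFFF. */
long utf8_decode(const unsigned char *s, size_t len, uint32_t *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t extra;

        if (c < 0x80)                    { cp = c;        extra = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; extra = 3; }
        else
            return -1;                   /* invalid lead byte */

        if (len - i - 1 < extra)
            return -1;                   /* sequence cut off at the end */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char b = s[i + k];
            if ((b & 0xC0) != 0x80)
                return -1;               /* expected a continuation byte */
            cp = (cp << 6) | (b & 0x3F);
        }

        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000))
            return -1;                   /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;                   /* surrogate or out of range */

        if (n >= cap)
            return -1;                   /* output buffer too small */
        out[n++] = cp;
        i += extra + 1;
    }
    return (long)n;
}
```

A shell following this advice could always write the original bytes to the history file and use the decoded form only when utf8_decode() succeeds, so an entry recorded under a different locale is never silently rewritten.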

When I wrote the patch (August 1998), bash used its own readline
functions. AFAIK now it uses the standalone readline library. So the
right way is to make readline UTF-8-aware in general. Unfortunately
the readline interface is extremely ugly.

I don't have time now to work on this. Please consider the patch
orphaned.

-- 
 __("<  Marcin Kowalczyk * qrczak@xxxxxxxxxx http://qrczak.ids.net.pl/
 \__/
  ^^                      SUBSTITUTE SIGNATURE
QRCZAK

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/