Kaixo!

On Tue, May 08, 2001 at 04:18:19PM +0200, Bruno Haible wrote:

> Many GNU text processing utilities don't work correctly in multibyte
> locales.

BTW, the following is related neither to UTF-8 nor to encodings, but it is
related to i18n. It may not even be considered a bug; however, its result
will be considered undesirable by a large portion of users (it is for me,
at least).

In the old days, when only ASCII existed, there were (and still are) two
string comparison functions: strcmp and strcasecmp. One is case sensitive
and the other is case insensitive. That made people accustomed to using
[A-Z] and [a-z] in regular expressions as two very different things.

However, in an i18n environment there is now *only one* such function:
strcoll. In most non-C locales it is effectively case insensitive, and
there is no case-sensitive equivalent (in the standard, at least).

The result is that when you set your locale to anything other than 'C',
both [A-Z] and [a-z] become practically the same thing as [A-Za-z].

That is a very annoying situation.

The solution is quite simple: implement a case-sensitive version of strcoll
(see the attached file, a small patch I did for bash).

If some people think the odd behaviour has to be provided too, then at
least let the user choose (through a command line option or an environment
variable) between case-sensitive and case-insensitive behaviour.

But since Unix is case sensitive in its file system, and the old behaviour
(C locale only) was case sensitive, it would be logical to keep case
sensitivity. Breaking it is a very bad thing, and a lot of people consider
it a bug.

Programs currently hurt by this problem are 'bash' and 'grep', and
probably others.

Thanks

-- 
Ki ça vos våye bén,
Pablo Saratxaga
http://www.srtxg.easynet.be/
PGP Key available, key ID: 0x8F0E4975
Attachment:
my-strcoll.diff.bz2
Description: BZip2 compressed data
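
To illustrate the idea of a case-sensitive collation described in the message, here is a minimal sketch in C. It is not the attached bash patch, whose contents are not shown here. The names cs_charcoll and in_range_cs are hypothetical, and the sketch assumes single-byte characters only: when one letter is uppercase and the other lowercase, case decides the order (uppercase first); otherwise the locale's strcoll order is used.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not taken from the attached patch.
   Compares two single-byte characters case-sensitively:
   an uppercase letter always sorts before a lowercase letter;
   in every other case the locale's strcoll() order decides. */
int cs_charcoll(unsigned char a, unsigned char b)
{
    char sa[2] = { (char)a, '\0' };
    char sb[2] = { (char)b, '\0' };

    if (isupper(a) && islower(b))
        return -1;
    if (islower(a) && isupper(b))
        return 1;
    return strcoll(sa, sb);
}

/* Hypothetical helper: does c fall inside the bracket range [lo-hi]
   when ranges are evaluated with the case-sensitive comparison above? */
int in_range_cs(unsigned char c, unsigned char lo, unsigned char hi)
{
    return cs_charcoll(lo, c) <= 0 && cs_charcoll(c, hi) <= 0;
}
```

With this comparison, in_range_cs('B', 'a', 'z') is false and in_range_cs('b', 'a', 'z') is true, so [a-z] and [A-Z] stay distinct while letters of the same case still follow the locale's order. A real implementation would also need to handle multibyte characters, which this sketch ignores.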