Re: gcc identifiers
On Tue, 3 Dec 2002, seer26 wrote:
> It seems so simple to simply "switch off" whatever error code gcc
> generates when it gets bytes above 7F, and just allow them through.
That's simple, but how would you deal with the fact that
Unicode has multiple representations of what people would usually
regard as the same character? To enable UTF-8 identifiers, that has
to be taken care of by gcc, and by the linker as well if gcc doesn't
normalize identifiers at compile time.
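
To make the concern concrete, here is a minimal Python sketch (Python is used only because the check is short to write; the identifier "café" is an illustrative assumption, not something from this thread). It shows two spellings that render identically but encode to different UTF-8 byte sequences, which is presumably what a tool comparing symbol names byte for byte would see as two distinct names.

```python
# Two visually identical spellings of a hypothetical identifier "café".
composed = "caf\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# A byte-for-byte comparison, as a linker would do on symbol names,
# treats the two spellings as different.
same_bytes = composed_bytes == decomposed_bytes

print(composed_bytes)
print(decomposed_bytes)
print(same_bytes)
```

If identifiers were passed through unchanged, two source files using different spellings of the same name could end up referring to different symbols.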
> This would work fine with the filesystem, assuming it is utf-8 as
> well, #include's DO work fine with utf-8 filenames. (I just tried
> this with gcc under RH8)
The same is true here. Although the Unix file system places few
restrictions on file/dir names, it needs a provision specifying
how to deal with multiple representations of equivalent characters. Is
there anything mentioned about this in the SUS (Single UNIX Specification)?
> Text strings and comments already work fine with utf-8. Just
> identifiers don't. I think even a "use at your own risk" command
> line switch, such as "--allow-high-ascii" would be a huge step
> forward.
Why would you use such a 'legacy-sounding' option name? I'd use
'--allow-utf8-names'. Anyway, that's not important. What needs to be
resolved are the issues around Unicode normalization forms. Perhaps C/C++
should use NFC (Normalization Form C, i.e. composed). The ISO WG on the
C standard may already have made that decision. Does anybody know of any
decision made in this regard? (see http://www.unicode.org/unicode/reports/tr15)
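
As a rough illustration of what normalizing to NFC would buy, here is a small Python sketch using the standard `unicodedata` module. The helper name `normalize_identifier` and the sample identifier are hypothetical, and nothing here reflects what gcc or the C standard actually does; it only shows that two canonically equivalent spellings compare equal once both are mapped to NFC.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Hypothetical helper: map an identifier to NFC before comparing it."""
    return unicodedata.normalize("NFC", name)


composed = "caf\u00e9"      # precomposed e-acute (already in NFC)
decomposed = "cafe\u0301"   # 'e' + combining acute accent (NFD spelling)

# The raw strings differ, but their NFC forms are identical.
raw_equal = composed == decomposed
nfc_equal = normalize_identifier(composed) == normalize_identifier(decomposed)

print(raw_equal, nfc_equal)
```

A compiler applying such a step before emitting symbol names would hand the linker a single spelling, so the linker itself would not need to know about Unicode.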
Jungshik
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/