Re: gcc identifiers
> That's simple, but how would you deal with the fact that
> Unicode has multiple representations of what people would usually
> regard as equivalent? To enable UTF-8 identifiers, that has
> to be taken care of by gcc and linker (if gcc doesn't do a
> compile-time normalization).
I'd say you wouldn't :)
Just accept a null-terminated string of non-"/" bytes for filenames,
and accept any ALPHANUMERIC, "_", or HIGH_ASCII byte (high bit set) for
identifiers.
No normalization, no processing, not even proper UTF-8 validation.
The programmer of course may choose to use proper UTF-8 and
some normalization form as a convention, but I see no need to enforce
it in the compiler.
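
A minimal sketch of the identifier rule described above, under two assumptions of mine: that "HIGH_ASCII" means any byte with the high bit set (0x80-0xFF), and that the usual C rule against a leading digit still applies. The names is_ident_byte and is_identifier are hypothetical and are not gcc functions.

```c
#include <stdbool.h>

/* Assumption: "HIGH_ASCII" = any byte >= 0x80. Explicit ASCII ranges are
 * used instead of isalnum() so the result does not depend on the locale. */
bool is_ident_byte(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') ||
           c == '_' ||
           c >= 0x80;
}

/* Byte-wise check only: no decoding, no normalization, no UTF-8
 * validation. Rejects the empty string and (assumption) a leading digit. */
bool is_identifier(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    if (*p == '\0' || (*p >= '0' && *p <= '9'))
        return false;
    for (; *p != '\0'; ++p) {
        if (!is_ident_byte(*p))
            return false;
    }
    return true;
}
```

Every byte of a multi-byte UTF-8 sequence is >= 0x80, so a name such as "caf\xC3\xA9" passes without the check knowing anything about UTF-8, while "a-b" is still rejected.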
>
> Is there anything mentioned about this in SUS?
I'm sorry, what is SUS?
> > Text strings and comments already work fine with utf-8. Just
> > identifiers don't. I think even a "use at your own risk" command
> > line switch, such as "--allow-high-ascii" would be a huge step
> > forward.
>
> Why would you use such a 'legacy-sounding' option name? I'd use
> '--allow-utf8-names'.
It is legacy-sounding because I would rather have it be the default.
It's more appropriate as well: the compiler wouldn't have to know
anything about UTF-8 in this case; it just knows that there is a set
of bytes which don't cause any problems. This is, I think, a large
part of what UTF-8 was designed for originally.
Normalization, IMO, is more for UI/security issues, like DNS lookups,
etc. Besides, if you were to come across some source code with
tons of overlong-encoded UTF-8, or non-normalized glyphs, that would raise
some eyebrows at least. (No need to have gcc bend over backwards
to normalize the stuff.)
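
To make the trade-off concrete, here is a small sketch of what "no normalization" means in practice, assuming names are compared byte for byte (as strcmp does). The two spellings and the helper same_symbol are hypothetical illustrations, not anything from gcc or the linker.

```c
#include <stdbool.h>
#include <string.h>

/* "café" with a precomposed e-acute, U+00E9 (UTF-8 bytes C3 A9). */
const char NAME_PRECOMPOSED[] = "caf\xC3\xA9";

/* "café" as "e" followed by combining acute U+0301 (UTF-8 bytes CC 81). */
const char NAME_DECOMPOSED[] = "cafe\xCC\x81";

/* Byte-wise comparison: the two spellings look the same on screen but
 * are different byte strings, so they count as different names. */
bool same_symbol(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Under this scheme the two spellings would be distinct identifiers; keeping them consistent is left to the programmer's convention rather than to the compiler.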
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/