Re: gcc identifiers
Followup to: <Pine.LNX.4.44js.0212032338180.13309-100000@xxxxxxxx>
By author: Jungshik Shin <jshin@xxxxxxxxxxx>
In newsgroup: linux.utf8
>
> That's simple, but how would you deal with the fact that
> Unicode has multiple representations of what people would usually
> regard as equivalent? To enable UTF-8 identifiers, that has
> to be taken care of by gcc and the linker (if gcc doesn't do a compile-time
> normalization).
>
I don't really think normalization is a major issue here. Maybe it
should be, but I suspect it isn't a problem in practice, and that
attempting normalization would cause more problems than it's worth.
A --normalize-utf option to the linker might be a good idea, but
it should be an option, IMO.
> > This would work fine with the filesystem, assuming it is utf-8 as
> > well; #includes DO work fine with utf-8 filenames. (I just tried
> > this with gcc under RH8)
>
> The same is true here. Although the Unix file system has few
> restrictions on file/dir names, it needs a provision specifying
> how to deal with multiple representations of equivalent characters. Is
> there anything mentioned about this in SUS?
Yes. Filenames are byte sequences, period, full stop. Any attempt at
normalization would violate SUS/POSIX.
-hpa
--
<hpa@xxxxxxxxxxxxx> at work, <hpa@xxxxxxxxx> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <amsp@xxxxxxxxx>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/