
Re: gcc identifiers



Followup to:  <Pine.LNX.4.44js.0212032338180.13309-100000@xxxxxxxx>
By author:    Jungshik Shin <jshin@xxxxxxxxxxx>
In newsgroup: linux.utf8
> 
>  That's simple, but how would you deal with the fact that
> Unicode has multiple representations of what people would usually
> regard as equivalent?  To enable UTF-8 identifiers, that has
> to be taken care of by gcc and the linker (if gcc doesn't do a
> compile-time normalization).
> 

I don't really think normalization is a major issue here.  Maybe it
should be, but I suspect it isn't a problem in practice.  I suspect
attempting normalization would cause more problems than it's worth.

Maybe a --normalize-utf option to the linker might be a good idea, but
it should be an option, IMO.
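
To make the trade-off concrete, here is a minimal sketch of byte-exact
symbol matching versus opt-in normalization.  It is an illustration
only: --normalize-utf is just the option floated above, not an existing
linker flag, and symbol_key() and resolve() are hypothetical helpers
that do not model how gcc or any real linker works.

```python
import unicodedata

# Two spellings of the identifier "café" that render identically.
NAME_NFC = "caf\u00e9"    # precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE
NAME_NFD = "cafe\u0301"   # decomposed: "e" + U+0301 COMBINING ACUTE ACCENT


def symbol_key(name: str, normalize: bool = False) -> bytes:
    """Return the UTF-8 bytes a hypothetical linker would match on."""
    if normalize:
        # Opt-in step: fold canonically equivalent spellings to NFC.
        name = unicodedata.normalize("NFC", name)
    return name.encode("utf-8")


def resolve(defined, wanted, normalize=False) -> bool:
    """Return True if `wanted` matches one of the `defined` symbol names."""
    table = {symbol_key(d, normalize) for d in defined}
    return symbol_key(wanted, normalize) in table


if __name__ == "__main__":
    # Byte-exact matching: the two spellings are different symbols.
    print(resolve([NAME_NFC], NAME_NFD))
    # With normalization switched on, they resolve to the same symbol.
    print(resolve([NAME_NFC], NAME_NFD, normalize=True))
```

Without normalization the lookup is a plain byte comparison, which is
what existing tools already do; the normalizing variant changes which
names collide, which is why it is sketched here as opt-in.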

> > This would work fine with the filesystem, assuming it is UTF-8 as
> > well; #includes DO work fine with UTF-8 filenames. (I just tried
> > this with gcc under RH8.)
> 
>  The same is true here. Although the Unix file system has few
> restrictions on file/dir names, it needs to have a provision to specify
> how to deal with multiple representations of equivalent characters. Is
> there anything mentioned about this in SUS?

Yes.  Filenames are byte sequences, period, full stop.  Any attempt at
normalization would violate SUS/POSIX.
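
As a small illustration of what "byte sequences" means in practice,
the sketch below creates two files whose names differ only in Unicode
normalization form.  It assumes a filesystem that stores names as raw
bytes, as typical Linux filesystems do; a filesystem that normalizes
names itself may show only one entry.  create_both() is a hypothetical
helper written for this example.

```python
import os
import tempfile

# The same visible name, "é.txt", as two different byte sequences.
NFC_NAME = "\u00e9.txt".encode("utf-8")    # b'\xc3\xa9.txt'  (precomposed)
NFD_NAME = "e\u0301.txt".encode("utf-8")   # b'e\xcc\x81.txt' (decomposed)


def create_both(directory: bytes):
    """Create one file per spelling and return the directory listing."""
    for name in (NFC_NAME, NFD_NAME):
        # Bytes paths are passed to the kernel unchanged.
        with open(os.path.join(directory, name), "wb") as f:
            f.write(name)
    # Listing a bytes path yields the raw stored names as bytes.
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in create_both(os.fsencode(tmp)):
            print(name)
```

On a byte-preserving filesystem both names coexist in the same
directory, since nothing between open() and the disk compares them as
characters.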

	-hpa
-- 
<hpa@xxxxxxxxxxxxx> at work, <hpa@xxxxxxxxx> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@xxxxxxxxx>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/