Re: filename and normalization (was gcc identifiers)
Followup to: <Pine.LNX.4.44js.0212040855170.13309-100000@xxxxxxxx>
By author: Jungshik Shin <jshin@xxxxxxxxxxx>
In newsgroup: linux.utf8
> >
> > Yes. Filenames are byte sequences, period, full stop. Any attempt at
> > normalization would violate SUS/POSIX.
>
> All right. That's what the *current* SUS/POSIX says. However, that
> is hardly a solace to a user who'd be puzzled that two visually
> identical and canonically equivalent filenames are treated differently.
>
> For instance, U+00D6 (Latin Capital Letter O with diaeresis) should look
> identical and be treated identically with U+004F followed by U+0308. That's
> what users expect. I don't know what's the best way to resolve
> this conflict. It may be time to consider seriously this particular
> aspect of SUS/POSIX. I'm wondering how MacOS X (well, it's not 100%
> SUS/POSIX compliant, but nonetheless it's Unix) works in this area. It
> uses NFD. That is, 'U+00D6' is stored as 'U+004F U+0308' and both are
> treated identically.
>
There *is* no way to solve this problem. You have the same kind of
problem with U+0041 LATIN CAPITAL LETTER A versus U+0391 GREEK CAPITAL
LETTER ALPHA. However, if you attempt normalizations you *will*
introduce security holes in the system (as has been amply shown by
Windows, even though *its* normalizations are much simpler.)
The only possible answer is to make sure a decoded representation is
available to the user (ls -b or somesuch.) Attempting
canonicalization is doomed to failure, if nothing else when the next
version of Unicode comes out, and you already have files that are
encoded with a different set of normalizations. Now your files cannot
be accessed! Oops!
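
A minimal sketch of the point, assuming Python 3 and its standard
unicodedata module. The helper names name_bytes and escaped are made up
for illustration, and the octal escaping is only loosely in the spirit of
ls -b, not a reimplementation of it:

```python
import unicodedata

# Two spellings of the same user-visible name, taken from the example
# in the quoted message.
precomposed = "\u00d6"   # U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS
decomposed = "O\u0308"   # U+004F followed by U+0308 COMBINING DIAERESIS


def name_bytes(name: str) -> bytes:
    """Hypothetical helper: the bytes a UTF-8 filename would consist of."""
    return name.encode("utf-8")


def escaped(name: str) -> str:
    """Hypothetical helper: keep printable ASCII, octal-escape other bytes."""
    out = []
    for b in name_bytes(name):
        if 0x20 <= b < 0x7F and b != 0x5C:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


if __name__ == "__main__":
    # Different bytes, so a byte-oriented kernel sees two different names.
    print(name_bytes(precomposed) == name_bytes(decomposed))
    # Canonically equivalent: NFD maps the precomposed form onto the other.
    print(unicodedata.normalize("NFD", precomposed) == decomposed)
    # Latin A and Greek Alpha stay distinct under every normalization form.
    print(all(
        unicodedata.normalize(form, "A") != unicodedata.normalize(form, "\u0391")
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    ))
    # The escaped form makes the difference visible to the user.
    print(escaped(precomposed), escaped(decomposed))
```

Run as a script, this prints False, True, True, and then the two escaped
forms, \303\226 and O\314\210. Normalization can merge the first pair but
does nothing for A versus Alpha, while the escaped listing tells all of
them apart without touching the stored bytes.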
-hpa
--
<hpa@xxxxxxxxxxxxx> at work, <hpa@xxxxxxxxx> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <amsp@xxxxxxxxx>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/