filename and normalization (was gcc identifiers)
On 3 Dec 2002, H. Peter Anvin wrote:
> By author: Jungshik Shin <jshin@xxxxxxxxxxx>
> > The same is true here. Although Unix file systems impose few
> > restrictions on file/dir names, they need a provision specifying
> > how to deal with multiple representations of equivalent characters. Is
> > there anything about this in SUS?
>
> Yes. Filenames are byte sequences, period, full stop. Any attempt at
> normalization would violate SUS/POSIX.
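To make the byte-sequence point concrete, here is a small illustrative Python sketch (not from either correspondent) that encodes the two canonically equivalent spellings of "Ö" as UTF-8. A filesystem that treats names purely as bytes sees two different names:

```python
# The same visible character "Ö", spelled two ways.
precomposed = "\u00d6"   # U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS
decomposed = "O\u0308"   # U+004F followed by U+0308 COMBINING DIAERESIS

# A byte-oriented filesystem only ever compares the encoded bytes.
print(precomposed.encode("utf-8"))   # b'\xc3\x96'
print(decomposed.encode("utf-8"))    # b'O\xcc\x88'
print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))  # False
```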
All right. That's what the *current* SUS/POSIX says. However, that is
little consolation to a user who is puzzled that two visually identical
and canonically equivalent filenames are treated differently.
For instance, U+00D6 (Latin Capital Letter O with Diaeresis) should look
identical to, and be treated identically with, U+004F followed by U+0308.
That's what users expect. I don't know the best way to resolve this
conflict. It may be time to seriously reconsider this particular aspect
of SUS/POSIX. Consider how Mac OS X (well, it's not 100% SUS/POSIX
compliant, but nonetheless it's Unix) works in this area: it uses NFD.
That is, 'U+00D6' is stored as 'U+004F U+0308', and both are treated
identically.
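A rough sketch of the normalize-on-store idea, using Python's standard `unicodedata.normalize("NFD", ...)`. The toy `directory` dict and the helpers `fs_key`, `create` and `lookup` are hypothetical illustrations, not any real filesystem API, and standard NFD may differ in details from what a particular filesystem actually implements:

```python
import unicodedata
from typing import Dict, Optional

# Hypothetical toy "directory": on-disk key (bytes) -> file contents.
directory: Dict[bytes, str] = {}

def fs_key(name: str) -> bytes:
    """Hypothetical helper: normalize a name to NFD, then encode it as UTF-8."""
    return unicodedata.normalize("NFD", name).encode("utf-8")

def create(name: str, contents: str) -> None:
    directory[fs_key(name)] = contents

def lookup(name: str) -> Optional[str]:
    return directory.get(fs_key(name))

create("\u00d6l.txt", "precomposed spelling")   # "Öl.txt" written with U+00D6
print(fs_key("\u00d6l.txt"))                    # b'O\xcc\x88l.txt'
print(lookup("O\u0308l.txt"))                   # decomposed spelling finds the same file
print(len(directory))                           # 1
```

Because every name is normalized before it becomes a key, both spellings map to one entry; the trade-off is that the bytes stored on disk may differ from the bytes the application passed in.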
Jungshik
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/