Re: perl unicode support
Rich Felker wrote:
>
> On Sat, Mar 31, 2007 at 07:44:39PM -0400, Daniel B. wrote:
> > Rich Felker wrote:
> > > Again, software which does not handle corner cases correctly is crap.
> >
> > Why are you confusing "special-case" with "corner case"?
> >
> > I never said that software shouldn't handle corner cases such as illegal
> > UTF-8 sequences.
> >
> > I meant that an editor that handles illegal UTF-8 sequences other than
> > by simply rejecting the edit request is a bit of a special case compared
> > to general-purpose software, say a XML processor, for which some
> > specification requires (or recommends?) that the processor ignore or
> > reject any illegal sequences. The software isn't failing to handle the
> > corner case; it is handling it--by explicitly rejecting it.
>
> It is a corner case!
We seem to be having a communication problem, but I don't quite see
what the cause is.
I agree that it is a corner case. However, what you wrote seems to
indicate fairly clearly that you think I don't, or wouldn't.
(I was arguing that handling the corner case by doing something other
than simply rejecting the illegal UTF-8 sequences was a bit of a
special case, just like, say, handling ill-formed XML is not something
a general XML processor (parser) has to do (it rejects it) but _is_
something a typical XML editor would want to do.
And to be clear, I'm not arguing that an editor should _not_ be a
special case (that is, not arguing that it shouldn't be careful to avoid
changing the file unintentionally). I was only pointing out that it _is_
a special case (because whatever UTF-8 issues we were talking about
many messages ago seem to apply differently to special-case tools (e.g.,
a general text editor) vs. general tools (e.g., HTTP POST receiver code).
Maybe at first I thought you were talking about a UTF-8-_only_ editor.)
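A minimal sketch of the two behaviours being contrasted, assuming Python 3's built-in codecs (the function names are hypothetical). `strict_decode` acts like a general tool that rejects ill-formed UTF-8 outright; `replace_decode` is a naive "keep going" approach that maps every bad byte to U+FFFD, which shows why an editor cannot simply substitute a placeholder character: the original bytes are gone.

```python
def strict_decode(data: bytes) -> str:
    # General-tool behaviour: refuse ill-formed input entirely.
    try:
        return data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError(f"rejected: invalid UTF-8 at byte offset {exc.start}") from exc


def replace_decode(data: bytes) -> str:
    # Naive lenient behaviour: each invalid byte becomes U+FFFD.
    return data.decode("utf-8", errors="replace")


sample = b"caf\xc3\xa9 \xff\xfe ok"

try:
    strict_decode(sample)
except ValueError as err:
    print(err)  # rejected: invalid UTF-8 at byte offset 6

text = replace_decode(sample)
# \xff and \xfe both became U+FFFD, so re-encoding cannot restore them.
print(text.encode("utf-8") == sample)  # False
```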
> It's simply not acceptable for opening a file and resaving it to not
> yield exactly the same, byte-for-byte identical file, because it can
> lead either to horrible data corruption or inability to edit when your
> file has somehow gotten malformed data into it.
(Yes, I agree.)
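One way to meet the byte-for-byte requirement, sketched under the assumption that the editor is written in Python 3: the standard "surrogateescape" error handler (PEP 383) decodes each invalid byte to a lone surrogate in U+DC80..U+DCFF and encodes it back to the same byte, so load-then-save is lossless even for malformed files. `load` and `save` are hypothetical names.

```python
def load(data: bytes) -> str:
    # Valid UTF-8 decodes normally; each byte that is not part of a valid
    # sequence becomes a lone surrogate U+DC80..U+DCFF.
    return data.decode("utf-8", errors="surrogateescape")


def save(text: str) -> bytes:
    # Inverse mapping: lone surrogates turn back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


original = b"valid: \xc3\xa9, stray: \x80, truncated: \xe2\x82"

# Opening and resaving without edits yields the identical bytes.
print(save(load(original)) == original)  # True

# Editing the valid part leaves the malformed bytes untouched.
buffer = load(original).replace("valid", "still valid")
print(save(buffer) == original.replace(b"valid", b"still valid"))  # True
```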
...
> > You said you're talking about a text editor that reads bytes, displays
> > legal UTF-8 sequences as the characters they represent in UTF-8, doesn't
> > reject other UTF-8-illegal bytes, and does something with those bytes.
> >
> > What does it do with such a byte? It seems you were talking about
> > mapping it to some character to display it. Are you talking about
> > something else, such as displaying the hex value of the byte?
>
> Yes.
Roger.
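A minimal sketch of the "display the hex value of the byte" approach, again assuming Python 3 and its "surrogateescape" handler; `render` and the `<FF>` notation are hypothetical. It only builds a view: the underlying bytes are never changed.

```python
def render(data: bytes) -> str:
    # Show valid UTF-8 as characters and each undecodable byte as <XX>.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:  # an escaped raw byte 0x80..0xFF
            out.append(f"<{cp - 0xDC00:02X}>")
        else:
            out.append(ch)
    return "".join(out)


print(render(b"na\xc3\xafve \xff\xfe end"))  # naïve <FF><FE> end
```

A real editor would probably draw the escaped bytes in a distinct style (inverse video, a colour) so that a literal "<FF>" typed into the text is not confused with a raw byte.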
Daniel
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/