Re: UTF-8 line feeds versus LS/PS
Bram@moolenaar.net (Bram Moolenaar) wrote on 18.09.99 in <199909180958.LAA00348@moolenaar.net>:
> Markus Kuhn wrote:
> > where soft linebreaks inside paragraphs are not saved to the file. The
> > main advantage here is that diffs become significantly more compact
> > (assuming they would operate on byte ranges, not on lines), because
> > changing a few words followed by reformatting a paragraph moves around
> > all these LF bytes that the revision control system then has to keep
> > track of, which is not very elegant at the moment.
You mean using something like xdelta instead of diff/patch? Wasn't PRCS
supposed to use xdelta in its upcoming version?
OTOH, xdelta would probably cope just fine with changed linebreaks ...
working on byte boundaries has definite benefits. (The other biggie is
binary diffs.)
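To make the difference concrete, here is a rough sketch in Python. It
uses the standard difflib module as a stand-in for both kinds of tool;
it is not xdelta or diff itself, and the sample sentence, the width of
30 and the helper names changed_lines/changed_chars are made up for
illustration.

```python
import difflib
import textwrap


def changed_lines(old: str, new: str) -> int:
    """Count the lines a line-based comparison marks as removed or added."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for entry in diff if entry.startswith(("- ", "+ ")))


def changed_chars(old: str, new: str) -> int:
    """Count the characters a character-level comparison marks as changed."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# Hypothetical sample paragraph; the edit inserts a single word.
old_para = (
    "The quick brown fox jumps over the lazy dog while the cat sleeps "
    "in the warm afternoon sun and the birds sing in the old apple tree."
)
new_para = old_para.replace("quick", "very quick")

# Hard-wrapped variants, as an editor would save them after reformatting.
old_wrapped = textwrap.fill(old_para, 30)
new_wrapped = textwrap.fill(new_para, 30)

print("line-based, hard-wrapped file:", changed_lines(old_wrapped, new_wrapped))
print("char-based, hard-wrapped file:", changed_chars(old_wrapped, new_wrapped))
print("char-based, unwrapped file:   ", changed_chars(old_para, new_para))
```

With this sample the one-word insertion shifts every line break, so the
line-based count covers the whole paragraph, while the character-based
counts stay small whether or not the LFs are stored.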
> One disadvantage is that the width of the wrapped lines depends on the width
> of the terminal. If you view the file on a different terminal it may look
> different. It might be different again when you print it. That might not
> always be what you want.
Why not?
> WordStar (do you remember that?)
Of course. I still use compatible editors.
> had a soft
> linebreak character for this (CR with the 8th bit set). But only WordStar
> supported it, thus it wasn't very useful. You always had to print the file
> from WordStar.
OTOH, "long lines" is a concept that already has pretty wide support. In
fact, you might say it has too much, when people use it with text/plain
where it doesn't belong.
> > However, all this is again *completely* independent and orthogonal to
> > Unicode. Unformatted plain-text files would also be nice with just
> > ASCII, and LF is as good a paragraph separator as Unicode's PS. I'd
> > rather not use LS and PS at all on POSIX systems, because it would break
> > a tremendous amount of software, even though I do appreciate that the
> > clearly-defined LS/PS semantics does have its attractions and is much
> > nicer in UCS-2 files than the historic CR/LF/NL mess.
>
> Just using NL should work fine. As far as I know, LF is just another name
> for NL; it's the same character (0x0A). A paragraph could be ended by
> an empty line (in the file that's a double NL). We could even recommend
> this. Perhaps we should add a note about this in appropriate places?
news.announce.newusers? :-)
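For what it's worth, reading such a file is simple. A minimal sketch in
Python, assuming an empty line (double NL) ends a paragraph and a single
NL inside a paragraph is only a soft break; the function names
split_paragraphs and render are invented for this example.

```python
import re
import textwrap


def split_paragraphs(text: str) -> list[str]:
    """Split on empty lines and join the soft-wrapped lines of each paragraph."""
    chunks = re.split(r"\n\s*\n", text.strip("\n"))
    # str.split() with no argument drops every run of whitespace, including
    # the single NLs that were only soft breaks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def render(text: str, width: int) -> str:
    """Rewrap every paragraph for a display that is `width` columns wide."""
    paragraphs = split_paragraphs(text)
    return "\n\n".join(textwrap.fill(p, width) for p in paragraphs)


sample = "first paragraph,\nwrapped by hand\n\nsecond paragraph\n"
print(split_paragraphs(sample))
print(render(sample, 20))
```

The same file then displays at whatever width the terminal or printer
has, which is the behaviour being argued about above.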
MfG Kai
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/