Re: UTF-8 line feeds versus LS/PS
Markus Kuhn wrote:
> Side remark:
>
> It would indeed be nice to also introduce under Unix a text format,
> where paragraphs are formatted at display time (like Word does), and
> where soft linebreaks inside paragraphs are not saved to the file. The
> main advantage here is that diffs become significantly more compact
> (assuming they would operate on byte ranges, not on lines), because
> changing a few words followed by reformatting a paragraph moves around
> all these LF bytes that then the revision control system has to keep
> track of, which is not very elegant at the moment.
>
> It would indeed be very helpful, if emacs, vim, less, etc. had a mode
> similar to the Windows notepad and Word, where paragraphs are
> essentially long lines without any LF in them. LF-free paragraphs would
> especially be convenient for editing plaintext-files that will later be
> reformatted anyway and where line length doesn't matter at all, e.g.
> HTML and TeX.
This is true. The reason Vim doesn't support automatic paragraph formatting
is that there is no "soft" line separator. I'm glad there is something we can
agree on!
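As a rough check of the diff argument quoted above, here is a minimal Python
sketch. The sample paragraph, the edit, the 32-column width and the helper
names changed_lines and changed_chars are all made up for illustration. It
compares a hard-wrapped paragraph with the same paragraph stored as one long
line, using the standard difflib and textwrap modules.

```python
import difflib
import textwrap


def changed_lines(old: str, new: str) -> int:
    """Count lines that a line-based diff marks as removed or added."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for entry in diff if entry.startswith(("- ", "+ ")))


def changed_chars(old: str, new: str) -> int:
    """Count characters inside the non-matching ranges of a character diff."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# Hypothetical paragraph and edit, stored as one long line (no soft breaks).
soft_old = ("The quick brown fox jumps over the lazy dog and then runs back "
            "home across the wide green field before night falls.")
soft_new = soft_old.replace("quick", "very quick and clever")

# The same two versions, each refilled to 32 columns with hard LF breaks.
hard_old = textwrap.fill(soft_old, width=32)
hard_new = textwrap.fill(soft_new, width=32)

print("hard-wrapped:", changed_lines(hard_old, hard_new), "lines,",
      changed_chars(hard_old, hard_new), "chars")
print("single-line: ", changed_lines(soft_old, soft_new), "lines,",
      changed_chars(soft_old, soft_new), "chars")
```

With hard breaks, the refill moves LF bytes and every line of the paragraph
shows up as changed. With one long line, a character-level diff only has to
record the inserted words, while a line-based diff still reports the whole
paragraph as one changed line.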
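You can work with single-line paragraphs in Vim by setting the 'linebreak'
option while 'wrap' is on. Long lines are then wrapped at word boundaries on
the screen only; nothing is inserted into the file. This might be the mode
you are looking for. See ":help 'linebreak'" for more information.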
One disadvantage is that the width of the wrapped lines depends on the width
of the terminal. If you view the file on a different terminal it may look
different, and it may look different again when you print it. That is not
always what you want. WordStar (do you remember that?) had a soft
linebreak character for this (CR with the 8th bit set). But only WordStar
supported it, so it wasn't very useful: you always had to print the file
from WordStar.
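To illustrate the width dependence, here is a small Python sketch. The sample
paragraph, the two widths and the helper name wrap_for_display are invented
for this example. It wraps one stored paragraph at display time with the
standard textwrap module.

```python
import textwrap

# Hypothetical paragraph, stored in the file as one long line.
paragraph = (
    "LF-free paragraphs are convenient for text that will be "
    "reformatted anyway, such as HTML and TeX sources."
)


def wrap_for_display(text: str, width: int) -> list[str]:
    """Wrap one stored paragraph into display lines for a given width."""
    # break_on_hyphens=False keeps words like "LF-free" in one piece.
    return textwrap.wrap(text, width=width, break_on_hyphens=False)


narrow = wrap_for_display(paragraph, 40)  # e.g. a 40-column window
wide = wrap_for_display(paragraph, 80)    # e.g. an 80-column terminal

for label, lines in (("40 columns", narrow), ("80 columns", wide)):
    print(f"--- {label}: {len(lines)} lines")
    print("\n".join(lines))
```

The stored text is identical in both cases; only the places where the lines
break on the display differ.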
> However, all this is again *completely* independent and orthogonal to
> Unicode. Unformatted plain-text files would also be nice with just
> ASCII, and LF is as good a paragraph separator as Unicode's PS. I'd
> rather not use LS and PS at all on POSIX systems, because it would break
> a tremendous amount of software, even though I do appreciate that the
> clearly-defined LS/PS semantics does have its attractions and is much
> nicer in UCS-2 files than the historic CR/LF/NL mess.
Just using NL should work fine. As far as I know, LF is just another name for
NL; it is the same character (0x0A). A paragraph could be ended by an
empty line (in the file that's a double NL). We could even recommend this.
Perhaps we should add a note about this in appropriate places?
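A minimal Python sketch of that convention, assuming that paragraphs are
separated by one or more truly empty lines (lines holding only spaces are not
treated as separators here). The function name unwrap_paragraphs is made up
for this example. It turns hard-wrapped text into one long line per
paragraph.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join the lines of each paragraph into one long line.

    Paragraphs are assumed to be separated by one or more empty lines,
    which in the file is a run of two or more NL (0x0A) characters.
    """
    paragraphs = re.split(r"\n{2,}", text.strip("\n"))
    joined = [
        " ".join(line.strip() for line in paragraph.split("\n"))
        for paragraph in paragraphs
    ]
    # Keep exactly one empty line between paragraphs and a final NL.
    return "\n\n".join(joined) + "\n"


wrapped = "First paragraph,\nwrapped by hand.\n\nSecond paragraph.\n"
print(unwrap_paragraphs(wrapped), end="")
```

Going the other way (wrapping for display or print) is then a matter of
formatting each long line to the width that is wanted.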
--
hundred-and-one symptoms of being an internet addict:
102. When filling out your driver's license application, you give
your IP address.
--/-/---- Bram Moolenaar ---- Bram@moolenaar.net ---- Bram@vim.org ---\-\--
\ \ www.vim.org/iccf www.moolenaar.net www.vim.org / /
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/