Project Gutenberg (slightly OT) (Was: Re: Unicode and man/groff/less problems)
On Fri, Mar 02, 2001 at 08:32:37PM +0000, Markus Kuhn wrote:
> David Starner wrote on 2001-03-02 19:20 UTC:
> > On Fri, Mar 02, 2001 at 01:55:50PM +0000, Markus Kuhn wrote:
> > > I was recently in contact with the initiator of project Gutenberg, and
> > > they are interested in updating their plaintext public domain literature
> > > format guidelines to UTF-8 and ISO 6429 SGR as soon as a few more
> > > editors to support entry comfortably are available.
> >
> > Ack! Why on earth would Project Gutenberg use ISO 6429? If you want
> > richtext, use HTML. It's easy to write, can be viewed on far more
> > platforms, and can be read as plain text without problems.
>
> Their philosophy is very much formatted mono-spaced plaintext oriented
From the webpage, their philosophy is text for everyone that won't have
to be changed every time you change platforms. "We have had a
long-standing work ethic of providing our etexts in any medium people
wanted: Amiga, Apple, Atari . . .to IBM, to Mac, to TRS-80. . ." As far
as I know, ISO 6429 annotated UTF-8 can only be handled in a UTF-8
enabled xterm. HTML can be handled on a wide variety of systems,
including the Windows systems that the average person uses.
> and they have discussed SGML frequently and decided that no.
SGML is a somewhat different concept from HTML. I can see why it would
get refused if it was presented as a complex DTD specially for Project
Gutenberg. That's different from a minimal HTML subset.
> Formatted
> plain text with SGRs can be turned trivially via
>
> perl -p -e 's/\x1b\[[0-\?]*m//g;'
>
> back into the formatted plain text that they use now.
"sed 's/perl -p -e/sed/g' ?"
Actually, no; that would lose the italics and stuff that currently is
preserved by various plain text tricks. In any case, lynx --dump or a
slightly larger (<< 100 lines) perl script could turn HTML into plain
text.
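
To make the point about losing italics concrete, here is a rough sketch in
Python (not anything Project Gutenberg uses). strip_sgr does the same thing
as the quoted Perl one-liner. sgr_italics_to_underscores is a hypothetical
helper that instead maps SGR 3 (italic on) and SGR 23 or 0 (italic off) to
underscores; the underscore convention is an assumption picked for
illustration, and every other SGR code is simply dropped.

```python
import re

# Same pattern as the quoted Perl one-liner: ESC [ <parameter bytes> m
SGR = re.compile(r"\x1b\[([0-?]*)m")


def strip_sgr(text):
    """Remove every SGR sequence, discarding all formatting."""
    return SGR.sub("", text)


def sgr_italics_to_underscores(text):
    """Turn SGR italic on/off into '_' markers; drop all other SGR codes.

    The underscore marker is an assumed plain-text convention.
    """
    italic = False  # tracks whether an italic run is currently open

    def repl(match):
        nonlocal italic
        params = match.group(1)
        # An empty parameter list means SGR 0 (reset).
        codes = params.split(";") if params else ["0"]
        out = ""
        for code in codes:
            if code == "3" and not italic:
                italic = True
                out += "_"
            elif code in ("0", "23") and italic:
                italic = False
                out += "_"
        return out

    return SGR.sub(repl, text)


if __name__ == "__main__":
    sample = "A \x1b[3mTale\x1b[23m of \x1b[1mTwo\x1b[0m Cities"
    print(strip_sgr(sample))
    print(sgr_italics_to_underscores(sample))
```

The first call throws the italics away along with the bold; the second keeps
them as plain-text markers, which is the information the simple strip loses.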
In any case, where are editors that can use ISO 6429 and UTF-8 going to
come from? I haven't seen any rush to support ISO 6429 in editors.
--
David Starner - dstarner98@xxxxxxxxxxxxx
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and
laughs at me. In fact, I'd be rather honored." - Joseph_Greg
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/