Well, I find comfort in the fact that I don't seem to be the only one to interpret SOFT HYPHEN as I indicated. It's a long time since I used FrameMaker, but it had a soft hyphen that worked as I indicated (I don't know which code it uses for that). MS Word has a soft hyphen that works that way too. Neither of them is plain text, of course.

The explanation in the Unicode book is in accordance with what I said as well: "U+00AD SOFT HYPHEN indicates a hyphenation point, where a line-break is preferred when a word is to be hyphenated. Depending on the script, the visible rendering of this character may differ (for example in some scripts it is rendered like a hyphen..." (page 6-5). So is Unicode Technical Report #14 (Line Breaking Properties; http://www.unicode.org/unicode/reports/tr14/tr14-5.pdf): "00AD SOFT HYPHEN (SHY) SHY is rendered invisibly and has no width, except at a line break. The rendering of the soft hyphen depends on the script. For the Latin script it is rendered as a hyphen, ...".

Soft hyphens, I may add, are normally inserted by the text author to indicate preferred hyphenation points, and they should not be removed at whim by "paragraph reformatters".
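
As a rough illustration of the behaviour I mean, here is a small sketch in Python. It is not taken from FrameMaker, Word or TR 14: the function name wrap_with_soft_hyphens is made up for this mail, and it assumes Latin-script rendering, fixed-width character cells and a simple greedy line filler.

```python
from typing import List

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap_with_soft_hyphens(text: str, width: int) -> List[str]:
    """Greedy line filling (hypothetical helper, for illustration only).

    A soft hyphen is invisible and has no width, except where a line is
    actually broken at it; there it is rendered as "-".
    """
    lines: List[str] = []
    current = ""
    for word in text.split():
        fragments = word.split(SHY)  # pieces between the author's soft hyphens
        while fragments:
            prefix = current + " " if current else ""
            whole = prefix + "".join(fragments)
            if len(whole) <= width:
                # The (rest of the) word fits: soft hyphens vanish entirely.
                current = whole
                fragments = []
                continue
            # Look for the latest soft hyphen at which the line can be broken.
            split_at = 0
            for k in range(len(fragments) - 1, 0, -1):
                if len(prefix + "".join(fragments[:k]) + "-") <= width:
                    split_at = k
                    break
            if split_at:
                # Break here: this one soft hyphen becomes a visible hyphen.
                lines.append(prefix + "".join(fragments[:split_at]) + "-")
                current = ""
                fragments = fragments[split_at:]
            elif current:
                # No usable break point: start a new line and try again.
                lines.append(current)
                current = ""
            else:
                # Nothing fits even on an empty line: let the word overflow.
                current = "".join(fragments)
                fragments = []
    if current:
        lines.append(current)
    return lines


sample = "hyphen\u00adation is con\u00adven\u00adient"
print(wrap_with_soft_hyphens(sample, 10))
print(wrap_with_soft_hyphens(sample, 80))
```

With a width of 10 the first word is broken at its soft hyphen and shows a hyphen there; with a width of 80 everything fits on one line and no hyphen is shown at all. Note that the soft hyphens stay in the source text throughout; only their rendering changes.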
Kind regards
/kent k
> -----Original Message-----
> From: Markus Kuhn [mailto:Markus.Kuhn@cl.cam.ac.uk]
> Sent: Monday, November 08, 1999 5:43 PM
> To: linux-utf8@nl.linux.org
> Subject: Re: Byte-order-marks considered harmful
>
>
> Karlsson Kent - keka wrote on 1999-11-08 15:41 UTC:
> > How do you propose to handle SOFT HYPHEN? It should be
> > zero-width and invisible unless a linebreak is done at that point
> > (in which case it should be rendered as a hyphen).
>
> You obviously misunderstood ISO 8859 here, which admittedly used
> extremely bad language in the relevant section. SOFT HYPHEN is a
> completely normal character, with a glyph identical to or almost
> identical to HYPHEN. A soft hyphen should be used whenever a hyphen is
> inserted to make a line break when a paragraph in a formatted text file
> is broken into lines. Soft hyphen characters plus the following white
> space obviously have to be removed by the paragraph formatting algorithm
> before reformatting the paragraph. Soft hyphens are NOT there to mark
> invisibly in words potential hyphenation points. Neither ISO 8859 nor
> ISO 10646 reserve an extra code point ZERO-WIDTH HYPHENATION POINT for
> this. Feel free to suggest adding such a code if you need one.
>
> If a soft hyphen appears anywhere else than at the end of a line, this
> is probably due to a bug in the software that (re)formatted this
> character and forgot to remove the soft hyphens from words that are not
> broken any more after the formatting. There is absolutely no reason for
> a terminal emulator or other text display software to treat SOFT HYPHEN
> any differently from HYPHEN. The difference only matters to paragraph
> reformatting routines, but even there it has not been widely
> implemented.
>
> Markus
>
> --
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
>