Re: Unicode, character ambiguities
On Thursday 10 January 2002 04:04 pm, you wrote:
> On Thu, Jan 10, 2002 at 01:28:53PM -0800, Edward Cherlin wrote:
> > > Hmm. Looks like Unicode language tags are a much better
> > > solution.
> >
> > Unicode language tags are heavily deprecated. Language tagging is
> > markup, and there is no point pretending you have plain text when
> > you mark languages.
>
> Heavily deprecated? They were only added to the main body of the
> standard in Unicode 3.1, which isn't a year old.
http://groups.yahoo.com/group/unicode/message/3845
:From: Doug Ewell <dewell@xxxx>
:Date: Wed Sep 6, 2000 2:05 pm
:Subject: Re: Plane 14 redux
:
:Kenneth Whistler <kenw@xxxx> wrote:
...
:> Most
:> of us, including those of us culpable in the definition of the
:> tag characters (which John Cowan pointed out were defined to head
:> off a worse threat to UTF-8) would prefer not to see them in
:> wide use, but rather the use of standard tagging mechanisms like
:> XML or HTML.
:
:Wow. You too.
:
:I honestly had no idea that the use of Plane 14 language tags,
:defined as they are in a Unicode Technical Report, were so strongly
:deprecated by everyone "in the know" about Unicode, including their
:own creators. I had read UTR #7 at face value, as describing an
:optional mechanism that might help with certain processes but which
:we were under no obligation to use, but now it appears that Plane 14
:language tags have the RFC 1815 nature ("Here's something you can
:use, but for God's sake, please don't use it").
...
> > If you want tagging in plain text, use a standard. As far as I
> > can tell, the best available standard for such things is XML,
> > which defines Unicode as its preferred character set.
>
> The reason these characters *exist* is for specifying the language
> where a markup language like XML isn't an option. That's the case
> with Ogg tags.
I don't understand why markup is not an option.
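To make the two options concrete, here is a minimal sketch in Python
(not from the original thread). It assumes the Unicode 3.1 Plane 14
layout: U+E0001 LANGUAGE TAG followed by tag characters that mirror
ASCII at U+E0020..U+E007E. The helper names and the <title> element
are hypothetical, chosen only for illustration.

```python
from xml.sax.saxutils import escape, quoteattr

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at U+E0020..U+E007E


def plane14_tag(lang: str) -> str:
    """Encode an ASCII language code such as 'ja' as Plane 14 tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def strip_plane14_tags(text: str) -> str:
    """Drop every character in the tag block U+E0000..U+E007F."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


def xml_tagged(lang: str, text: str) -> str:
    """Carry the same language information as markup (hypothetical element)."""
    return "<title xml:lang=%s>%s</title>" % (quoteattr(lang), escape(text))


if __name__ == "__main__":
    title = "\u76f4"  # a Han character often drawn differently in Chinese and Japanese fonts
    tagged = plane14_tag("ja") + title
    # Show the code points so the invisible tag characters can be seen.
    print([hex(ord(c)) for c in tagged])
    print(strip_plane14_tags(tagged) == title)
    print(xml_tagged("ja", title))
```

Either form carries the same information. The difference is whether
the consumer has to understand Plane 14 characters or parse markup.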
> > I see no reason to encode language in Ogg tags. Users should be
> > able to choose a Unicode fontset that suits their needs for
> > displaying all languages.
>
> The entire discussion is about the ambiguities that prevent
> displaying a character in its native form without extra
> information. If your needs include "use font A for language A, and
> font B for language B", and languages A and B share codepoints, you
> need language tagging in some form; no fontset will be able to
> figure it out. Feel free to show that no people exist who want to
> do that.
Certainly some people want to. I'm arguing that they don't need to.
Anyway, give us an example: either a message in one language that
cannot be displayed correctly from the plain text, or a message in
more than one language where rendering in the user's preferred font
loses information for that user.
--
Edward Cherlin
edward@xxxxxxxxxxxxxxxx
Does your Web site work?
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/