Re: Unicode, character ambiguities
On Thu, Jan 10, 2002 at 01:28:53PM -0800, Edward Cherlin wrote:
> > Hmm. Looks like Unicode language tags are a much better solution.
>
> Unicode language tags are heavily deprecated. Language tagging is
> markup, and there is no point pretending you have plain text when you
> mark languages.
Heavily deprecated? They were only added to the main body of the
standard in Unicode 3.1, which isn't a year old.
> If you want tagging in plain text, use a standard. As far as I can
> tell, the best available standard for such things is XML, which
> defines Unicode as its preferred character set.
These characters *exist* to specify the language where a markup
language like XML isn't an option. That's the case with Ogg tags.
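For concreteness, here is a minimal sketch of how the Unicode 3.1 tag characters spell a language in plain text. U+E0001 LANGUAGE TAG is followed by the tag-character copies (U+E0020 to U+E007E) of an ASCII language identifier such as "ja", and the pair U+E0001 U+E007F (LANGUAGE TAG, CANCEL TAG) ends the scope. The helper names and the "TITLE=" comment are illustrative assumptions, not code from any Ogg library; the only point is that the result is still plain UTF-8 text, which is what an Ogg Vorbis comment value is.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000         # tag character = ASCII code point + 0xE0000


def to_tag_chars(ascii_text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to tag characters (U+E0020..U+E007E)."""
    out = []
    for ch in ascii_text:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"tag characters only mirror printable ASCII, got {ch!r}")
        out.append(chr(TAG_OFFSET + ord(ch)))
    return "".join(out)


def language_tagged(text: str, lang: str) -> str:
    """Wrap text in a language tag, then cancel the language tag afterwards."""
    return LANGUAGE_TAG + to_tag_chars(lang) + text + LANGUAGE_TAG + CANCEL_TAG


# Hypothetical Ogg comment: U+76F4 tagged as Japanese, no markup involved.
comment = "TITLE=" + language_tagged("\u76f4", "ja")
print(comment.encode("utf-8"))
```

Software that knows nothing about tags can drop everything in U+E0000 to U+E007F and still show the title.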
> I see no reason to encode language in Ogg tags. Users should be able
> to choose a Unicode fontset that suits their needs for displaying all
> languages.
The entire discussion is about the ambiguities that prevent displaying a
character in its native form without extra information. If your needs
include "use font A for language A, and font B for language B", and
languages A and B share codepoints, you need language tagging in some
form; no fontset will be able to figure it out. Feel free to show that
no people exist who want to do that.
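To make the font argument concrete, here is a minimal sketch that assumes the renderer receives text carrying Unicode 3.1 language tags. It splits the string into (language, text) runs and picks a font per run. The font names and the FONTS table are hypothetical placeholders. What matters is the example at the end: U+76F4 is the same code point in both runs, and only the tag lets the code choose different fonts.

```python
from typing import List, Optional, Tuple

LANGUAGE_TAG = 0xE0001                  # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F                    # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # tag copies of printable ASCII
TAG_OFFSET = 0xE0000

# Hypothetical font names; a real renderer would ask its font system instead.
FONTS = {"ja": "ExampleMincho-JP", "zh": "ExampleSong-CN"}
DEFAULT_FONT = "ExampleSans"


def language_runs(s: str) -> List[Tuple[Optional[str], str]]:
    """Split s into (language, text) runs based on Unicode language tags."""
    runs: List[Tuple[Optional[str], str]] = []
    lang: Optional[str] = None
    buf: List[str] = []

    def flush() -> None:
        # Record pending text under the language that was active for it.
        if buf:
            runs.append((lang, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(s):
        cp = ord(s[i])
        if cp == LANGUAGE_TAG:
            flush()
            i += 1
            tag = []
            # Collect the ASCII identifier spelled in tag characters, e.g. "ja".
            while i < len(s) and TAG_FIRST <= ord(s[i]) <= TAG_LAST:
                tag.append(chr(ord(s[i]) - TAG_OFFSET))
                i += 1
            if i < len(s) and ord(s[i]) == CANCEL_TAG:
                lang = None  # LANGUAGE TAG + CANCEL TAG ends the language scope
                i += 1
            else:
                lang = "".join(tag) or None
        elif cp == CANCEL_TAG:
            flush()
            lang = None  # a bare CANCEL TAG cancels all tags
            i += 1
        else:
            buf.append(s[i])
            i += 1
    flush()
    return runs


def font_for(lang: Optional[str]) -> str:
    """Pick a font by primary language subtag, so "ja-JP" maps like "ja"."""
    if lang is None:
        return DEFAULT_FONT
    return FONTS.get(lang.split("-")[0].lower(), DEFAULT_FONT)


# U+76F4 twice: once tagged "ja", once tagged "zh". Same code point, two fonts.
text = ("\U000E0001\U000E006A\U000E0061\u76f4\U000E0001\U000E007F"
        "\U000E0001\U000E007A\U000E0068\u76f4\U000E0001\U000E007F")
for lang, run in language_runs(text):
    print(lang, font_for(lang), " ".join(f"U+{ord(c):04X}" for c in run))
```

Strip the tags from the input and both runs collapse to (None, U+76F4); no font table can recover the distinction from the code point alone.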
--
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/