Re: ASCII and JIS X 0201 Roman - the backslash problem
Tomohiro KUBOTA writes:
> > 3) For programs that interpret backslash as some kind of escape character
> > and use Unicode internally but should work with text in Shift_JIS
> > encoding, consider the multibyte character 0x5C as being the escape
> > trigger, not [only] the Unicode character U+005C. This is already done
> > in bash and gettext. For example, in GNU gettext, we have the code
>
> I think interpretation of
> U+00A5 as an additional escape character doesn't always work, because
> Unicode texts don't have information on their origin (converted from
> Shift_JIS or not).
These are particular kinds of text files, namely those fed to
programs that do backslash interpretation: shell scripts, awk scripts,
gettext PO files, etc. And yes, if the Yen sign is to appear literally
in them, it needs to be doubled.
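The proposal quoted at the top (treat the Shift_JIS character 0x5C as the escape trigger) has one subtlety: in Shift_JIS the byte 0x5C also occurs as the second byte of many two-byte characters (katakana ソ, for instance, is 0x83 0x5C), so a scanner has to step through whole characters instead of searching for bytes. A minimal Python sketch, assuming only the plain Shift_JIS lead-byte ranges 0x81-0x9F and 0xE0-0xFC; the function name `escape_positions` is hypothetical and not taken from bash or gettext:

```python
def escape_positions(data: bytes) -> list[int]:
    """Return offsets of standalone 0x5C characters in Shift_JIS data.

    Hypothetical helper: a 0x5C that is the trail byte of a two-byte
    character is NOT an escape trigger and is skipped.
    """
    positions = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            # Lead byte of a two-byte character: skip it together with its
            # trail byte, which may itself be 0x5C (e.g. "ソ" = 0x83 0x5C).
            i += 2
        else:
            # Single-byte character (JIS X 0201 Roman or half-width kana).
            if b == 0x5C:
                positions.append(i)
            i += 1
    return positions


data = b"\x83\x5c\x5cn"  # "ソ" followed by an escaped "n"
print(escape_positions(data))  # [2]; a naive data.find(b"\\") returns 1
```

The point of the design is that the decision is made per character, not per byte; a program that converts to Unicode first gets this for free, which is why the quoted proposal speaks of the multibyte character rather than the byte.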
> If U+00A5 were always an escape character,
> it would be harmful for a lot of software.
Why is it more harmful for U+00A5 to be an escape character than for
U+005C? In both cases you just double it to get the literal
character.
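To illustrate the symmetry, here is an unescaper that accepts either U+005C or U+00A5 as the escape character, where doubling either one yields that character literally. This is a hypothetical sketch: the small escape table (`\n`, `\t`, `\"`) and the policy of passing unknown escapes through unchanged are assumptions, not the behaviour of bash, awk or gettext:

```python
# Assumption: both REVERSE SOLIDUS and YEN SIGN act as escape characters.
ESCAPE_CHARS = {"\\", "\u00a5"}

# Hypothetical, deliberately small table of recognized escapes.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", '"': '"'}


def unescape(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in ESCAPE_CHARS and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == ch:
                out.append(ch)  # doubled escape character -> literal
            elif nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
            else:
                out.append(nxt)  # unknown escape: keep the next character
            i += 2
        else:
            out.append(ch)  # ordinary character (or trailing lone escape)
            i += 1
    return "".join(out)


print(unescape("100\u00a5\u00a5 per item"))  # 100¥ per item
print(repr(unescape("a\u00a5nb")))           # 'a\nb'
```

Whichever character is the escape, the cost for the author is the same: write it twice when it is meant literally.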
> I am interested in how European people succeeded in migrating from ISO 646
> variants to ISO 8859. The Yen Sign Problem is exactly a problem of ISO 646,
> because "0x5c = YEN SIGN" comes from JIS X 0201 Roman, which is the Japanese
> variant of ISO 646.
For me, the migration occurred when I switched to a different
computer with a different OS and a different character set (from
ISO646-DE to CP437 at that time). Few files were carried over; there
are usually a lot of text files that you can simply drop every three
years or so. Among the remaining ones, disambiguation was usually
easy, depending on the type of file. (In ISO646-DE the positions of
[ \ ] { | } hold Ä Ö Ü ä ö ü, so the same byte means either a bracket
or an umlaut.) In letters I used only umlauts and no brackets, whereas
in programs I mostly used brackets and no umlauts. Only a few programs
contained both brackets and umlauts; those I had to fix up by hand,
usually the next time I needed the particular program.
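The file-type disambiguation described above can be sketched as a decoder that takes the file type as an explicit flag. This assumes 7-bit input and uses the national positions of ISO646-DE (DIN 66003); `decode_iso646_de` and its `is_program` flag are hypothetical names, not part of any existing converter:

```python
# Byte positions where ISO646-DE differs from ASCII.
ISO646_DE_NATIONAL = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}


def decode_iso646_de(data: bytes, is_program: bool) -> str:
    """Decode 7-bit ISO646-DE text, resolving ambiguous bytes by file type.

    Programs: the ambiguous bytes are read as ASCII brackets and braces.
    Letters:  the same bytes are read as German national characters.
    """
    if is_program:
        return data.decode("ascii")  # raises on bytes >= 0x80 (not 7-bit)
    return "".join(ISO646_DE_NATIONAL.get(b, chr(b)) for b in data)


print(decode_iso646_de(b"Gr\x7c\x7ee", is_program=False))   # Größe
print(decode_iso646_de(b"a[i] = {0};", is_program=True))    # a[i] = {0};
```

Files that contain both brackets and umlauts cannot be resolved by a per-file flag; as described above, those need fixing by hand.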
So it is a minor annoyance spread over a few months, but nowhere near
the costs that you are estimating.
Bruno
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/