I18n, UTF-8, and Linux
Hello,
I am new to this list, so please forgive
me if I'm covering old ground.
I am interested in displaying some text
in languages other than English within my application. However, I'm
having some difficulty when trying to display non-ASCII characters. Note
that I use UTF-8 to display all characters, even those that can be represented
in 8 bits (0x00 - 0xFF).
For example, if I want to display the
character 'á' (that's an 'a' with an acute accent in case it doesn't show
up on your browser), that's U+00E1 in Unicode-speak. Encoding that
character as UTF-8, it comes out to be 0xC3 0xA1. If, in my .po file
(for the GNU gettext() utilities), I include the following:
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
...
#: TestProgram.cpp:145
msgid "it is"
msgstr "est\xC3\xA1"
what comes back is
est?
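As a sanity check on the byte values quoted above, here is a minimal sketch of the UTF-8 encoding rule for code points below U+0800. The helper name encode_utf8_small is hypothetical and exists only for illustration; it is not part of gettext or of the test program, and larger code points are deliberately not handled.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode a code point below U+0800 as UTF-8.
// Code points at or above U+0800 are not handled and yield an empty string.
std::string encode_utf8_small(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // One byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8_small(0x00E1) yields the two bytes 0xC3 0xA1, which matches the msgstr above, so the escape values themselves look right.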
I know that the problem is not with
text rendering as I can write the UTF-8 directly into the string in the
program and it works fine, i.e. it displays the a with the accent.
Any ideas of what I might be doing wrong?
Note that I also tried typing the C3 and A1 characters directly (á)
but that also doesn't work.
ANOTHER PROBLEM: If I want to
display the word "mañana", for example, I would encode it as "ma\xC3\xB1ana".
However, the "\xB1a" is parsed as a single hex escape!
How can I indicate that I want the byte \xB1 followed by the letter
'a'? Remember, I can't use formatting strings because I'm working
with gettext(). Surely somebody has run into this before!
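For C and C++ string literals at least, two common ways to stop a hex escape from swallowing the following letter are sketched below. This assumes the string lives in C++ source; whether the .po parser accepts the same forms in a msgstr is an assumption that would need checking against the gettext documentation. The function names are hypothetical.

```cpp
#include <string>

// Adjacent string literals are concatenated by the compiler after escape
// sequences are processed, so closing the literal right after \xB1 ends
// the hex escape before the letter 'a'.
std::string manana_split() {
    return "ma\xC3\xB1" "ana";
}

// Octal escapes take at most three digits, so \261 (0xB1) cannot absorb
// the 'a' that follows it. \303 is 0xC3.
std::string manana_octal() {
    return "ma\303\261ana";
}
```

Both functions return the same seven bytes: 'm', 'a', 0xC3, 0xB1, 'a', 'n', 'a'.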
Thanks in advance.
Cheers,
Gil Glass
Telecom Field Services
JDSU
Germantown, MD, USA
+1-240-404-2551