Detection of UTF-8 characters in Perl.
I may well be completely off topic for this list, and this has probably been
covered to death here, but I thought I'd raise the subject again, unless
someone can point me to a search engine for the list archives :)
Essentially, I'm working on some iDNS stuff, and I'm looking for a nice, easy
way to detect whether a string contains a UTF-8 (that is, non-ASCII)
character. I've looked around, of course, and found a few things that seem to
tell me it's not reliably possible. This may or may not be outdated
information :) I've used the Convert::Scalar module to check whether the
string is marked as UTF-8, but it doesn't seem to work on the variables I've
passed from a CGI script. Of course, it seems that this is the grey area.
Quoting the Perl, Unicode and i18n FAQ: "Without a signature you would need a
moderate amount of text to do a reliable detection. An example of an input
source that is probably not long enough would be a search widget on a web
page."
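One possible approach, sketched here under some assumptions: CGI parameters
normally arrive as raw bytes, so Perl's internal UTF-8 flag says nothing about
them, and the check has to look at the bytes themselves. The sketch below
assumes the core Encode module is available; looks_like_utf8 is a hypothetical
helper name, not something from Convert::Scalar or the FAQ. It only tests
whether the bytes are well-formed UTF-8 with at least one multi-byte sequence,
so, as the FAQ warns, a short Latin-1 string can still pass by coincidence.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8 containing
# at least one non-ASCII (multi-byte) character, 0 otherwise.
# Assumes $bytes is a raw byte string, e.g. a CGI parameter.
sub looks_like_utf8 {
    my ($bytes) = @_;
    return 0 unless defined $bytes;

    # Pure ASCII is valid UTF-8, but contains no multi-byte character.
    return 0 unless $bytes =~ /[\x80-\xFF]/;

    # Strict decode of a copy; FB_CROAK makes decode die on malformed input
    # (and may modify its argument, hence the copy).
    my $copy = $bytes;
    my $ok = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return $ok ? 1 : 0;
}
```

For example, looks_like_utf8("caf\xC3\xA9") accepts the two-byte UTF-8 form of
"é", while looks_like_utf8("na\xEFve") rejects the Latin-1 byte \xEF because
it is not followed by valid continuation bytes.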
While I'm on the subject, do you RACE-encode the normal ASCII characters in
a string that has both an extended UTF-8 character and normal ASCII? I know
the UTF-8 map and the ASCII map are the same for the first 128 characters
(code points 0 to 127), and given what else I've seen, I assume you do; I
just wanted to check with some experts ;)
=====
"In a perfect world, we'd all lie blind and motionless in stacked coffins
filled with pudding. It would be dark and warm and nobody would have to
compete with anybody and also the government would pay for the pudding." -
Erik.
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/