Re: strstr
On Thu, 4 Oct 2001, Glenn Maynard wrote:
> A big hit, but I wonder how much is avoidable. The three cases for this, I
> think, are: strstr (dumb, ends up comparing continuation bytes); strstr
> that knows utf8 (avoid comparing those bytes); or converting to UCS-2 or
> UCS-4 and doing a memcmp.
>
> I think skipping continuations would be a speed hit--you'd be taking
> the (minor) hit of UTF-8 decoding logic for every character, and all
> you're saving is a few byte compares. (Actually, a lot of byte
> compares, but it's a lot less code.)
Please substantiate any performance claims with an actual, realistic
measurement rather than a guess. Most such guesses are naive on modern
processor architectures, where searches are typically RAM bound rather
than CPU bound.
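Here is a minimal sketch in C of how such a measurement could be set up. The names utf8_strstr, plain_strstr and time_search are hypothetical and do not come from the thread. The UTF-8-aware variant is one possible reading of the second option in the quoted list. It assumes valid, NUL-terminated UTF-8 and skips every candidate start position that is a continuation byte (10xxxxxx). clock() measures processor time, not wall time. For a RAM-bound workload, the haystack should be much larger than the CPU caches before the numbers mean anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Nonzero if byte c is a UTF-8 continuation byte (10xxxxxx). */
static int is_utf8_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Hypothetical UTF-8-aware search: only tries match positions where a
 * character can begin, i.e. skips continuation bytes in the haystack.
 * Assumes both strings are valid, NUL-terminated UTF-8. */
const char *utf8_strstr(const char *haystack, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0)
        return haystack;            /* same convention as strstr */
    for (const char *p = haystack; *p != '\0'; p++) {
        if (is_utf8_continuation((unsigned char)*p))
            continue;               /* no character starts here */
        if (strncmp(p, needle, nlen) == 0)
            return p;
    }
    return NULL;
}

/* Wrapper so the C library strstr fits the same function-pointer type. */
const char *plain_strstr(const char *haystack, const char *needle)
{
    return strstr(haystack, needle);
}

/* Average processor seconds per call of search(haystack, needle),
 * measured with clock() over reps repetitions. */
double time_search(const char *(*search)(const char *, const char *),
                   const char *haystack, const char *needle, int reps)
{
    assert(reps > 0);               /* avoid division by zero below */
    volatile size_t sink = 0;       /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        const char *r = search(haystack, needle);
        sink += (r != NULL) ? (size_t)(r - haystack) : 0;
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

When the needle is valid UTF-8, both functions return the same pointer. A valid needle never begins with a continuation byte, so even a plain byte-wise strstr cannot match starting in the middle of a character. The two functions differ only when the needle itself starts with a continuation byte. Running time_search on both, over a large buffer, is the kind of measurement asked for above.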
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/