[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: mk_wcwidth



On Mon, Jun 17, 2002 at 11:55:02PM -0400, Seer wrote:

> struct interval
> {
>   int first;
>   int last;
>   int width
> };
>
> static int bisearch(wchar_t ucs, const struct interval *table, int
> max)
> {
>   int min = 0;
>   int mid;
>
>   if (ucs < table[0].first || ucs > table[max].last)
>     while (max >= min)
>     {
>       mid = (min + max) / 2;
>       if (ucs > table[mid].last)
>         min = mid + 1;
>       else if (ucs < table[mid].first)
>         max = mid - 1;
>       else
>         return table[mid].width;
>     }
>
>   return -1;
> }

Um.  This'll return -1 unless ucs is out of range.  And if you're
interested in simplification, you'd reverse the logic:

   if (ucs is out of range)
      return -1;

   do_binary_search() { }
   
   return -1;

You'd also move the "mid" definition within the while() loop, since it's
only actually used there.

A tip: if you're posting completely untested code, say so.  (You left
out a semicolon above, too.)

> even better: using a btree instead of a bisearch; itd be
> a bit tougher to read the source: probably better to have it
> generated in that case. Ive attached an example of that style;
> it uses hand balanced trees to minimize the search to determine
> if a character belongs in a language. I think a perl script
> code generator would be needed do the tree-stringing in the width
> case, then the focus would be on tweaking the 
> (range/width/frequency) data.

(Err ... how in the nineteen hells is this "simplification"?)

If you really need a speedup for specific cases, it could work, but
it's actually a tradeoff; speed one up and slow down others.  (And
it's not an even trade: for every one you move up the tree, you move
two down.)  Except for ASCII, that kind of tradeoff isn't very useful
in general-purpose code.

And it might not be a speedup anyway: you're implicitely using a lot
more memory in a tree than a simple array, which is likely to slow the
whole thing down.

However, this is a generic implementation, probably written for
correctness, ease of testing and maintainability; not speed.  I doubt
Marcus would be interested in speed improvements if they're at the cost
of the rest.

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/