Re: wcwidth requires __STDC_ISO_10646__?
On Mon, 01 Apr 2002 15:52:06 +0100
Markus Kuhn <Markus.Kuhn@xxxxxxxxxxxx> wrote:
> Tomohiro KUBOTA wrote on 2002-04-01 13:34 UTC:
> > Michael B. Allen <mballen@xxxxxxxxx> wrote:
> > > Does wcwidth require __STDC_ISO_10646__?
> >
> > No, wcwidth() does not require __STDC_ISO_10646__ .
>
> In more detail:
>
<snip>
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
>
I see. I was looking at this code and assumed its implementation was
standard practice; I knew nothing of its history. This is good news: it
means my code is more portable than I had thought.
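A minimal sketch of the point above, assuming an XSI-conformant system (wcwidth() is an X/Open extension declared in <wchar.h> when _XOPEN_SOURCE is defined). The helper names wchar_is_ucs() and columns_of() are hypothetical. Calling wcwidth() does not depend on __STDC_ISO_10646__. That macro only promises that wchar_t values are ISO 10646 code points, which matters when you write a character as a number such as 0x4E00 rather than getting it from mbrtowc().

```c
#define _XOPEN_SOURCE 700  /* wcwidth() is an XSI extension */
#include <wchar.h>

/* 1 if the implementation promises that wchar_t values are ISO 10646
 * (UCS) code points, 0 otherwise. */
int wchar_is_ucs(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Display width of wc according to the current locale's LC_CTYPE.
 * The call compiles and works with or without __STDC_ISO_10646__;
 * only interpreting a numeric wchar_t value as a Unicode code point
 * depends on it. */
int columns_of(wchar_t wc)
{
    return wcwidth(wc);
}
```

In the default "C" locale a printable ASCII letter is one column wide and the null wide character has width 0, whether or not the macro is defined.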
Taking all of this into consideration, I have a new question. If I
want to count characters (rather than screen positions or bytes), I
must first decide what a character is. For example, I have a function
like this:
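#define _XOPEN_SOURCE 700   /* for wcwidth() */
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Return a pointer to a substring of src at character position off, not
 * examining more than sn bytes of src. A "character" here is a NUL or
 * any wide character whose wcwidth() is nonzero, so zero-width
 * combining marks are stepped over without being counted.
 */
char *
mbsnoff(char *src, size_t sn, int off)
{
    wchar_t ucs;
    size_t n;
    mbstate_t ps;

    memset(&ps, 0, sizeof(ps));
    if (sn > INT_MAX) {
        sn = INT_MAX;
    }
    if (off < 0) {
        off = INT_MAX;     /* negative offset: advance as far as possible */
    }
    while (sn > 0 && off > 0 &&
            (n = mbrtowc(&ucs, src, sn, &ps)) != (size_t)-2) {
        if (n == (size_t)-1) {
            PMNO(errno);   /* author's own error-reporting macro, not standard C */
            return NULL;
        }
        if (n == 0) {
            n = 1;         /* a NUL was read; it still occupies one byte,
                              so sn must shrink along with src */
        }
        sn -= n;
        src += n;
        if ((ucs == L'\0' || wcwidth(ucs) != 0) && --off == 0) {
            break;
        }
    }
    return src;
}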
I want it to handle combining characters, CJK, and anything else
properly. As written, it skips zero-width combining characters (they
are not counted) and ignores wcwidth() values greater than 1, so a
double-width CJK character counts as one character.
So my question is: will this "count characters", or is there a
simpler, better, or official way?
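To make the competing definitions of "character" concrete, here is a hedged sketch (not the poster's code) that walks a string the same way mbsnoff() does and reports three counts side by side: every wide character, only those with nonzero wcwidth() (the rule mbsnoff() uses), and total display columns. The names mb_count() and struct mb_counts are hypothetical, and it assumes an XSI system where wcwidth() is available. Unlike mbsnoff(), it stops at a NUL instead of counting it.

```c
#define _XOPEN_SOURCE 700  /* for wcwidth() */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* One multibyte string measured under three definitions of "length". */
struct mb_counts {
    size_t codepoints;  /* every wide character mbrtowc() yields */
    size_t spacing;     /* only those with wcwidth() != 0, as in mbsnoff() */
    size_t columns;     /* sum of wcwidth() over printable characters */
};

/* Scans at most n bytes of s in the current locale. Returns 0 on
 * success, -1 on an invalid sequence (errno is then EILSEQ). A
 * truncated trailing sequence is ignored, as in mbsnoff(). */
int mb_count(const char *s, size_t n, struct mb_counts *out)
{
    mbstate_t ps;
    wchar_t wc;
    size_t len;

    assert(s != NULL && out != NULL);
    memset(&ps, 0, sizeof ps);
    memset(out, 0, sizeof *out);

    while (n > 0) {
        len = mbrtowc(&wc, s, n, &ps);
        if (len == (size_t)-1)
            return -1;              /* invalid multibyte sequence */
        if (len == (size_t)-2 || len == 0)
            break;                  /* truncated sequence or NUL */

        int w = wcwidth(wc);        /* -1 for non-printable characters */
        out->codepoints++;
        if (w != 0)
            out->spacing++;         /* same test mbsnoff() applies */
        if (w > 0)
            out->columns += (size_t)w;

        s += len;
        n -= len;
    }
    return 0;
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale, "e" followed by U+0301 COMBINING ACUTE ACCENT would be expected to give two code points but one spacing character, while a single CJK ideograph gives one spacing character but two columns. Neither count is the same as a user-perceived character in every script, which is why the choice of definition matters.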
Thanks,
Mike
--
May The Source be with you.
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/