what shall we do about iconv?



With Bruno's current libiconv there seems to be no sensible way of
implementing a function that converts data while reading it from a
stream and knows at the end how many non-reversible conversions
occurred. This is something I want to be able to do in the mail client
Mutt.

I'm attaching below a program that I would like to work, but which
doesn't work because of deficiencies in the iconv specification and
implementation.

You can make the program almost work by changing it to read:

    if (r == -1 && errno != EINVAL && errno != E2BIG)
      fatal("iconv");
    if (ob == bufo)
      break;
    if (r != -1) /* trouble! */
      n += r;

The trouble is that you lose count of the number of non-reversibly
converted characters if either of these "errors" occurs, because iconv
then returns -1 instead of the count. You could prevent E2BIG by making
the output buffer MB_LEN_MAX times bigger than the input buffer, but
there is no easy way of preventing EINVAL without processing the data
twice.
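
To make the EINVAL case concrete, here is a minimal sketch (separate
from the attached program) of what happens when a buffer ends in the
middle of a multibyte character. The helper name convert_chunk and the
64-byte output buffer are assumptions made for illustration, and the
sketch uses the char ** form of the input argument, which may not match
the prototype your iconv declares.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `len` bytes of `in` and report how iconv
 * stopped. Returns 0 on success, otherwise the errno value set by
 * iconv_open or iconv. *consumed receives the number of input bytes
 * that iconv accepted (left untouched if iconv_open fails). */
int convert_chunk(const char *from, const char *to,
                  char *in, size_t len, size_t *consumed)
{
  char out[64];
  char *ib = in, *ob = out;
  size_t ibl = len, obl = sizeof(out);
  int err = 0;
  iconv_t cd = iconv_open(to, from);

  if (cd == (iconv_t)-1)
    return errno;

  /* On failure the return value is (size_t)-1, so whatever count of
   * non-reversible conversions iconv had reached is not reported. */
  if (iconv(cd, &ib, &ibl, &ob, &obl) == (size_t)-1)
    err = errno;

  *consumed = len - ibl;
  iconv_close(cd);
  return err;
}
```

Given the two bytes 'a' and 0xC3 as UTF-8 input, I would expect this to
report EINVAL with one byte consumed: the 'a' is converted, and the
lone 0xC3 has to be carried over until the next read supplies the rest
of the character.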

What should we do?

(1) Live with it. Either copy stream input into a temporary file,
convert it once to find out how long the output is, then malloc that
much memory and convert the file again, or do something complex and
fragile that involves detecting where the last complete character in
the input buffer ends by running a separate iconv (with a different
cd) on the buffer contents. Or maybe there are other work-arounds,
too. Any ideas?

(2) Specify and implement a better conversion utility. Perhaps we
could call it rconv and make it take a pointer to iconv_t so that it
would be restartable.

(3) Interpret the UNIX98 spec so as to make iconv usable. The spec is
ambiguous, and I think it is possible to interpret it in such a way
that the attached program works. It's not the most natural
interpretation, but it would be a lot more useful.

http://www.opengroup.org/onlinepubs/007908799/xsh/iconv.html
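
The first pass of the work-around in (1) might look roughly like the
sketch below. It assumes the whole input is already available in memory
(for example, read back from the temporary file); the name
converted_length and the 16-byte scratch buffer are made up for this
illustration, and the input argument again uses the char ** form.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical first pass: return the number of bytes the conversion
 * would produce, or (size_t)-1 if it cannot be completed. */
size_t converted_length(const char *from, const char *to,
                        char *in, size_t inlen)
{
  char scratch[16];           /* small on purpose; output is thrown away */
  char *ib = in;
  size_t ibl = inlen, total = 0;
  int flushing = 0;
  iconv_t cd = iconv_open(to, from);

  if (cd == (iconv_t)-1)
    return (size_t)-1;

  for (;;) {
    char *ob = scratch;
    size_t obl = sizeof(scratch);
    size_t r = flushing ? iconv(cd, NULL, NULL, &ob, &obl)
                        : iconv(cd, &ib, &ibl, &ob, &obl);

    total += sizeof(scratch) - obl;
    if (r == (size_t)-1) {
      if (errno == E2BIG)
        continue;             /* scratch full: reuse it and carry on */
      total = (size_t)-1;     /* EILSEQ, EINVAL or anything else */
      break;
    }
    if (flushing)
      break;
    flushing = 1;             /* input done: emit any final shift sequence */
  }

  iconv_close(cd);
  return total;
}
```

A second pass would then malloc that many bytes and convert again with
a fresh cd. Since the output buffer is now known to be big enough and
the input is complete, neither E2BIG nor a buffer-boundary EINVAL should
get in the way of the return value, at the cost of converting
everything twice.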

Does anyone have any other thoughts on this problem?

Edmund

PS. The program isn't very well tested, but I hope the idea's clear,
anyway.

#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <string.h>
#include <unistd.h>

void fatal(const char *s)
{
  fprintf(stderr, "%d %d\n", errno, E2BIG);
  perror(s);
  exit(1);
}

int main(int argc, char *argv[])
{
  iconv_t cd;
  char bufi[256], bufo[256];
  const char *ib, *t;
  char *ob;
  size_t ibl, obl;
  int r;
  int n = 0;

  if (argc != 3) {
    fprintf(stderr, "Usage: %s FROMCODE TOCODE\n", argv[0]);
    return 1;
  }

  cd = iconv_open(argv[2], argv[1]);
  if (cd == (iconv_t)-1)
    fatal("iconv_open");

  ibl = 0;
  for (;;) {

    /* Fill input buffer */
    for (; ibl < sizeof(bufi); ibl += r) {
      r = read(0, bufi + ibl, sizeof(bufi) - ibl);
      if (r == -1)
        fatal("read");
      if (!r)
        break;
    }

    /* Convert */
    ib = bufi;
    ob = bufo, obl = sizeof(bufo);
    r = iconv(cd, &ib, &ibl, &ob, &obl);
    if (r == -1)
      fatal("iconv");
    if (ob == bufo)
      break;
    n += r;

    /* Output */
    for (t = bufo; t < ob; t += r) {
      r = write(1, t, ob - t);
      if (r == -1)
        fatal("write");
    }

    /* Save unused input */
    memmove(bufi, ib, ibl);
  }

  fprintf(stderr, "Characters converted in a non-reversible way: %d\n", n);
  return 0;
}