Re: Questions about Unicode-aware C programs under Linux
Hi Rich
Sorry. I managed to solve the problem. You were right.
There are still some minor problems: string literals do not exactly match the strings read from a file, so the string comparison functions fail. I am going to investigate this.
Thanks a lot
Best Regards
Ali
On 4/17/07, Ali Majdzadeh <ali.majdzadeh@xxxxxxxxx> wrote:
Hello Rich
Sorry, again.
I wrote a simple C program following your guidelines, but unfortunately it does not work properly.
The program is as follows:
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <langinfo.h>

int main (
    int argc,
    char *argv[]
)
{
    FILE *input_file;
    char buffer[1024];

    if (!setlocale (LC_CTYPE, ""))
    {
        fprintf (stderr, "Locale not specified. Check LC_ALL, LC_CTYPE or LANG.\n");
        return EXIT_FAILURE;
    }

    if (!(input_file = fopen ("./in.txt", "r")))
    {
        fprintf (stderr, "Could not open file : %s\n", strerror (errno));
        return EXIT_FAILURE;
    }

    fgets (buffer, sizeof (buffer), input_file);
    fprintf (stdout, "%s", buffer);

    return EXIT_SUCCESS;
}
The program does not correctly print the line read from the file to stdout (some junk is printed instead). I also used "cat ./persian.txt | iconv -t utf-8 > in.txt" to produce a UTF-8 encoded file.
Best Regards
Ali
On 4/17/07, Rich Felker <dalias@xxxxxxxxxx> wrote:
On Tue, Apr 17, 2007 at 10:46:44AM +0430, Ali Majdzadeh wrote:
> Hello Rich
> Thanks for your response.
> About your question, I should say "yes", I need some text processing
> capabilities.
OK.
> Do you mean that I should use common stdio functions? (like, fgets(), ...)
Yes, they'll work fine.
> And what about UTF-8 strings? Do you mean that these strings should be
> stored in common char*
Yes.
> variables? So, what about the character size difference (Unicode and ASCII)?
> And also, string functions? (like, strtok())
strtok, strsep, strchr, strrchr, strpbrk, strspn, and strcspn will all
work just fine on UTF-8 strings as long as the separator characters
you're looking for are ASCII.
strstr always works on UTF-8, and can be used in place of strchr to
search for single non-ascii characters or longer substrings.
Rich
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/