Re: How to detect the encoding of a string?
* Simos Xenitellis [2005-06-02 19:16]:
> The ZIP format (http://www.info-zip.org/pub/infozip/doc/) appears not
> to specify the text encoding of the filenames of the compressed files,
> which causes a problem with unzip utilities when they try to
> uncompress .ZIP files that include filenames in non-UTF-8 encodings.
I encountered this problem recently, when I tried to unpack a zip file
with Greek filenames created with WinZip. I didn't try any graphical
decompression software, only command-line unzip, and discovered that
while the filenames were stored in the zipfile in code page 737 (CP737),
unzip tried to map them using a CP437-to-Latin-1 translation table on
extraction, and the result was a complete mess...
I found that I could display the stored filenames correctly with the
following command:
    zipnote file.zip | iconv -f cp737 -t utf-8
Then I just renamed the extracted files by hand to the correct names.
There weren't that many, and I could tell which file matched which
filename from the order in which they were extracted.
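
For archives with more files, the same idea could be scripted. The
following is only a rough sketch in Python, not something I have run on
the archive above. It assumes the names really are stored in CP737 and
relies on the fact that Python's zipfile module decodes names as CP437
when the archive's UTF-8 flag (bit 0x800) is not set. The helper names
fix_name and list_names are made up for this example.

```python
import io
import zipfile

def fix_name(name, real_encoding="cp737"):
    """Undo the CP437 decoding and re-decode with the assumed real code page."""
    # CP437 maps every byte value, so encoding back recovers the stored bytes.
    return name.encode("cp437").decode(real_encoding)

def list_names(zip_source, real_encoding="cp737"):
    """Return member names, re-decoding those not flagged as UTF-8."""
    names = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x800:
                # Name was stored as UTF-8; zipfile already decoded it correctly.
                names.append(info.filename)
            else:
                names.append(fix_name(info.filename, real_encoding))
    return names
```

Calling list_names("file.zip") would give the corrected names in archive
order, which could then be paired with the extracted files for renaming.
The encoding still has to be guessed by the user, since the archive
itself does not record it.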
I looked through the unzip docs, but couldn't find an option to avoid
filename translation when unzipping.
By the way, I think that RAR understands filename encodings, because I
never had a problem opening .rar files with Greek filenames created on
Windows.
--
Alexandros Diamantidis * adia@xxxxxxxxx
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/