On Mon, Sep 04, 2006 at 11:44:26PM -0500, David Starner wrote:
> Once you compress the data with a decent compression scheme, you may
> as well store the data by writing out the full Unicode name (e.g.
> "LATIN CAPITAL LETTER OU"); the final result will be about the same
> size.
With some compression methods this is true, particularly bz2.
> Furthermore, you can fit a decent sized novel on a floppy
> uncompressed and a decent sized library on a DVD uncompressed.
Yet somehow the firefox source code is still 36 megs (bz2), and god
only knows how large OOO is. Imagine now if all the variable and
function names were written in Hindi or Thai... It would be an
interesting test to transliterate the Latin letters to Devanagari and
see how much the compressed tarball size goes up.
In all seriousness, though, unless you're dealing with image, music,
or movie files, text weighs in quite heavy in size.