Re: Character set tagging considered harmful
Thomas Wolff writes:
> Auto-detection applies to text, i.e. file contents, while I would
> assume LC_CTYPE to describe the environment that we're running in,
> especially the terminal mode. This doesn't need to be the same ...
Now where do you draw the line between file contents and environment?
In Unix, you are constantly writing temporary data to files and piping
file contents through filters. In the end, you cannot tell the two
apart.
One system that tried to make this distinction was Windows (3.1, 95,
98): file contents are encoded in ISO-8859-1, but console I/O uses
CP437. It never worked. For it to work, every program writing to a
file descriptor would have had to check whether that descriptor is
attached to a console and, if so, convert its data from ISO-8859-1 to
CP437 on the fly. Some programs do that. But it's definitely not what
we want.
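
To make the burden concrete, here is a rough Python sketch of the check
that every single program would have had to perform before each write.
The function names are made up for illustration, and the sketch assumes
that the data is ISO-8859-1 and that the console expects CP437:

```python
import os


def latin1_to_cp437(data: bytes) -> bytes:
    """Transcode ISO-8859-1 bytes to CP437 (hypothetical helper).

    Characters that CP437 does not contain are replaced by '?'.
    """
    return data.decode("iso-8859-1").encode("cp437", errors="replace")


def bytes_for_fd(data: bytes, fd: int) -> bytes:
    """Return the bytes a program would have to write to fd.

    Assumption for this sketch: files and pipes carry ISO-8859-1,
    consoles carry CP437.
    """
    if os.isatty(fd):
        # Attached to a console: convert on the fly.
        return latin1_to_cp437(data)
    # File or pipe: pass the data through unchanged.
    return data
```

Note that the conversion is lossy (ISO-8859-1 has characters that CP437
lacks), and that the check gives the wrong answer as soon as the output
goes through a pipe and only then reaches the console.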
All you can distinguish is "inside the computer" and "outside the computer".
For the former, we have LC_CTYPE. For the latter, we have character set
tagging in MIME and HTML.
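
As an illustration of that boundary, here is a small Python sketch
that takes a MIME-tagged message from "outside", decodes it according
to its charset tag, and re-encodes it once into the encoding used
"inside". The function name import_body is made up for this example,
and using locale.getpreferredencoding() as a stand-in for the LC_CTYPE
encoding is an assumption of the sketch:

```python
import locale
from email import message_from_bytes


def import_body(raw_message: bytes, internal_encoding=None) -> bytes:
    """Convert a tagged message body to the internal encoding.

    Hypothetical helper: handles a single-part text message only.
    """
    msg = message_from_bytes(raw_message)
    # The charset tag says how the "outside" data is encoded;
    # MIME defaults to us-ascii when the tag is missing.
    charset = msg.get_content_charset() or "us-ascii"
    payload = msg.get_payload(decode=True)  # undo the transfer encoding
    text = payload.decode(charset, errors="replace")
    if internal_encoding is None:
        # Assumption: this reflects the encoding implied by LC_CTYPE.
        internal_encoding = locale.getpreferredencoding(False)
    return text.encode(internal_encoding, errors="replace")
```

After this one conversion at the boundary, nothing "inside" needs to
know where the text came from.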
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/