Re: Character set tagging considered harmful
: towo@computer.org wrote on 1999-09-21 13:08 UTC:
: > I think there is some confusion here. Auto-detection applies to text,
: > i.e. file contents, while I would assume LC_CTYPE to describe the
: > environment that we're running in, especially the terminal mode.
: > This doesn't need to be the same and if LC_CTYPE is used to define one
: > thing it should perhaps rather not be used to derive the other information
: > which is usually quite unrelated.
:
: I really think, they are the same, they were intended to be the same and
: in my opinion they really should be the same. I like
This resembles the previous discussion on "Do we need a heterogeneous
environment?" I agree that "they should be the same" after a transitional
period of perhaps 10 years, once all old terminal equipment has broken down.
On the other hand, by then we may not need the tag anymore... :)
But even if they were intended to be the same, they are not today.
As a matter of fact, we (i.e. most of us) do have heterogeneous file
encodings and also heterogeneous terminal access (considering occasional
remote work).
: How far do you want to implement autodetection? Do you want "ls" to
: autodetect, whether a filename is in Latin-2, Latin-15, JIS X0208 or
: UTF-8 and convert automatically accordingly?
I cannot distinguish the Latin-X encodings from each other, but UTF-8,
as a multi-byte encoding, is quite different.
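
To illustrate why UTF-8 stands out: its multi-byte sequences follow a strict
bit pattern that text in a Latin-X encoding rarely satisfies by accident.
A minimal sketch in Python, assuming a strict UTF-8 decoder is available
(the name looks_like_utf8 is made up for this example; it is a heuristic,
not a reliable detector):

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data is valid UTF-8 and contains non-ASCII bytes."""
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid in UTF-8 and in every Latin-X set alike,
    # so it gives no evidence either way.
    return any(b >= 0x80 for b in data)


# The same byte is valid in Latin-1 and in Latin-2, only with a different
# meaning, which is why those two cannot be told apart from the bytes alone.
same_byte = b"\xb3"
as_latin1 = same_byte.decode("iso8859-1")
as_latin2 = same_byte.decode("iso8859-2")
```

Both decode calls at the end succeed, so validity alone cannot separate the
Latin-X family; only the UTF-8 check can actually fail.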
: Character set
: autodetection, if it really became common-place under Unix, would mean
: that practically every application would have to be equipped with a
No, basically just editors and viewers.
: full-fledged any-to-any conversion package. Horrible prospect. No, I
: really really think that separating the plain-text and terminal encoding
: is a rather dangerous route, that I most certainly will not support in
Dangerous in that it may fail, yes, but that doesn't mean we don't need it.
Even if we cannot detect file encodings reliably, we still need a
reference mechanism to determine the terminal behaviour reliably!
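
As a sketch of what such a reference could look like if LC_CTYPE is taken to
describe the terminal: a program can ask the locale for its codeset. This
assumes a Unix system where nl_langinfo is available; the function name
terminal_codeset is made up for this example, and treating the result as
the terminal encoding is exactly the assumption under discussion here.

```python
import locale


def terminal_codeset() -> str:
    """Return the codeset named by the LC_CTYPE locale of the environment."""
    try:
        # Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unknown or unsupported locale name: fall back to the "C" locale.
        locale.setlocale(locale.LC_CTYPE, "C")
    return locale.nl_langinfo(locale.CODESET)
```

An editor or viewer could compare this value with whatever it detected for
the file and convert only when the two differ.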
: any way. All this also has nothing to do with UTF-8, which is just yet
: another encoding and should be treated just as such. The entire
: autodetection or tagging business sounds to me very much like
: reinventing ISO 2022 with all its consequences.
Bruno Haible wrote:
: Thomas Wolff writes:
:
: > Auto-detection applies to text, i.e. file contents, while I would
: > assume LC_CTYPE to describe the environment that we're running in,
: > especially the terminal mode. This doesn't need to be the same ...
:
: Now where do you draw the line between file contents and environment?
: In Unix, you are constantly writing temporary data to files, and piping
: file contents through filters and pipes. In the end, you cannot
: distinguish.
see above
: All you can distinguish is "inside the computer" and "outside the computer".
That's just what I meant, where terminal access is somewhat "outside" -
at least more "outside" than file contents.
: For the former, we have LC_CTYPE.
In that sense, it would be for the latter.
: For the latter, we have character set tagging in MIME and HTML.
Another level of outsideness, but well defined (despite frequent errors).
What we need is a clear description of the other two levels.
Kind regards,
Thomas Wolff
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/