Re: perl unicode support
???????? wrote:
>
> > That would be contradictory to the whole concept of Unicode. A
> > human-readable string should never be considered an array of bytes, it is an
> > array of characters!
>
> Hrm, that statement I think I would object to. For the overwhelming
> vast majority of programs, strings are simply arrays of bytes.
> (regardless of encoding) The only time source code needs to care about
> characters is when it has to layout or format them for display.
What about when it breaks a string into substrings at some delimiter,
say, using a regular expression? It has to break the underlying byte
string at a character boundary.
In fact, what about interpreting an underlying string of bytes as
the right individual characters in that regular expression?
Any time a program uses the underlying byte string as a character
string other than simply a whole string (e.g., breaking it apart,
interpreting it), it needs to consider it at the character level,
not the byte level.
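A rough sketch of the difference, written in Python rather than Perl purely for illustration. The sample string and all variable names are made up for this example, and UTF-8 is assumed as the encoding. It runs the same patterns once against the character string and once against its underlying bytes:

```python
import re

# Hypothetical sample data: "naïve,café", written with escapes so the
# accented letters are guaranteed to be single precomposed characters.
text = "na\u00efve,caf\u00e9"
data = text.encode("utf-8")          # the underlying byte string

# "." matches one character in a str pattern, but one byte in a bytes pattern.
chars = re.findall(r".", text)
units = re.findall(rb".", data)

# A class meant to match "é" becomes, at the byte level, a class of its
# two bytes (0xC3, 0xA9), so it also matches the first byte of "ï" (0xC3 0xAF).
char_hits = [m.start() for m in re.finditer("[\u00e9]", text)]
byte_class = b"[" + "\u00e9".encode("utf-8") + b"]"
byte_hits = [m.start() for m in re.finditer(byte_class, data)]

# Cutting the byte string at an arbitrary offset can land inside a character.
try:
    data[:3].decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False

print(len(chars), len(units))        # 10 characters, 12 bytes
print(char_hits, byte_hits)          # [9] versus [2, 10, 11]
print(cut_is_valid)                  # False
```

The byte-level pattern reports three matches where the character-level one reports a single match, and the three-byte prefix is no longer decodable text.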
> When I write a basic little perl script that reads in lines from a
> file, does trivial string operations on them, then prints them back
> out, there should be absolutely no need for my code to make any
> special considerations for encoding.
It depends how trivial the operations are.
(Offhand, the only things I think would be safe are copying and
appending.)
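To illustrate, again as a Python sketch and not Perl: the sample strings and the helper `is_valid_utf8` are hypothetical, and both inputs are assumed to be complete, valid UTF-8. Copying and appending keep the bytes decodable, while truncating or reversing at the byte level need not:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical sample data: "Grüß" and " Gott" as UTF-8 byte strings.
a = "Gr\u00fc\u00df".encode("utf-8")
b = " Gott".encode("utf-8")

joined = a + b            # appending whole strings
copied = bytes(joined)    # copying

safe_append = is_valid_utf8(joined)
safe_copy = is_valid_utf8(copied)

# Byte-level truncation and reversal can split or scramble a character.
truncated_ok = is_valid_utf8(joined[:3])    # cuts "ü" (0xC3 0xBC) in half
reversed_ok = is_valid_utf8(joined[::-1])   # continuation byte comes first

# Even "length" differs: 11 bytes for 9 characters.
byte_len = len(joined)
char_len = len(joined.decode("utf-8"))

print(safe_append, safe_copy, truncated_ok, reversed_ok, byte_len, char_len)
```

Anything that indexes into the string, counts its length, or reorders it already depends on where the character boundaries fall.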
Daniel
--
Daniel Barclay
dsb@xxxxxxxxx
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/