Re: Who can make this script smaller?
Roberto Suarez Soto writes:
> On Thu, 7 Jan 1999, William Chesters wrote:
>
> > perl -0e '$_=<>;print"$&
> > "while/<A[^<]*<\/A>/gsi'
>
> Hmmm ... I don't understand it O:-)
>
> Well, what did I miss, what am I wrong at? My Perl knowledge would
> be quite improved if I knew that :-)
For a start may I apologise again for the fact that that script
doesn't actually do what Maarten's original did?
My repost
perl -0e 'map{print"<LI>$_
"}<>=~/<A.*?<\/A>/sig'
is more like it.
Consider first
perl -0e '$_ = <>; while (/whatever/gsi) { print "$&\n" }'
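- perl -0ooo sets the input record separator (which is of course
newline by default) to octal ooo; but if you don't give any digits
then the separator is set to 000, the null character. Text input
normally contains no null characters, so <> will then slurp a whole
file at once, or the whole of stdin. (The conventional way to ask
for this explicitly is -0777.)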
- The regexp match operator /whatever/gsi works by default on $_ (now
the whole of stdin). The /i simply means "case insensitive"; the
/s means "dot can match newline" and is actually unnecessary here,
since the pattern <A[^<]*<\/A> contains no dot (oops).
- The /g in a *scalar* context means: each time the operator is
executed, find the next match (!). Grotesque (it's implemented
using a position pointer associated with every string, which you
can read with pos) but useful. When there are no more matches it
returns false, which is how the while loop terminates.
- Whenever Perl's regexp engine finds a match in anything it puts the
matched string into the variable $&.
That should explain that ...
As for the other one, it could be better written
perl -0e 'grep { print "<LI>$_\n" } (<> =~ /whatever/sig)'
For a start we use =~ to do a regexp match operation directly on the
slurped stdin.
The other main point is that /g in a *list* context means: find all
the matches and put them in a list. So
<> =~ /whatever/sig
gives you (in a list context ...) a list of the occurrences of the
regexp `whatever' in the script's first input file, or stdin.
The grep is a bit of a red herring; in a scalar context (as here)
grep { P } L is the number of times the code fragment { P } evaluated
to true when $_ was set to each element of the list L. However I am
just using it to get the { print ... } fragment applied to each
member of <> =~ /whatever/sig.
(The equivalent in functional programming, from where this style
obviously derives, would generally be called `iter'. In passing I
note that Perl's amazing terseness works perhaps even better on
functional-style code than on the BASIC which you so often see people
trying to write.)
Worse still, in the obfuscated version, I use `map' rather than `grep'
to get a similar effect because it is shorter.
Hope this helps,
William
-
European Universities' Linux User Groups -- Misc list
http://humbolt.geo.uu.nl/eulug/