
Re: Who can make this script smaller?



Roberto Suarez Soto writes:
 > On Thu, 7 Jan 1999, William Chesters wrote:
 > 
 > > perl -0e '$_=<>;print"$&
 > > "while/<A[^<]*<\/A>/gsi'
 > 
 > 	Hmmm ... I don't understand it O:-)
 > 
 > 	Well, what did I miss, what am I wrong at? My Perl knowledge would
 > be quite improved if I knew that :-)

For a start may I apologise again for the fact that that script
doesn't actually do what Maarten's original did?

My repost

	perl -0e 'map{print"<LI>$_
	"}<>=~/<A.*?<\/A>/sig'

is more like it.

Consider first

	perl -0e '$_ = <>; while (/whatever/gsi) { print "$&\n" }'

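 - perl -0ooo sets the input record separator $/ (which is of course
   newline by default) to the character with octal code ooo; if you
   don't give any digits, it is set to the null character \0.  Since
   text and HTML files don't normally contain nulls, <> will then slurp
   the whole of the first file at once, or the whole of stdin.
   (-0777 slurps unconditionally, since no character has that code.)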

 - The regexp match operator /whatever/gsi works by default on $_ (now
   the whole of stdin).  The /i simply means "case insensitive"; the
   /s means "dot can match newline", and here it is actually
   unnecessary because <A[^<]*<\/A> has no dot in it (oops).

 - The /g in a *scalar* context means: each time the operator is
   executed, find the next match (!).  Grotesque (it's implemented
   using a position pointer, readable with pos(), attached to each
   string variable) but useful.  When there are no more matches it
   returns false and resets the pointer, which is how the while loop
   terminates.

 - Whenever a regexp match succeeds, Perl puts the matched string
   into the variable $&.
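For anyone who wants to poke at these points without an HTML file handy, here is a small sketch.  It is only an illustration under assumptions: the sample text in $html and the sub names slurp_with_null_separator and scalar_g_matches are made up, and an in-memory filehandle (Perl 5.8 or later) stands in for stdin.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample input standing in for the HTML the one-liner reads.
my $html = qq{<p>Links:\n<a href="x.html">X</a> and\n<A HREF="y.html">Y\n</A></p>\n};

# Read through an in-memory filehandle with $/ = "\0" (what -0 sets).
# The data contains no null byte, so a single <$fh> returns all of it.
sub slurp_with_null_separator {
    my ($data) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    local $/ = "\0";
    my $all = <$fh>;
    close $fh;
    return $all;
}

# The scalar-context /g loop from the one-liner: each pass finds the
# next match, and we record $& plus pos() (where the next search starts).
sub scalar_g_matches {
    my ($text) = @_;
    my @found;
    while ($text =~ /<A[^<]*<\/A>/gi) {
        push @found, [ $&, pos($text) ];
    }
    return @found;
}

# Without /s, "." stops at a newline; with /s it does not.
my $dot_plain = ("<A>Y\n</A>" =~ /<A>.*<\/A>/)  ? 1 : 0;   # 0
my $dot_s     = ("<A>Y\n</A>" =~ /<A>.*<\/A>/s) ? 1 : 0;   # 1

my $slurped = slurp_with_null_separator($html);
for my $m (scalar_g_matches($slurped)) {
    my ($match, $pos) = @$m;
    print "$match (next search starts at offset $pos)\n";
}
```

Both links should come out, the second one spanning a line break even without /s, because a negated class like [^<] matches newline anyway.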

That should explain that ...

As for the other one, it could be better written

	perl -0e 'grep { print "<LI>$_\n" } (<> =~ /whatever/sig)'

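For a start, we use =~ to do the regexp match directly on the slurped
stdin, instead of first assigning it to $_ (=~ puts <> in scalar
context, so it still reads the whole input in one go).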

The other main point is that /g in a *list* context means: find all
the matches and put them in a list.  So

	<> =~ /whatever/sig

gives you (in a list context ...) a list of the occurrences of the
regexp `whatever' in the script's first input file, or stdin.
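A quick sketch of the list-context behaviour, assuming a made-up $page string in place of the slurped input.  One caveat worth knowing: if the pattern contains capturing parentheses, list-context /g returns the captured groups rather than the whole matches, which the second match below shows.

```perl
use strict;
use warnings;

# Made-up sample input; the one-liner would get this from <>.
my $page = qq{<A HREF="a.html">first</A> text <a href="b.html">second\n</a>};

# List-context /g: every match comes back at once.  .*? is non-greedy,
# and /s lets it cross the newline inside the second link.
my @links = ($page =~ /<A.*?<\/A>/sig);

# With capturing parentheses, list-context /g returns the captured
# groups instead of the whole matches.
my @targets = ($page =~ /<A\s+HREF="([^"]*)"/gi);

print "<LI>$_\n" for @links;
print "target: $_\n" for @targets;
```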

The grep is a bit of a red herring.  In a scalar context,
grep { P } L is the number of times the code fragment { P } evaluated
to true when $_ was set to each element of the list L (in a list
context it is those elements themselves).  Here, though, its result is
simply thrown away; I am just using it to get the { print ... }
fragment applied to each member of <> =~ /whatever/sig.
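To see the three contexts side by side, here is a sketch with a made-up word list (nothing to do with HTML).  The arrays @printed and @looped are hypothetical stand-ins for printed output, so that the void-context grep can be compared with the ordinary loop it replaces.

```perl
use strict;
use warnings;

my @words = qw(apple Banana cherry Avocado);

# Scalar context: grep returns how many times the block was true.
my $count = grep { /^a/i } @words;      # 2

# List context: grep returns the elements for which the block was true.
my @a_words = grep { /^a/i } @words;    # ("apple", "Avocado")

# Void context, as in the one-liner: the block runs once per element
# purely for its side effect, and grep's result is discarded.
my @printed;
grep { push @printed, "<LI>$_" } @words;

# The plain loop that this idiom stands in for.
my @looped;
push @looped, "<LI>$_" for @words;
```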

(The equivalent in functional programming, from which this style
obviously derives, would generally be called `iter'.  In passing I
note that Perl's amazing terseness works perhaps even better on
functional-style code than on the BASIC which you so often see people
trying to write.)

Worse still, in the obfuscated version, I use `map' rather than `grep'
to get a similar effect because it is shorter.
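A sketch comparing the map, grep and plain-loop versions.  It assumes a made-up $page string and a hypothetical render() helper that prints into an in-memory string (Perl 5.8 or later) so the outputs can be compared; all three should produce identical text, since map and grep are only being used for the side effect of print.

```perl
use strict;
use warnings;

# Made-up sample input.
my $page = qq{<a href="1">one</a><A HREF="2">two</A>};

# Hypothetical helper: run one of the three styles, printing into a string.
sub render {
    my ($how) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "cannot open in-memory handle: $!";
    my @links = $page =~ /<A.*?<\/A>/sig;
    if ($how eq 'map') {
        map { print {$fh} "<LI>$_\n" } @links;    # result discarded
    } elsif ($how eq 'grep') {
        grep { print {$fh} "<LI>$_\n" } @links;   # result discarded
    } else {
        print {$fh} "<LI>$_\n" for @links;        # the readable version
    }
    close $fh;
    return $out;
}

print render('map');
```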

Hope this helps,
William
-
European Universities' Linux User Groups -- Misc list
http://humbolt.geo.uu.nl/eulug/