
Re: Powerful regular expression.



Hmmpf,

I made a mistake: my earlier count included multiple matching lines from
the SAME mail. The following counts each mail at most once and is more
accurate:

~/Mail/archive>cat spam* | awk '
  BEGIN { header=0; body=0; count=0; matched=0 }
  # A From: line seen outside a header starts a new mail.
  /^From:/ { if (header==0) { header=1; body=0; count=count+1 } }
  # The first empty line ends the header and starts the body.
  /^$/ { if (header==1) { header=0; body=1 } }
  # Count at most one match per mail body.
  /remove[0-9-]*@|[tT]o be removed|chain letter|absolutely [Ff][Rr][Ee][Ee]|entrepreneur|Dear (Sir|Friend|Windows|Online|eBusiness)|1-[89]00|Visa/ { if (body==1) { body=0; matched=matched+1 } }
  END { printf("%d out of %d\n", matched, count) }'
1019 out of 2543

In other words, only 2 out of 5 :/
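
To illustrate the difference between counting matching lines and counting
matching mails, here is a minimal sketch on a made-up three-mail sample.
The sample text, the shortened pattern and the function names count_lines
and count_mails are assumptions for illustration only; the per-mail logic
follows the same header/body state machine as the command above.

```shell
#!/bin/sh
# Hypothetical sample: three mails, the first has TWO matching body lines.
sample='From: a@example.com
Subject: one

This is absolutely FREE.
Reply to be removed.
From: b@example.com
Subject: two

Hello, lunch tomorrow?
From: c@example.com

Pay by Visa today.
'

# Shortened pattern, for illustration only.
pattern='absolutely [Ff][Rr][Ee][Ee]|[tT]o be removed|Visa'

# Naive count: every matching line, so one mail can be counted repeatedly.
count_lines() {
    awk -v pat="$pattern" '$0 ~ pat { n++ } END { print n + 0 }'
}

# Per-mail count: at most one match per mail body.
count_mails() {
    awk -v pat="$pattern" '
        /^From:/ { if (!header) { header = 1; body = 0; count++ } }
        /^$/     { if (header)  { header = 0; body = 1 } }
        $0 ~ pat { if (body)    { body = 0; matched++ } }
        END      { printf("%d out of %d\n", matched, count) }
    '
}

printf '%s' "$sample" | count_lines   # prints: 3
printf '%s' "$sample" | count_mails   # prints: 2 out of 3
```

The naive count reports 3 hits although only 2 of the 3 mails match.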

I think I'll have to study patterns containing the word 'money'... Adding
'money' gives 1353 out of 2543, but that also results in a lot of
false positives (66 out of 2086).
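
For comparison, the counts above can be turned into percentages. This is
only a small sketch: the helper name rate is made up, and reading 2086 as
the number of non-spam mails tested is an assumption.

```shell
#!/bin/sh
# rate MATCHED TOTAL: print MATCHED/TOTAL as a percentage with one decimal.
rate() {
    awk -v m="$1" -v t="$2" 'BEGIN { printf("%.1f%%\n", 100 * m / t) }'
}

rate 1019 2543   # spam matched without "money": 40.1%
rate 1353 2543   # spam matched with "money":    53.2%
rate 66 2086     # false positives with "money" (assumed non-spam set): 3.2%
```

So 'money' catches 334 more spam mails (1353 - 1019), at the cost of
flagging roughly 3 in 100 mails of the other set.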

-- 
Carlo Wood <carlo@alinoe.com>
-
Spamfilter:    spam magnet and regexp collector / blocker
Archive:       http://mail.nl.linux.org/spamfilter/
Website:       http://spamfilter.nl.linux.org/