trinity-users@lists.pearsoncomputing.net

Re: [trinity-users] Hopeing I can find a regex expert here

From: Gene Heskett <gheskett@...>
Date: Wed, 23 Mar 2016 09:28:17 -0400
On Wednesday 23 March 2016 02:58:39 Michele Calgaro wrote:

> On 2016/03/23 02:19 PM, Gene Heskett wrote:
> > On Wednesday 23 March 2016 00:32:17 Michele Calgaro wrote:
> >> On 2016/03/23 12:44 PM, Gene Heskett wrote:
> >>> Greetings;
> >>>
> >>> I use mailfilter as a prefilter in front of fetchmail to nuke some
> >>> spam while it's still on the server.
> >>>
> >>> But it's missing hits on what I suspect is the From: or
> >>> Return-Path: strings that have quotation marks in the string
> >>> because the string is being spec'd by being surrounded by "show
> >>> this name" bs.
> >>>
> >>> I've added the character < as part of the string it's to search
> >>> for, so the search string now looks like
> >>> "From:.*<*\.unwanted-tld".  Does this stand that famous snowball's
> >>> chance in hell of working well with or without a quoted "some
> >>> funkity name" in front of the real url with the <> around it?
> >>>
> >>> I just love the lack of documentation on how this string
> >>> comparison stuff works as shown by the man pages for grep and
> >>> regex.  All sorts of control options are well covered, but
> >>> figuring out how to write a search expression must be one of the
> >>> world's better guarded secrets.
> >>>
> >>> So if someone could show me, or give a url that actually has the
> >>> full docs, I'd be grateful.
> >>>
> >>> Thanks.
> >>>
> >>> Cheers, Gene Heskett
> >>
> >> Hi Gene,
> >> "From:.*<*\.unwanted-tld" will match a string like this (I have put
> >> one section per line to be clearer):
> >>   From:
> >>   any sequence of characters
> >>   0 or more <
> >>   .unwanted-tld
> >
> > I thought I wanted 1 only, but the way these lowlifes change
> > addresses and names hourly, they may remove the <> surrounding the
> > real source address and screw me up.  But the fact that they often
> > put dbl-quotes around the throwaway part of the url, is I think
> > screwing me regardless.
> >
> > What we need is the ability to specify the quote character by the
> > first non-space character after the DENY =, which is currently a "^
> > or a <> which apparently inverts the logic.  So a typical line would
> > be
> >
> > DENY = "^From:.*<*\.bid"
> >
> > Substitute any of the new tld's for bid that gets obnoxious.  Like
> > xyz, or .pro, heck that new list is several dozen tld's.
> >
> > But AFAIK, we're stuck with the dblquote wrapper around the string
> > to match.  Grrrr.
> >
> >> It is greedy, so it will scan until the last < if there are more
> >> than one. Not sure if this is what you need or not. If you can post
> >> an example of what you need to match, I can work out another regex
> >> if required.
> >
> > Try this:
> >
> > "-Bed Bugs-" <-BedBugs-@...>
> >
> > with Return-Path.* or From.* in front of it.  Or does that - sign, 4
> > of them, need escaping with a \ ? IDK.
> >
> > Thanks Michele.
> >
> >> Cheers
> >>   Michele
> >
> > I converted about 3 lines of the filterdata file that way, and I'm
> > now waiting for the next blast of spam to serve as test data. 
> > mailfilter is a picky twit, but that hasn't given it a tummy ache
> > either, so I am hopeful.
> >
> >> PS: by the way, the internet is full of excellent documentation
> >> about regex ;-) For example "http://www.regular-expressions.info/"
> >
> > Cheers, Gene Heskett
>
> Hi Gene,
> so if I understand correctly, you already had a set of rules like
> DENY = "^From:.*\.bid"  (bid stands for any tld of your choice)
> but it was missing some entries because of the "..." entry before the
> domain. So you put the < in the string as well.
> Right?
>
> Assuming so, it surprises me that the original version missed some
> entries, since the additional "..." field would have already been
> matched by the .* part of the pattern.
> I think there is a different reason for missing entries. Perhaps a
> blank character before "From:"? Could it be? You could try this other
> version:
> DENY = "^\s*From:.*\.bid"  which ignores any separator before From:
> or
> DENY = "^\s*From:.*\.bid>" which also makes explicit that the tld is
> followed by a >.
>
I'll do that for the top 4 or 5 entries to see what effect it has.
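
While waiting for real spam, the three candidate patterns can be tried
offline. What follows is only a minimal sketch in Python, assuming Python's
re module behaves like mailfilter's regex engine for patterns this simple.
That may not hold: \s is not part of POSIX regex syntax, where the portable
spelling is [[:space:]]. The sample header lines and the spam.example.bid
domain are invented for the test.

```python
import re

# Candidate patterns from this thread (the part between the double quotes
# of a DENY line).
PATTERNS = {
    "original": r"^From:.*\.bid",
    "leading-space": r"^\s*From:.*\.bid",
    "closing-bracket": r"^\s*From:.*\.bid>",
}

# Hypothetical header lines; every address here is made up.
SAMPLES = [
    'From: "-Bed Bugs-" <-BedBugs-@spam.example.bid>',
    'From: <plain@spam.example.bid>',
    'From: bare@spam.example.bid',
    ' From: "Indented" <x@spam.example.bid>',
    'From: "Good Guy" <friend@example.org>',
]

def matching_patterns(line):
    """Return the names of the patterns that match one header line."""
    return [name for name, pat in PATTERNS.items() if re.search(pat, line)]

for line in SAMPLES:
    print(f"{line!r:55} -> {matching_patterns(line)}")
```

In this sketch the quoted display name makes no difference to any of the
three patterns; only a leading blank or a missing > changes the outcome.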

> By the way, by "missing some entries" you mean that it is not
> filtering all the spam or that it is filtering some good emails as
> well?
>
Two consecutive spams ending in the desired hit: it nukes one and passes
the other, so I was looking for what the diff was.
Kmail shows the raw message with a tap on the v key, and I can't see any
trash characters in front of the From etc. lines.
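
A viewer can hide characters that would still defeat a pattern anchored
with ^. One way to check is to dump the raw header line with every
character outside printable ASCII made visible. This is only a sketch: the
function name and both sample lines are invented, and the raw headers would
have to be saved out of the mail first.

```python
def suspicious_chars(line):
    """List (index, repr) pairs for characters outside printable ASCII."""
    return [(i, repr(ch)) for i, ch in enumerate(line)
            if not (" " <= ch <= "~")]

# Two hypothetical lines that can look much alike on screen; the second
# starts with a no-break space (U+00A0) and has a tab after the colon.
clean = 'From: "Name" <x@spam.example.bid>'
dirty = '\u00a0From:\t"Name" <x@spam.example.bid>'

for line in (clean, dirty):
    print(repr(line), suspicious_chars(line))
```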

But I see in the logs that I am nuking posts from a valuable
contributor. It seems Dr. Klepp is coming in from a .biz address, so I'll
have to remove that filter line, and my apologies, Nick, if I have seemed
to ignore you.  I'd druther have the spam than miss your helpful
msgs.
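
An alternative to dropping the .biz rule outright would be to exempt the
known-good sender before the deny rules run. AFAIK the mailfilterrc man
page describes ALLOW rules for that purpose, but that should be checked
there. The Python sketch below only models the allow-before-deny idea; the
rule lists, the function name and the addresses are all invented.

```python
import re

# Hypothetical rule lists; none of these addresses come from the thread.
ALLOW = [re.compile(r"^From:.*trusted\.sender@example\.biz")]
DENY = [re.compile(r"^From:.*\.biz"), re.compile(r"^From:.*\.bid")]

def should_delete(header_line):
    """Delete only when a deny rule matches and no allow rule does."""
    if any(p.search(header_line) for p in ALLOW):
        return False
    return any(p.search(header_line) for p in DENY)

for line in ('From: "Friend" <trusted.sender@example.biz>',
             'From: <junk@spam.example.biz>',
             'From: someone@example.org'):
    print(should_delete(line), line)
```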

> Final note, your current modified version is no different from the
> original, since <* (0 or more <) is preceded by .* (any sequence of
> characters). Perhaps you wanted to make it <.*, but it would make no
> difference either, except for being more restrictive (i.e. there must
> be a < somewhere before the forbidden tld).
>
> Cheers
>   Michele
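
That final point can be checked mechanically. The sketch below, again
assuming Python's re is close enough to mailfilter's regex engine and using
invented sample lines, compares the original rule, the modified one with
<*, and a variant with <.* that requires a < before the tld.

```python
import re

WITHOUT = re.compile(r"^From:.*\.bid")     # original rule
STAR = re.compile(r"^From:.*<*\.bid")      # modified rule: zero or more <
NEEDED = re.compile(r"^From:.*<.*\.bid")   # variant that requires a <

# Hypothetical header lines for illustration only.
SAMPLES = [
    'From: "Some Name" <x@spam.example.bid>',
    'From: x@spam.example.bid',
    'From: <<x@spam.example.bid>',
    'From: x@example.org',
]

def verdicts(line):
    """Return (original, modified, requires-<) match results as booleans."""
    return tuple(bool(p.search(line)) for p in (WITHOUT, STAR, NEEDED))

for line in SAMPLES:
    print(verdicts(line), line)
```

On these samples the first two columns always agree, and the third differs
only for the address written without angle brackets.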


Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>