trinity-users@lists.pearsoncomputing.net

Month: March 2016

Re: [trinity-users] Hoping I can find a regex expert here

From: "Dr. Nikolaus Klepp" <office@...>
Date: Wed, 23 Mar 2016 12:49:29 +0100
BTW, have you tried "kregexpeditor"?

Nik

On Wednesday, 23 March 2016, E. Liddell wrote:
> On Wed, 23 Mar 2016 15:58:39 +0900
> Michele Calgaro <michele.calgaro@...> wrote:
> 
> > On 2016/03/23 02:19 PM, Gene Heskett wrote:
> > > On Wednesday 23 March 2016 00:32:17 Michele Calgaro wrote:
> > >   
> > >> On 2016/03/23 12:44 PM, Gene Heskett wrote:  
> > >>> Greetings;
> > >>>
> > >>> I use mailfilter as a prefilter in front of fetchmail to nuke some
> > >>> spam while it's still on the server.
> > >>>
> > >>> But it's missing hits on what I suspect is the From: or Return-Path:
> > >>> strings that have quotation marks in the string because the string
> > >>> is being spec'd by being surrounded by "show this name" bs.
> > >>>
> > >>> I've added the character < as part of the string it's to search for,
> > >>> so the search string now looks like "From:.*<*\.unwanted-tld".  Does
> > >>> this stand that famous snowball's chance in hell of working well
> > >>> with or without a quoted "some funkity name" in front of the real
> > >>> url with the <> around it?
> > >>>
> > >>> I just love the lack of documentation on how this string comparison
> > >>> stuff works as shown by the man pages for grep and regex.  All sorts
> > >>> of control options are well covered, but figuring out how to write
> > >>> a search expression must be one of the world's better guarded
> > >>> secrets.
> > >>>
> > >>> So if someone could show me, or give a url that actually has the
> > >>> full docs, I'd be grateful.
> > >>>
> > >>> Thanks.
> > >>>
> > >>> Cheers, Gene Heskett  
> > >>
> > >> Hi Gene,
> > >> "From:.*<*\.unwanted-tld" will match a string like this (I have put
> > >> one section per line to be clearer):
> > >> From:
> > >> any characters (zero or more)
> > >> 0 or more <
> > >> .unwanted-tld
> > >>  
> > > I thought I wanted 1 only, but the way these lowlifes change addresses 
> > > and names hourly, they may remove the <> surrounding the real source 
> > > address and screw me up.  But the fact that they often put double quotes 
> > > around the throwaway part of the url, is I think screwing me regardless.
> > > 
> > > What we need is the ability to specify the quote character by the first 
> > > non-space character after the DENY =, which is currently a "^ or a <> 
> > > which apparently inverts the logic.  So a typical line would be
> > > 
> > > DENY = "^From:.*<*\.bid"
> > > 
> > > Substitute any of the new tld's for bid that gets obnoxious.  Like xyz, 
> > > or .pro, heck that new list is several dozen tld's.
> > > 
> > > But AFAIK, we're stuck with the dblquote wrapper around the string to 
> > > match.  Grrrr.
> > >   
> > >> It is greedy, so it will scan until the last < if there are more than
> > >> one. Not sure if this is what you need or not. If you can post an
> > >> example of what you need to match, I can work out another regex if
> > >> required.
> > >>  
> > > Try this:
> > >  
> > > "-Bed Bugs-" <-BedBugs-@...>
> > > 
> > > with Return-Path.* or From.* in front of it.  Or does that - sign, 4 of 
> > > them, need escaping with a \ ? IDK.
> 
> Hyphens should only need an escape if within a character class, denoted by
> square brackets.
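
A small sketch of the hyphen rule above, using Python's re module as a stand-in for the filter's regex engine (an assumption; the sample strings and the address are invented for illustration). Outside square brackets a hyphen is an ordinary literal; inside them an unescaped hyphen between two characters denotes a range.

```python
import re

# Hypothetical header line, invented for illustration.
header = 'From: "-Bed Bugs-" <bedbugs@spam.example.bid>'

# Outside a character class, hyphens match themselves without escaping.
literal_ok = re.search(r'"-Bed Bugs-"', header) is not None

# Inside a character class, an unescaped hyphen between two characters
# is a range: [a-c] means a, b, or c.
as_range = re.findall(r'[a-c]', 'a-c b')

# Escaping it makes it a literal member: [a\-c] means a, hyphen, or c.
as_literal = re.findall(r'[a\-c]', 'a-c b')

print(literal_ok, as_range, as_literal)
```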
> 
> > > I converted about 3 lines of the filterdata file that way, and I'm now 
> > > waiting for the next blast of spam to serve as test data.  mailfilter is 
> > > a picky twit, but that hasn't given it a tummy ache either, so I am 
> > > hopeful.
> > >   
> > >> PS: by the way, the internet is full of excellent documentation about
> > >> regex ;-) For example "http://www.regular-expressions.info/"  
> > > 
> > > 
> > > Cheers, Gene Heskett
> > >   
> > Hi Gene,
> > so if I understand correctly, you already had a set of rules like
> > DENY = "^From:.*\.bid"  (bid stands for any tld of your choice)
> > but it was missing some entries because of the "..." entry before the domain.
> > So you put the < in the string as well.
> > Right?
> > 
> > Assuming so, it surprises me that the original version missed some entries, since the additional "..." field would have
> > already been matched by the .* part of the pattern.
> > I think there is a different reason for missing entries. Perhaps a blank character before "From:"? Could it be?
> > You could try this other version:
> > DENY = "^\s*From:.*\.bid"  which ignores any separator before From:
> 
> That would also sweep up, say, fred@..., or
> "I.bid" <ibid@...>
> 
> > or
> > DENY = "^\s*From:.*\.bid>" which also makes explicit that the tld is followed by a >.
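
To compare the two variants quoted above, here is a rough sketch in Python. The re module is only a stand-in for whatever engine mailfilter uses (an assumption), and both header lines are made up for illustration. It shows the looser pattern also flagging a display name that merely contains ".bid", while the version ending in ">" does not.

```python
import re

# Patterns quoted from the thread. Python's re module stands in for the
# filter's regex engine here, which is an assumption.
LOOSE = re.compile(r'^\s*From:.*\.bid')
ANCHORED = re.compile(r'^\s*From:.*\.bid>')

# Hypothetical header lines, invented for illustration.
spam = ' From: "-Bed Bugs-" <bedbugs@spam.example.bid>'
innocent = 'From: "I.bid" <ibid@example.com>'

def hits(pattern, header):
    """Return True if the pattern matches anywhere in the header line."""
    return pattern.search(header) is not None

print(hits(LOOSE, spam), hits(LOOSE, innocent))
print(hits(ANCHORED, spam), hits(ANCHORED, innocent))
```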
> 
> I'd cover the example as 
> 
> ^\W*((From:)|(Return-Path:)).*\.bid\W*$
> 
> which works out to zero or more non-word characters  at the beginning of the string,
> followed by "From:" or "Return-Path:" followed by zero or more unknowns, followed 
> by ".bid", followed by zero or more non-word characters, followed by the end of the 
> string.  "Word" characters are alphanumerics, the underscore (but not the hyphen), and possibly 
> some non-ASCII depending on the implementation, so "non-word" covers stuff like
> punctuation and whitespace.  Marking the end of the string makes it more likely you're 
> getting the TLD and not some random bit in the middle that was designed as a parser 
> torture-test.
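
The pattern just described can be tried out with a short Python sketch. Python's re module is assumed to be close enough to a Perl-compatible engine for this purpose, and every header line below is a hypothetical sample, not real mail.

```python
import re

# Pattern quoted from the message above; Python's re is used only as a
# stand-in for a Perl-compatible engine (an assumption, mailfilter may differ).
PATTERN = re.compile(r'^\W*((From:)|(Return-Path:)).*\.bid\W*$')

# Hypothetical header lines, invented for illustration.
SAMPLES = [
    'From: "-Bed Bugs-" <bedbugs@spam.example.bid>',  # quoted name, <> around address
    'From: bedbugs@spam.example.bid',                 # bare address, no <>
    'Return-Path: <bounce@spam.example.bid>',         # the other header
    'From: "I.bid" <ibid@example.com>',               # ".bid" only in the display name
    'From: fred@bid.example.org',                     # "bid" is not the TLD
]

def is_denied(header):
    """Return True if the header line ends in a .bid address."""
    return PATTERN.search(header) is not None

for line in SAMPLES:
    print(is_denied(line), line)
```

The end-of-string anchor is what keeps the last two samples from being flagged.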
> 
> If you want to get really silly,
> 
> ^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$
> 
> ought to catch most TLDs with a 3+ ASCII letter extension that doesn't
> start with c, o, or n.  That skips .com, .org, and .net, but also others such
> as .club, and without a larger sample of "good" and "bad" addresses, I can't
> guarantee no false positives.
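
A quick way to see what the broader pattern catches and skips, again sketched in Python with re standing in for a Perl-compatible engine (an assumption). The addresses are hypothetical. The first character class excludes any extension beginning with c, o, or n, and two-letter extensions fall short of the three-letter minimum.

```python
import re

# Pattern quoted from the message above; the engine choice is an assumption.
BROAD = re.compile(
    r'^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$'
)

def flagged(header):
    """Return True if the header line matches the broad TLD pattern."""
    return BROAD.search(header) is not None

# Hypothetical addresses, invented for illustration.
for tld in ('bid', 'xyz', 'com', 'org', 'net', 'club', 'de'):
    header = 'From: <someone@example.%s>' % tld
    print(tld, flagged(header))
```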
> 
> I write a lot of regexes in my day job (which is not to say that I get them right the
> first time, every time!).  Assuming a Perl-compatible implementation (which most 
> of them are, more or less), "man perlre" is a decent reference for the complicated 
> bits.  Just scroll past the section on modifiers.
> 
> E. Liddell
> 
> 



-- 
Please do not email me anything that you are not comfortable also sharing with the NSA.