On Wednesday 23 March 2016 07:22:03 E. Liddell wrote:
> On Wed, 23 Mar 2016 15:58:39 +0900
>
> Michele Calgaro <michele.calgaro@...> wrote:
> > On 2016/03/23 02:19 PM, Gene Heskett wrote:
> > > On Wednesday 23 March 2016 00:32:17 Michele Calgaro wrote:
> > >> On 2016/03/23 12:44 PM, Gene Heskett wrote:
> > >>> Greetings;
> > >>>
> > >>> I use mailfilter as a prefilter in front of fetchmail to nuke
> > >>> some spam while it's still on the server.
> > >>>
> > >>> But it's missing hits on what I suspect is the From: or
> > >>> Return-Path: strings that have quotation marks in the string
> > >>> because the string is being spec'd by being surrounded by "show
> > >>> this name" bs.
> > >>>
> > >>> I've added the character < as part of the string it's to search
> > >>> for, so the search string now looks like
> > >>> "From:.*<*\.unwanted-tld". Does this stand that famous
> > >>> snowball's chance in hell of working well with or without a
> > >>> quoted "some funkity name" in front of the real url with the <>
> > >>> around it?
> > >>>
> > >>> I just love the lack of documentation on how this string
> > >>> comparison stuff works as shown by the man pages for grep and
> > >>> regex. All sorts of control options are well covered, but
> > >>> figuring out how to write a search expression must be one of
> > >>> the world's better guarded secrets.
> > >>>
> > >>> So if someone could show me, or give a url that actually has the
> > >>> full docs, I'd be grateful.
> > >>>
> > >>> Thanks.
> > >>>
> > >>> Cheers, Gene Heskett
> > >>
> > >> Hi Gene,
> > >> "From:.*<*\.unwanted-tld" will match a string like this (I have
> > >> put one section per line to be clearer):
> > >> From:
> > >> whatever characters (0 or more)
> > >> 0 or more <
> > >> .unwanted-tld
> > >
> > > I thought I wanted 1 only, but the way these lowlifes change
> > > addresses and names hourly, they may remove the <> surrounding the
> > > real source address and screw me up.
> > > But the fact that they often put dbl-quotes around the throwaway
> > > part of the url is, I think, screwing me regardless.
> > >
> > > What we need is the ability to specify the quote character by the
> > > first non-space character after the DENY =, which is currently a
> > > "^ or a <> which apparently inverts the logic. So a typical line
> > > would be
> > >
> > > DENY = "^From:.*<*\.bid"
> > >
> > > Substitute any of the new tld's for bid that gets obnoxious. Like
> > > xyz, or .pro, heck that new list is several dozen tld's.
> > >
> > > But AFAIK, we're stuck with the dblquote wrapper around the string
> > > to match. Grrrr.
> > >
> > >> It is greedy, so it will scan until the last < if there are more
> > >> than one. Not sure if this is what you need or not. If you can
> > >> post an example of what you need to match, I can work out another
> > >> regex if required.
> > >
> > > Try this:
> > >
> > > "-Bed Bugs-" <-BedBugs-@...>
> > >
> > > with Return-Path.* or From.* in front of it. Or does that - sign,
> > > 4 of them, need escaping with a \ ? IDK.
>
> Hyphens should only need an escape if within a character class,
> denoted by square brackets.
>
> > > I converted about 3 lines of the filterdata file that way, and I'm
> > > now waiting for the next blast of spam to serve as test data.
> > > mailfilter is a picky twit, but that hasn't given it a tummy ache
> > > either, so I am hopeful.
> > >
> > >> PS: by the way, the internet is full of excellent documentation
> > >> about regex ;-) For example
> > >> "http://www.regular-expressions.info/"
> > >
> > > Cheers, Gene Heskett
> >
> > Hi Gene,
> > so if I understand correctly, you already had a set of rules like
> > DENY = "^From:.*\.bid" (bid stands for any tld of your choice)
> > but it was missing some entries because of the "..." entry before
> > the domain. So you put the < in the string as well.
> > Right?
> >
> > Assuming so, it surprises me that the original version missed some
> > entries, since the additional "..." field would have already been
> > matched by the .* part of the pattern.
> > I think there is a different reason for missing entries. Perhaps a
> > blank character before "From:"? Could it be? You could try this
> > other version:
> > DENY = "^\s*From:.*\.bid" which ignores any whitespace before From:
>
> That would also sweep up, say, fred@..., or
> "I.bid" <ibid@...>
>
> > or
> > DENY = "^\s*From:.*\.bid>" which also makes explicit that the tld is
> > followed by a >.
>
> I'd cover the example as
>
> ^\W*((From:)|(Return-Path:)).*\.bid\W*$
>
> which works out to zero or more non-word characters at the beginning
> of the string, followed by "From:" or "Return-Path:" followed by zero
> or more unknowns, followed by ".bid", followed by zero or more
> non-word characters, followed by the end of the string. "Word"
> characters are alphanumerics, the underscore, and possibly some
> non-ASCII depending on the implementation, so "non-word" covers
> stuff like punctuation and whitespace. Marking the end of the string
> makes it more likely you're getting the TLD and not some random bit in
> the middle that was designed as a parser torture-test.
>
> If you want to get really silly,
>
> ^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$
>
> ought to catch the majority of TLDs with a 3+ ASCII character
> extension that isn't .com, .org, or .net, but without a larger sample
> of "good" and "bad" addresses, I can't guarantee no false positives.
>
> I write a lot of regexes in my day job (which is not to say that I get
> them right the first time, every time!) Assuming a Perl-compatible
> implementation (which most of them are, more or less), "man perlre" is
> a decent reference for the complicated bits. Just scroll past the
> section on modifiers.
>
> E. Liddell

Now that looks like the regex bible. Thanks a bunch.
That needs printed and placed in the middle of the house's little room. :)

Cheers, Gene Heskett
--
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page <http://geneslinuxbox.net:6309/gene>
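
For anyone who wants to try the patterns from this thread before touching
the filter file, here is a minimal sketch in Python. Treat it as an
approximation only: Python's re module stands in for whatever regex engine
mailfilter is built against (which may handle \s and \W differently), the
DENY = "..." wrapper is left off, and every header line and address below
is made up for illustration, since the real ones are elided above.

```python
import re

# Hypothetical header lines. The addresses are invented for illustration;
# the real ones in the thread are elided.
HEADERS = [
    'From: "-Bed Bugs-" <-BedBugs-@spam.example.bid>',  # quoted name + <addr>
    'From: bare@spam.example.bid',                      # no name, no <>
    'From: "I.bid" <ibid@example.com>',                 # .bid only in the name
    'Return-Path: <bounce@spam.example.bid>',           # the other header
    'From: <news@example.club>',                        # TLD starting with c
    'From: <promo@example.xyz>',                        # another new TLD
]

# Patterns quoted in the thread, without the DENY = "..." wrapper.
PATTERNS = {
    "original": r'^From:.*<*\.bid',
    "whitespace": r'^\s*From:.*\.bid>',
    "anchored": r'^\W*((From:)|(Return-Path:)).*\.bid\W*$',
    "not-com-org-net":
        r'^\W*((From:)|(Return-Path:)).*\.[^cCoOnN][a-zA-Z][a-zA-Z]+\W*$',
}


def hits(name):
    """Return one True/False per header line for the named pattern."""
    rx = re.compile(PATTERNS[name])
    return [rx.search(line) is not None for line in HEADERS]


if __name__ == "__main__":
    for name in PATTERNS:
        print(f"{name:16}", hits(name))
```

On these made-up samples the original pattern flags the "I.bid" display
name even though the address is a .com, and it never looks at
Return-Path. The anchored version fixes both. The last pattern also
catches .xyz, but it lets .club through: the [^cCoOnN] class rejects any
TLD that starts with c, o, or n, not just com, org, and net.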