What I have now is a separate pattern for each of the three bad cases:
$badregex="/blogosword/";
$theifpat="/<a href/";
$worseregex="/javascript/";
and a single response to any of them:
if (preg_match($badregex, $commentary) or preg_match($worseregex, $commentary) or preg_match($theifpat, $commentary))
die ('#AO67FF: Radioactive ruby Gem error. Hazmat alert');
Can I put all three cases in one pattern and test against that instead?
I don't like it this way, because I think it makes things noticeably slower.
$pattern = "/pattern1|pattern2|pattern3/";
Just note that the separator is the solid vertical bar (|), not the broken one.
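As a rough sketch of that single-pattern idea, the alternatives can be built from a plain list of literal terms. The three terms are the ones from the question; the names $blocked_terms and is_blocked() are made up for illustration, and preg_quote() is used so that any regex metacharacters in a term are matched literally.

```php
<?php
// Hypothetical list of blocked terms; these three come from the question above.
$blocked_terms = array('blogosword', '<a href', 'javascript');

// preg_quote() escapes regex metacharacters (and the "/" delimiter) in each term.
$quoted = array();
foreach ($blocked_terms as $term) {
    $quoted[] = preg_quote($term, '/');
}

// One pattern: the alternatives are joined with the solid vertical bar.
$pattern = '/' . implode('|', $quoted) . '/i';

// is_blocked() is an illustrative helper name, not an existing function.
function is_blocked($pattern, $text) {
    return preg_match($pattern, $text) === 1;
}
```

With this, the three preg_match() calls collapse into one: if (is_blocked($pattern, $commentary)) die(...). Whether it is measurably faster is something to time on your own form rather than assume.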
Annoying that the spammers have found me before anyone else.
I know the spam mutates like fruit flies to get around solutions like this, and it does look like regex slows things down a little. What do you recommend as good patterns to block, and what is a reasonable limit on the number of expressions to look out for?
[webmasterworld.com...]
This thread goes over a few ideas and can be found in the PHP forum's library. If you do a search around the boards [google.com] you should find much more information about this that you can try to implement to see if it works for your case.
I read that link. It was really useful. I am ready for the bots when they march in and over the cliff of my hidden fields.
I hate dealing with captcha, so I don't want to go that route, even if I could at this time. And I am desperate for someone to actually look at the thing and comment, so I don't want to chase away the legitimate viewers.
Makes you wonder about the mentality of the people who do this. Why work so hard at being hated?
what is the reasonable limit of expressions to look out for?
First, you're sort of on the right track with this:
$theifpat="/<a href/";
$worseregex="/javascript/";
But what about
<a href=
[a href=
%5B a href
[url=
[link =
scri+pt
And on and on, ad nauseam?
What you have here is a never-ending chase of trying to plug the holes as they arise.
The short story: store your "bad patterns" in an array; look for them and exit immediately if found, but don't stop there. It's always an easier task to accept only what you want instead of constantly trying to stop what you don't want. Filter for what you want to allow, and out of whatever's left, swap out any potentially dangerous characters for harmless equivalents.
My philosophy is to first understand "the enemy," to find out what motivates them:
Makes you wonder about the mentality of the people who do this. Why work so hard at being hated?
I have always said that a good form processor has to do one thing: log all raw input data. This will be different than what you get in server logs. It's easy. Open a file in a secure non-public location, dump the input, THEN go on to cleanse. Review it regularly. After a time, particular patterns (as in, actions, not regexps) begin to form. This will lead to an answer to this question, over time.
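A minimal sketch of that logging step might look like the following. It assumes PHP 5 or later (file_put_contents() is not available in PHP 4), and both the helper name log_raw_input() and the example path are invented for illustration; the real log file should live somewhere outside the web root.

```php
<?php
// log_raw_input() is a hypothetical helper; $log_file should be outside the web root.
function log_raw_input(array $input, $log_file) {
    $entry = '--- ' . date('Y-m-d H:i:s') . " ---\n";
    foreach ($input as $key => $value) {
        // Arrays (e.g. checkbox groups) are flattened so nothing is lost.
        if (is_array($value)) {
            $value = print_r($value, true);
        }
        $entry .= $key . ': ' . $value . "\n";
    }
    // FILE_APPEND adds to the existing log; LOCK_EX avoids interleaved writes.
    return file_put_contents($log_file, $entry . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// Usage, with an assumed path:
// log_raw_input($_POST, '/home/example/private/form_input.log');
```

The point is only that the dump happens before any cleansing, so the log shows exactly what was sent.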
The #1 reason for spamming forms IME: link drops. And it must work, or they wouldn't do it.
So I approach this as follows. Note that this uses methods compatible with PHP 4, which lacks PHP 5's filter_input_array().
1. LOG raw input first. This is also where you sift through the input for potentially malicious data. If any is found, return "no email sent" (or similar) to the browser. The spammers most surely log the responses to their attacks (the people paying them wouldn't pay without some indication of effectiveness), so this simple response is a clue to them: your attacks won't work here, move on.
2. Instead of storing bad patterns in separate variables, I suggest using an array. This way, when you go to add a new one, it only needs to be added in one place. In the following example, once the spammers figured out their spam resource was gone, they began spamming the form with "good site, admin", apparently for no other reason than to be annoying, which is why that phrase is in the list.
Note the liberal use of \s* (zero or more whitespace characters) in the regular expressions. This thwarts a lot of "workaround" patterns, such as 1 = 1 where the standard SQL injection attack is 1=1.
$bad_patterns = Array (
'\[\s*a\s*href.*\]*',
'\%5B\s*a\s*href.*(\%5D)*',
'\<\s*a\s*href.*\>*',
'\%3C\s*a\s*href.*(\%3E)*',
's\s*c\s*r\s*i\s*p\s*t',
// Some for sql injection
'drop\s+\w+',
'insert\s*into',
'\s*or\s*\d+\s*=*\s*\d*',
'\s*and\s*\d+\s*=*\s*\d*',
'update\s*',
'alter\s*',
// This list is actually 20 or so patterns, some
// removed to condense post
'good\s*site\s*,*\s*admin'
);
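To see what the \s* padding buys you, here is a small sketch that runs two of the patterns above against spaced-out variants. The helper matches_any() and the variable $samples are made up for this example; the patterns themselves are copied from the list.

```php
<?php
// Two patterns taken from the list above.
$samples = array('\<\s*a\s*href.*\>*', 's\s*c\s*r\s*i\s*p\s*t');

// matches_any() is an illustrative helper, not part of the original post.
function matches_any(array $patterns, $text) {
    foreach ($patterns as $p) {
        // Same delimiter and case-insensitive flag as the trapping loop uses.
        if (preg_match("/$p/i", $text)) {
            return true;
        }
    }
    return false;
}
```

Both '<a href=x>' and '<  A   HREF=x>' are caught, as is 's c r i p t'. One thing to watch: the script pattern also matches inside ordinary words such as "description", so a pattern this loose will reject some legitimate comments unless you tighten it.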
I realize filtering bad patterns is a contradiction of the "accept only what you want" philosophy, but since most of "what you want" can be abused to form malicious patterns, some chasing will always be required. The intent is to minimize the amount of pattern-chasing.
I trap these in the logging routine. If found, there is no reason to go any further, don't need to know anything else.
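$spam_in = 0;
$input_content = '';  // raw input, written to the log below
$trap = '';           // description of any spam found
foreach ($_POST as $key => $value) {
    // Log each field once, before any cleansing
    $input_content .= $key . ": " . $value . "\n";
    foreach ($bad_patterns as $v) {
        if (preg_match("/$v/i", $value)) {
            $trap .= "SPAM: $value found in " . $key . " field.\n";
            $spam_in = 1;
            break;  // one match is enough for this field
        }
    }
}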
// write $input_content to file here
// If $spam_in == 1, terminate with error message previously described.
3. Continue on with an aggressive cleansing, removing anything but what you want. Unless your form is an "add your url" form, there is no legitimate reason for a url or html of any kind in your public forms. Not one. Simply put: nothing but letters, numbers, and basic punctuation is allowed. (Caveat: there are legitimate uses for other characters, such as é; add these to this simplified example.)
The following is a bit loose, because afterward I go on to remove @ from anything but an email field and exchange % for the word "percent":
$allowed = 'A-Z0-9"\'\%\.\,\$\@\!\(\)\=\-\_\&\;\s';
foreach ($_POST as $key => $value) {
    $_POST[$key] = preg_replace("/[^$allowed]+/i", '', $value);
}
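The two follow-up steps mentioned above (dropping @ outside the email field, and swapping % for "percent") could be sketched roughly like this. The helper name clean_field(), the field name 'email', and the choice to leave the email field's % alone are all assumptions for the example, not something the steps above spell out.

```php
<?php
// clean_field() is a hypothetical helper; 'email' is an assumed field name.
function clean_field($key, $value) {
    // Same allow-list as above: strip everything that is not in it.
    $allowed = 'A-Z0-9"\'\%\.\,\$\@\!\(\)\=\-\_\&\;\s';
    $value = preg_replace("/[^$allowed]+/i", '', $value);

    if ($key !== 'email') {
        // Outside the email field, @ has no business being there.
        $value = str_replace('@', '', $value);
        // Swap the percent sign for the word.
        $value = str_replace('%', ' percent', $value);
    }
    return $value;
}
```

So clean_field('comment', 'Save 50% <b>now</b>') comes back as 'Save 50 percent bnowb': the tags lose their angle brackets and slash, and what is left is harmless text.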
That is: remove anything that is NOT in my $allowed pattern. There are functions in PHP that do this for you, but personally, I like to see what my code is doing instead of feeding it to a "black box."
Yes it's not the PHP'ish approach. Yes it's "a little more work." But I can see more clearly what it's doing, which is why I do it this way.
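For comparison, the built-in route might look something like this sketch. It uses filter_var_array() on a sample array (rather than filter_input_array() on the real POST data) so it can be run on its own; the field names and sample values are assumptions, and it needs a reasonably recent PHP for the filter extension and these constants.

```php
<?php
// Sample data standing in for $_POST; the field names are assumptions.
$input = array(
    'email'   => 'someone@example.com',
    'comment' => '<a href="http://spam.example">hi</a>',
);

// One filter per field.
$rules = array(
    'email'   => FILTER_VALIDATE_EMAIL,              // becomes false if not a valid address
    'comment' => FILTER_SANITIZE_FULL_SPECIAL_CHARS, // encodes < > & and quotes as entities
);

$clean = filter_var_array($input, $rules);
```

Note the difference in approach: this encodes the dangerous characters rather than removing them, so the link text survives as harmless entities instead of disappearing.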
Alter as you wish.