I have heard that using a sum is a good idea, i.e. showing two simple numbers and asking the user to enter their sum before the form submits. That sounds simple enough, but I have a couple of questions I was hoping someone could answer...
1. Are spam bots smart enough to solve this issue?
2. Can the two numbers appear in the code of the page or does it have to be an image to be effective?
I am guessing simply listing the two numbers would be enough to stop most bots, but before I add this I thought I would ask.
This will stop all but the most persistent automated requests. It keeps your users from having to bother with awkward CAPTCHA problems too.
Output your form using the script itself, and set a cookie. When submitted, if the cookie does not match your value, drop it like a bad habit. Of all the ones I've tried, this one seems to work best and stops them dead.
Bots are not browsers. Only browsers can set and get cookies. :-)
The advantage this has over other methods is nothing is asked of the user - any time you ask the visitor to do anything at all, you risk losing them. It's just one less thing to worry about.
There are also a number of other things you want to do that would stop most of them, mainly cleansing and screening your incoming data and logging all input from your web forms. Here's a discussion from last year [webmasterworld.com], from before I realized the cookie was the simplest solution for me.
Output your form using the script itself,
I am not a programmer; can you offer me a little more explanation of what this means, so I can communicate it effectively to my programmer?
When submitted, if the cookie does not match your value, drop it like a bad habit.
This as well: I want to be sure I communicate effectively to him what I need him to build. Also, would your solution work if I am using a PHP script that processes the form and sends its contents as a simple HTML email?
Bots are not browsers. Only browsers can set and get cookies. :-)
A cookie is not magic. It is just a string passed back and forth in the HTTP headers between the browser and the web server. The fact that a cookie stops most spammers now is no guarantee that it will work in a few years. There are already bots on the net that handle cookies just like normal browsers.
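To illustrate how little "accepting a cookie" takes, here is a minimal Python sketch (the function name is hypothetical): a script only has to parse the server's Set-Cookie value and send it back in a Cookie header on the next request.

```python
from http.cookies import SimpleCookie


def cookie_header_from_response(set_cookie_value: str) -> str:
    """Turn a server's Set-Cookie header value into the Cookie header a client sends back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)  # parses name=value plus attributes such as Path
    # Only name=value pairs go back to the server; attributes like Path are dropped.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
```

Python's standard library will even do this automatically: an opener built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))` stores and returns cookies without any extra code.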
The other kind of spam you won't catch with the cookie approach comes from low-cost manual spammers in certain parts of the world who enter their spam messages by hand. They use normal browsers and can therefore accept cookies just like normal visitors.
Manual spammers are not the real problem. Besides, by that argument they could just as easily solve a CAPTCHA or add two plus six.
Of course, your script or scripts will always need other elements that cleanse the data. Most of them are a bit more complex than a cookie, and they always seem to have a caveat or two that lets some spam slip through.
I have battled this for years, and as a developer at an ISP I had the opportunity to see it in every environment, on both Linux and Windows servers. I always went after it the hard way: programming something server-side, trying to predict bad input, restricting input to only what is expected, banning IP addresses, picking the system admins' brains for ideas, you name it. All of these work, but to varying degrees.
My real eye-opener came recently on a site with all of these in place except the cookie. The client went unspammed for years, then one day the spammers just showed up and started knocking. Since the form was output by the script itself, as a temporary measure I just dropped in code to set a cookie and validated it on submit. They stopped and never came back.
How do I know this worked? One of the things you always want to do is log all raw input from any forms. With other methods, I can "see" the spammers attempting to submit in my log, and I can also see that they were stopped. But they still try, and this causes load on the server.
When I used this cookie method, they just stopped completely. I know a lot of these "go away" for a while and come back, but it appears they haven't. The logs fell silent except for legitimate submits. Something about setting a cookie made it not worth their effort.
This works because most of the high volume spammers want to point Some Stupid Spam Script at your form and hammer it silly. It also works because nothing is asked of the user to make it work - which is really the most important thing. Additionally it doesn't present a denial of access by banning, which always poses the question of how many legitimate visitors you are turning away.
So someone figures out how to send and read a cookie via spam scripting. No big deal, if you output dynamically there are other ways you can validate an actual visit. As their methods change, so will mine. :-)
can you offer me a little more explanation of what this means so I can communicate it effectively to my programmer.
It doesn't matter what language you do this in. Anything but JavaScript, that is: you can set cookies with JavaScript, but for various reasons that is really not much use here.
Normally you have an HTML form that submits to the script. Instead, put the form inside the script, so the script can output the form and set a cookie from the server side. (This has another advantage, described below.)
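use CGI qw(param);
# One script both prints the form and processes the submission.
if (param('submitted')) {    # hypothetical hidden field named "submitted"
    process_data();          # check the cookie, screen the input, send the mail
}
else {
    print_out_form();        # print the form and set the cookie
}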
When preparing the form, generate a random string based on a combination of the remote IP address, process ID, and time. Set this value in the cookie and store it in a database. When the form is submitted, try to read the cookie and look its value up in the database. If it matches and the form data passes screening, send the mail and delete the entry. Alternatively, old entries can be deleted by a cron job; just piggy-back it on your other cron jobs.
You could be really lazy (err, economical) and skip storing the cookie value, and just check for its existence. :-)
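A minimal Python sketch of the token scheme described above, under several assumptions: an in-memory dict stands in for the database, the function names and the one-hour lifetime are hypothetical, and the token mixes the remote IP, process ID and time as described but also adds random bytes from `secrets` (an addition not in the post, since IP, PID and time alone are guessable). The lazy variant is included for comparison.

```python
import hashlib
import os
import secrets
import time
from typing import Dict, Optional

TOKENS: Dict[str, float] = {}  # stand-in for the database table: token -> time issued
MAX_AGE_SECONDS = 3600         # hypothetical lifetime before cleanup removes a token


def issue_token(remote_ip: str) -> str:
    """When printing the form: build a token, store it, return it for the Set-Cookie header."""
    raw = f"{remote_ip}|{os.getpid()}|{time.time_ns()}|".encode() + secrets.token_bytes(16)
    token = hashlib.sha256(raw).hexdigest()
    TOKENS[token] = time.time()
    return token


def check_token(cookie_value: Optional[str]) -> bool:
    """On submit: accept only a token we issued, deleting it so it works only once."""
    if not cookie_value:
        return False
    return TOKENS.pop(cookie_value, None) is not None


def check_token_lazy(cookie_value: Optional[str]) -> bool:
    """The 'economical' variant: only check that the cookie is present at all."""
    return bool(cookie_value)


def purge_stale(now: Optional[float] = None) -> int:
    """What the cron job would do: delete tokens older than MAX_AGE_SECONDS."""
    if now is None:
        now = time.time()
    stale = [tok for tok, issued in TOKENS.items() if now - issued > MAX_AGE_SECONDS]
    for tok in stale:
        del TOKENS[tok]
    return len(stale)
```

This sketch deletes the entry as soon as the cookie is checked; the post deletes it after the form data passes screening and the mail is sent, which only changes where the delete call goes.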
There is no cure-all for form spamming. Cookies are just one more tool in your arsenal to slow it down significantly.
One last advantage to this approach: if someone has JavaScript disabled and submits an empty required field, then instead of showing some "required field" error and telling them to use the Back button, you simply re-output the form with the submitted data intact:
<input type="text" name="fname" value="$fname">
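When re-outputting submitted data like this, the value has to be HTML-escaped first, or a visitor (or bot) can close the value="..." attribute and inject markup. A minimal Python sketch using the standard `html.escape`; the function name is hypothetical.

```python
from html import escape
from typing import Dict


def render_text_input(name: str, submitted: Dict[str, str]) -> str:
    """Re-output a text field pre-filled with the visitor's earlier input."""
    value = submitted.get(name, "")
    # The value is untrusted input: escape it so a quote or angle bracket cannot
    # close the value="..." attribute and inject markup into the page.
    return f'<input type="text" name="{escape(name)}" value="{escape(value, quote=True)}">'
```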
Again - helping our customers is the prime objective. :-)
Also reject any submission containing mail-header strings such as these:
to:
cc:
bcc:
Content-Type
This isn't a full solution to form spam, but it at least stops your script from being used like an open relay.
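A minimal Python sketch of that screen, assuming hypothetical field names for the values that end up in the mail headers. It rejects the listed strings in any field, and also rejects line breaks in header-bound fields, since a line break is what lets injected text start a new header.

```python
from typing import Dict

# The strings listed in the post; a match suggests an attempt to inject mail headers.
HEADER_MARKERS = ("to:", "cc:", "bcc:", "content-type")
# Hypothetical names of fields whose values are copied into the mail headers.
HEADER_FIELDS = ("name", "email", "subject")


def looks_like_header_injection(fields: Dict[str, str]) -> bool:
    """True if any field carries header markers, or a header-bound field contains a line break."""
    for key, value in fields.items():
        if key in HEADER_FIELDS and ("\r" in value or "\n" in value):
            return True
        lowered = value.lower()
        if any(marker in lowered for marker in HEADER_MARKERS):
            return True
    return False
```

Matching plain substrings is blunt: a comment such as "reply to: me" or a "mailto:" link will also be rejected, which may or may not be acceptable on a given form.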
Setting cookies and hiding fields with CSS does not work. As I mentioned before, in the previous attack on my site the spammers actually did some manual hacking first: the spammer visits the site and finds out what is being submitted and what is checked. So even if you implement cookies and hidden fields, they can bypass them very easily. Some of the forms I use are 100% JavaScript, and they are still able to hack them. Once the spammer has figured out all the fields, they use sophisticated software that mimics a browser, or they simply use a software testing tool that records mouse movements, opens a real browser, and enters the values.
I was able to avoid this type of hack for quite some time because my JavaScript calls another script, and most bots can't figure out that there is a form there in the first place.
The only real check is on the server side: check both the referrer and the data. Most spam contains [url= or href=" somewhere. This is one reason I don't rely on front-end JavaScript to validate the data; it can be bypassed easily.
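A server-side referrer check can compare the host in the Referer header with your own host instead of matching one exact page. A minimal Python sketch using the standard `urllib.parse.urlsplit`; the function name is hypothetical. As noted elsewhere in the thread, a bot can fake this header, so it only filters the lazier scripts.

```python
from typing import Optional
from urllib.parse import urlsplit


def referer_ok(referer: Optional[str], allowed_host: str) -> bool:
    """True only if the Referer header points at a page on our own host."""
    if not referer:
        return False  # no header at all; some privacy tools strip it from real visitors too
    try:
        host = urlsplit(referer).hostname  # already lowercased by urlsplit
    except ValueError:
        return False  # malformed URL
    return host == allowed_host.lower()
```

Comparing the parsed hostname, rather than checking whether the string merely contains your domain, keeps a look-alike such as www.example.com.evil.test from passing.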
Does summing two simple numbers stop a lot of this? If so, does it matter whether the numbers are generated in the page's code, or must they come from a database?
I know nothing will probably stop the attacks 100% of the time, I am just looking to reduce the amount of attacks by a large amount.
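One way to sketch the sum check in Python without a database, assuming a server-side secret (`SECRET_KEY`, hypothetical) and hypothetical function names: the two numbers appear in the page as plain text, and the expected answer travels only as an HMAC signature in a hidden field, so the server can verify the visitor's answer without storing anything. Plain-text numbers can be read by any bot written specifically for your form, so this only helps against generic bots.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the browser


def _sign(answer: int, nonce: str) -> str:
    return hmac.new(SECRET_KEY, f"{nonce}:{answer}".encode(), hashlib.sha256).hexdigest()


def make_sum_challenge() -> dict:
    """Two numbers from 1 to 9 to show on the page, plus hidden nonce and signature fields."""
    a = secrets.randbelow(9) + 1
    b = secrets.randbelow(9) + 1
    nonce = secrets.token_hex(8)
    return {"a": a, "b": b, "nonce": nonce, "sig": _sign(a + b, nonce)}


def check_sum_answer(answer: str, nonce: str, sig: str) -> bool:
    """Recompute the signature for the visitor's answer; a wrong sum gives a different signature."""
    try:
        value = int(answer.strip())
    except ValueError:
        return False
    # Compare as bytes in constant time; str arguments would fail on non-ASCII input.
    return hmac.compare_digest(_sign(value, nonce).encode(), sig.encode())
```

A captured nonce, signature and correct answer can be replayed as-is, so a site that sees replays would also need to record used nonces on the server.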
Setting the cookies and making the field hidden using css does not work.
These techniques will certainly stop a large number of automated bots, and they're simple enough for the average webmaster to be able to implement. It's these random script-kiddie form fillers that are the most prevalent.
If the spammer is actually visiting your form and studying it for weaknesses then I agree that stronger and more complex methods of blocking are necessary. However, I'd argue that most sites' forms are not actually visited and studied in this way.
If there's an easy-to-implement server-side spam prevention method that doesn't tip your hand too much, and that the average webmaster can implement without custom programming, I'd love to hear about it.
Setting the cookies and making the field hidden using css does not work.
But . . . has he/she ever tried it? :-)
It really has worked, but I totally agree with the idea that this cannot be the only security measure you have to take. It could always break, like all the others. Screening the data server-side catches 99% of the attempts - but setting a cookie and validating it seemed to be the one that stopped them from even trying.
I'll add what I consider the most important tool in fighting this: log all raw data coming in from your forms. Not the server logs, but logs generated by your script that write the raw data to a file somewhere. If you log it, you can see the patterns.
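A minimal Python sketch of such a script-level log, writing one JSON line per submission with the body exactly as received. Which request headers to keep is an assumption; the cookie, referrer and user agent are included because they are the values the other checks in this thread look at. Function names are hypothetical.

```python
import json
import time
from typing import Dict


def format_log_line(remote_ip: str, raw_body: str, headers: Dict[str, str], when: float) -> str:
    """One JSON line per submission, keeping the body exactly as received."""
    record = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(when)),
        "ip": remote_ip,
        "user_agent": headers.get("User-Agent", ""),
        "referer": headers.get("Referer", ""),
        "cookie": headers.get("Cookie", ""),
        "raw_body": raw_body,
    }
    return json.dumps(record) + "\n"


def log_raw_submission(path: str, remote_ip: str, raw_body: str, headers: Dict[str, str]) -> None:
    """Append the submission to the script's own log file, separate from the server logs."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(format_log_line(remote_ip, raw_body, headers, time.time()))
```

Logging before any screening runs means rejected attempts show up too, which is what makes the patterns visible.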
It is true that most sites won't be analyzed. However, if you have a commercial product, you will run into people who do tons of analysis to find vulnerabilities, because they know that once they find one, they'll be able to spam hundreds or thousands of sites. Those are the sites that need constant monitoring and patching.
In your case, you are talking about ordinary websites with some custom forms, and you just want something simple that most people can cut and paste. From what I gather from your sites, there are only a few types of hacks, and most of them bypass your JavaScript check. As I mentioned above, the most common text I see is [url= and href= in the comments field. We also check the referrer to make sure the submission is coming from your site; this method is not perfect, because a bot can fake the referrer.
You can create some simple rules, like the ones I've written for you so far, that can be embedded in most scripts with few changes. You can of course write a version for each language, such as PHP and JSP.
Here's a quick example in ASP:
if (request.serverVariables("HTTP_REFERER") <> "(enter your referring page here)") then response.redirect "error.asp"
if (instr(request.form("comments"), "[url=") > 0) then response.redirect "error.asp"
if (instr(request.form("comments"), "href=") > 0) then response.redirect "error.asp"
' It is common for them to use the same values for different fields
if (request.form("firstname") = request.form("lastname")) then response.redirect "error.asp"
The rules above are not written efficiently, but they are cut-and-paste friendly. If you find more bots getting through, we can add more rules, provided the bots are easily identified. If you really want more, you can save each user's IP in a database or an application variable and keep track of their submissions. You can then throttle them, for example allowing only one submission per minute, or showing a CAPTCHA field after more than three submissions. That's much more difficult to implement, but it works better.
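A minimal Python sketch of that throttle, with an in-memory dict standing in for the database or application variable. The limits come from the post (one submission per minute, a CAPTCHA after more than three); the one-hour counting window and the function name are assumptions.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional

MIN_INTERVAL = 60.0  # at most one submission per minute per IP
CAPTCHA_AFTER = 3    # more than three submissions -> show a CAPTCHA
WINDOW = 3600.0      # hypothetical: count submissions over the last hour

_history: Dict[str, List[float]] = defaultdict(list)


def throttle_decision(ip: str, now: Optional[float] = None) -> str:
    """Return 'reject', 'captcha' or 'accept' for a submission from this IP."""
    if now is None:
        now = time.time()
    recent = [t for t in _history[ip] if now - t < WINDOW]  # forget old submissions
    if recent and now - recent[-1] < MIN_INTERVAL:
        _history[ip] = recent
        return "reject"  # too soon after the previous one; not recorded
    recent.append(now)
    _history[ip] = recent
    return "captcha" if len(recent) > CAPTCHA_AFTER else "accept"
```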
Again, we've managed to minimize the impact these things have on our web forms, of which we have many. After all these years of battling at the page level, we're finally going to shift our focus to keeping them away from the forms to begin with. They won't ever see the forms if we can help it. :)
if (instr(request.form("comments"), "[url=") > 0) then response.redirect "error.asp"
Not picking on your programmer, but this is an example of trying to predict the bad input. What happens if I do this?
%5burl%3dhttp://gotcha.ru%5d
It will pass the input screen but still render just fine in an email client as [url=... Ordinarily this should get nixed by metacharacter screening. Also remember that these attacks are usually automated, and this rule only applies the screening to the comments field. Bots generally dump the same data into all form fields (subject, name, etc.), everything except the email.
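The two fixes implied here, decoding before matching and screening every field, can be sketched in Python with the standard `urllib.parse.unquote`. This is still a rule that predicts bad input; the marker list and function names are hypothetical.

```python
from typing import Dict
from urllib.parse import unquote

# Markers named in the thread; extend as the logs show new patterns.
MARKERS = ("[url=", "href=")


def normalize(value: str, max_rounds: int = 3) -> str:
    """Undo percent-encoding (possibly applied several times), then lowercase."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value.lower()


def is_spammy(fields: Dict[str, str]) -> bool:
    """Screen every field, not only 'comments', after normalizing it."""
    return any(marker in normalize(value) for value in fields.values() for marker in MARKERS)
```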
Encrypt a page timestamp into a form variable (tick counts work best); then, when the form is submitted, determine the time between page retrieval and submission. Humans take a second or more to fill out a form, while spam bots almost never wait that long.
Customize the required delay for each form to the minimum time a person could possibly fill it out (usually around 750 milliseconds to 1.5 seconds).
At minimum, you make the bot wait this long before submitting, which ties up its resources.
We haven't received one spam bot submission since doing this.
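A minimal Python sketch of the timing check. It signs the timestamp with an HMAC rather than encrypting it: the goal here, stopping a bot from forging or editing the timestamp, only needs tamper detection, not secrecy. `SECRET_KEY`, the one-second default and the function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to the browser
MIN_FILL_MS = 1000         # tune per form; the post suggests roughly 750 ms to 1.5 s


def _sign(stamp: str) -> str:
    return hmac.new(SECRET_KEY, stamp.encode(), hashlib.sha256).hexdigest()


def make_timestamp_field(now_ms: Optional[int] = None) -> str:
    """Value for the hidden field: page-generation time in ms plus its signature."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    stamp = str(now_ms)
    return f"{stamp}.{_sign(stamp)}"


def should_reject(field: str, now_ms: Optional[int] = None, min_ms: int = MIN_FILL_MS) -> bool:
    """Reject if the field was forged or edited, or the form came back too quickly."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    try:
        stamp, sig = field.split(".", 1)
        issued_ms = int(stamp)
    except ValueError:
        return True  # missing or malformed field
    if not hmac.compare_digest(_sign(stamp).encode(), sig.encode()):
        return True  # timestamp was changed by the client
    return now_ms - issued_ms < min_ms
```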
1. Page one is a simple form with a hidden value.
2. Page two is a review page; the hidden value is passed along. At this point most spam programs think they have sent the message, but in reality nothing has been sent; it is just a REVIEW page.
3. Page three checks for the hidden value that was set on page one. If it exists, it sends the message from page one that was reviewed on page two.
If spammers go straight to page two, there are no fields to fill in. If they go straight to page three, they do not have the hidden value.
:)
If it's not clear, I can give a better explanation.
So far: NO spam, and no confirmation key to enter.
The user just MUST understand that it's step 1 of 2 and step 2 of 2 to send a message.
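A minimal Python sketch of that three-step flow. The post passes the hidden value from page to page; this sketch additionally keeps server-side state (an assumption, in-memory here) so page two can confirm the value really came from page one, and page three can confirm the review step happened. All names are hypothetical.

```python
import secrets
from typing import Callable, Dict, Optional, Set

ISSUED: Set[str] = set()       # hidden values handed out on page one
PENDING: Dict[str, dict] = {}  # reviewed messages waiting for page three


def page_one() -> str:
    """Show the form; return the hidden value to embed in it."""
    token = secrets.token_hex(16)
    ISSUED.add(token)
    return token


def page_two(hidden: Optional[str], form: dict) -> bool:
    """Review page: park the message under the hidden value; nothing is sent yet."""
    if hidden not in ISSUED or not form:
        return False
    ISSUED.discard(hidden)
    PENDING[hidden] = dict(form)
    return True


def page_three(hidden: Optional[str], send: Callable[[dict], None]) -> bool:
    """Send only if the hidden value from page one arrives after the review step."""
    message = PENDING.pop(hidden, None)
    if message is None:
        return False
    send(message)
    return True
```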
[edited by: phranque at 6:05 am (utc) on Mar. 14, 2008]
[edit reason] No urls, please. See TOS [webmasterworld.com] [/edit]
1. Save a timestamp to a hidden field and encrypt it (using custom encryption); otherwise smart bots can simply change the timestamp.
2. Ask the user a simple question related to your website, one with a short answer. Check that the answer is correct once the form is submitted. You could keep a hash of many questions and answers for this.
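A minimal Python sketch of the question check, with a dict playing the role of the hash of questions and answers. The questions, the site name "Example Widgets" and the function names are all hypothetical; only the question id goes into a hidden field, and the answers stay on the server.

```python
import secrets
from typing import Dict, Set, Tuple

# Hypothetical questions for a hypothetical site called "Example Widgets".
QUESTIONS: Dict[str, Tuple[str, Set[str]]] = {
    "q1": ("What is the first word of this site's name?", {"example"}),
    "q2": ("What do we sell? (one word)", {"widget", "widgets"}),
    "q3": ("How many letters are in the word 'widget'?", {"6", "six"}),
}


def pick_question() -> Tuple[str, str]:
    """Return (question id for a hidden field, question text to show)."""
    qid = secrets.choice(list(QUESTIONS))
    return qid, QUESTIONS[qid][0]


def check_answer(qid: str, answer: str) -> bool:
    """Case-insensitive check against the accepted answers; unknown ids fail."""
    entry = QUESTIONS.get(qid)
    if entry is None:
        return False
    return answer.strip().lower() in entry[1]
```

Accepting a few spellings per question (as in "six" and "6") keeps legitimate visitors from being turned away over formatting.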
There are contact form scripts out there with source code which do this kind of thing.
But do not make it too much of a hassle for the user, or they may change their mind and not send the message. Also keep in mind that some users may not have JavaScript or cookies enabled, so steer clear of requiring these... and don't even talk to me about those image CAPTCHAs :)