Webmaster General Forum

    
The Battle Against Form Spam - Thinking Aloud
form spam
beavis




msg:3234539
 3:12 am on Jan 28, 2007 (gmt 0)

Am I thinking clearly here? Any suggestions in this battle?

My web form submission process is as follows:

1. User enters data into fields (name, e-mail, phone, alt phone, address, comments, etc.)
2. User presses submit
3. Data is posted to a commercial form processing script (http://www.mydomain.com/formscript.asp?form=#*$!)
4. Form processing script performs server side validation of fields.
5. Form processing script saves submitted fields to database.
6. Form processing script calls e-mail component to send form field data to client.
7. Confirmation page is displayed to user.

Recently, I have noticed hundreds of spam entries in my database. The name, e-mail, and phone number fields (required for the form to validate) contain bogus data that validates. The comments section contains a link to an online pharmacy that I traced to the Ukraine.

I am not sure of the entry point this spam bot is using. While it could be loading my page, automatically filling in the form, then submitting, after reading WebmasterWorld, I believe the bot is more likely directly calling the script from a remote server.

The purpose, of course, is to send spam e-mail for the pharmacy website. But, I am a little puzzled on exactly how this is being accomplished. When I check the name, e-mail, alt phone, address, comment and other fields that have been saved, I do not find the expected long list of e-mail addresses. Likely, there is some way that the bot is passing the script a list of e-mail addresses outside of the fields that are saved in the database. If anyone knows how this is possible, please explain it to me. From my reading, I have learned that the bot could somehow call the script and pass on a huge list of “BCC” e-mail addresses to send the spam message to by utilizing virtually any field.

The other, less likely possibility is that the bot is not sneaking a big list of e-mail addresses into the script, but just hoping to advertise the pharmacy to the form recipient.

Because of the way my forms are processed, my end users are not actually receiving these spam submissions. This is the one bit of good news. I could perhaps ignore this whole problem were it not for the annoyance of seeing the bogus data in my otherwise clean database of submissions and also out of fear that e-mail from my domain could be blacklisted. However, I don’t really know whether there is a way to implicate my domain or even my host’s mail server in the scheme.

As far as battling the spam, I have the option of upgrading to a new version of my form processor that uses captcha, but I am wary of using this because I fear that the bot may simply learn to call the script at a point past the captcha verification. Another common solution that I have read about is to hide an input field with CSS and try to entice the bot to fill it in. Then, if the field contains data, the submission is spam. Unfortunately, my form processor doesn’t contain this functionality, and since the bot has already figured out how to successfully call the script, I don’t think the method would work anyway.

What I may try to do is modify the e-mail component to look for requests to send e-mail to multiple users and just terminate if this happens. While this would not rid me of the annoyance of seeing the bogus entries in my form submission database, at least it would defeat the spammer’s primary purpose.

If anyone knows of threads that discuss how I could configure my e-mail component to snoop for spam attempts and abort if they are found, I would appreciate the links. I am using the CDOSYS component.

Also, any other ideas would be appreciated.
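
A rough sketch of the kind of check described above for the e-mail step, written in Python purely for illustration (the same logic would have to be ported to ASP before the CDOSYS call). It assumes that extra recipients arrive either as a line break inside a field that ends up in a mail header, or as a separator-delimited list in an address field. The function names are hypothetical and are not part of CDOSYS or any form processor.

```python
import re

# Hypothetical helpers; nothing here is a CDOSYS API.
_LINE_BREAK = re.compile(r"[\r\n]")

def is_single_address(value: str) -> bool:
    """Return True only if value looks like exactly one e-mail address."""
    if _LINE_BREAK.search(value):
        return False  # a line break could start a new header such as Bcc:
    if "," in value or ";" in value:
        return False  # separators would allow a list of recipients
    value = value.strip()
    return value.count("@") == 1 and " " not in value

def safe_for_header(value: str) -> bool:
    """Return True if value cannot add extra header lines (e.g. in a Subject)."""
    return not _LINE_BREAK.search(value)
```

If either check fails, the script could skip the e-mail step entirely, which matches the "just terminate" idea above.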

 

cameraman




msg:3234646
 8:01 am on Jan 28, 2007 (gmt 0)

One thing you can do to ensure that the 'user' is following your submission process from the beginning is to put a token in a hidden field in the form. The token can be generated in a variety of ways from various bits of data. You want it to be dynamic and reasonably unique, for example by combining part of the time of day with the user's IP address. Keep in mind that some users' IP addresses can change during the normal course of a transaction, so don't base any assumptions on a 'static' IP address. You want the formulation to be complex enough that someone on the other side won't be able to figure out your algorithm easily.

Ideally you would compute the token at the beginning and store it in a database. Set the token as a hidden field, then when you're processing the form you compare the submitted token to the one you stored. If they match, then this submitter has been on board from the beginning.
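
A minimal sketch of the store-and-compare idea, in Python for illustration only. Two things here are assumptions rather than part of the suggestion above: the token is a random value instead of a time/IP mix, and a plain dictionary stands in for the database table. The names issue_token, redeem_token and MAX_AGE_SECONDS are hypothetical.

```python
import secrets
import time

# Stand-in for a database table: token -> time issued (hypothetical storage).
_issued = {}
MAX_AGE_SECONDS = 3600  # assumed lifetime; adjust to taste

def issue_token(now=None):
    """Create a token when the form page is rendered and remember it."""
    token = secrets.token_hex(16)
    _issued[token] = time.time() if now is None else now
    return token

def redeem_token(token, now=None):
    """Return True if the submitted token was issued, unused, and not too old."""
    issued_at = _issued.pop(token, None)  # pop, so each token works only once
    if issued_at is None:
        return False
    now = time.time() if now is None else now
    return now - issued_at <= MAX_AGE_SECONDS
```

The value returned by issue_token goes into the hidden field; the form processor calls redeem_token on whatever comes back.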

Alternatively you could compute the token, set the hidden field, mangle it a little more and store it additionally as a session variable. Then when you get the form back, take the submitted field, mangle it the same way and see if it matches the session variable.

Lastly you could compute the token, set the hidden field, then set hidden fields containing the original data you need to recompute the token. If you're using a combination of time-of-day and IP, for example, you'd set hidden fields for the token, the time of day, and the user's IP address. When you get the form back, you use the original IP and time of day to recompute the token. If it matches the submitted token, the form came from your page.

So there's the theory. It's been too long since I've done anything in ASP so the following is in 'pseudo-code', that is, not any particular language.
Given:
Time/date = 11:59:15 25 Jan 2007
userIP = 12.34.56.78

mangletime = (hour + minute + seconds) * date / month = (11 + 59 + 15) * 25 / 1 = 2125
mangleIP = sum of the IP components = 12 + 34 + 56 + 78 = 180
token = mangletime * mangleIP * 3.14 = 2125 * 180 * 3.14 = 1201050

So tack on to the form:
<input type="hidden" name="token" value="1201050" />
<input type="hidden" name="submittedtime" value="11:59:15 25 Jan 2007" />
<input type="hidden" name="submittedip" value="12.34.56.78" />

Then when you process the form, run through the same process using submittedtime & submittedip and if the token matches, a spoof is highly unlikely.

Is the generated number absolutely unique? No, the same number will be generated if the ip is 34.12.56.78 or the time is 11:15:59. Does it matter? No, we just need one that's not going to wind up the same for everyone every time they visit and is hard to figure out how to mangle.
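
For anyone who wants to try the arithmetic, here is the pseudo-code above as a small Python sketch (Python is just a stand-in; it is not ASP). The function names are made up for the example, and the final rounding to a whole number is an assumption added so that the token compares cleanly as a string.

```python
from datetime import datetime

def make_token(when: datetime, ip: str) -> int:
    """Mangle time of day and IP address the way the pseudo-code does."""
    mangle_time = (when.hour + when.minute + when.second) * when.day / when.month
    mangle_ip = sum(int(part) for part in ip.split("."))
    return round(mangle_time * mangle_ip * 3.14)

def token_matches(token: str, submitted_time: str, submitted_ip: str) -> bool:
    """Recompute the token from the hidden fields and compare."""
    try:
        when = datetime.strptime(submitted_time, "%H:%M:%S %d %b %Y")
        return token == str(make_token(when, submitted_ip))
    except ValueError:
        return False  # malformed time or IP: treat as a failed check

print(make_token(datetime(2007, 1, 25, 11, 59, 15), "12.34.56.78"))  # 1201050
```

Swapping the first two IP components gives the same token, which is the collision mentioned above.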

rocknbil




msg:3234716
 11:33 am on Jan 28, 2007 (gmt 0)

There is a post here that brought up a deceptively simple way to nix these attacks. Simply put an EMPTY hidden field in your form. The field is to be submitted blank. If there's data in it - poof. Bots will populate it.

They are doing this with automated programs that only visit your form to collect the form field names and the post URL. From then on it's a direct command-line request to the form processor.

Unless you have specific reasons for allowing any HTML, stop the process if it's in the submission. The same is true of [forum] [style] [links]. Also, bcc and multipart MIME headers have no place in form input.
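
A sketch of that kind of content check, in Python for illustration. The pattern list is an assumption and deliberately short; a real list would grow from whatever shows up in the logs. looks_like_spam is a hypothetical name.

```python
import re

# Illustrative patterns only; the exact list is an assumption.
SUSPICIOUS = [
    re.compile(r"<[^>]+>"),                     # HTML tags
    re.compile(r"\[/?(url|link|img)\b", re.I),  # forum-style link markup
    re.compile(r"\bbcc\s*:", re.I),             # injected Bcc header
    re.compile(r"content-type\s*:", re.I),      # injected MIME header
    re.compile(r"multipart/", re.I),            # MIME multipart marker
]

def looks_like_spam(fields: dict) -> bool:
    """Return True if any submitted value matches any suspicious pattern."""
    return any(p.search(v) for v in fields.values() for p in SUSPICIOUS)
```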

Always log any submitted form data. This reveals so much more than your server logs ever will.

BeeDeeDubbleU




msg:3234830
 3:13 pm on Jan 28, 2007 (gmt 0)

Beavis, I was seeing a similar problem and I added a question to my forms that could only be answered by a real person. For example, "What colour is the sky?" You can then offer a drop down with several options, one of which is "Blue". There are several variations of this that you can use.

This is not exactly rocket science and the spammers may eventually catch up with it but since I first did it about six months ago all the spam has stopped.

AmericanBulldog




msg:3235042
 8:14 pm on Jan 28, 2007 (gmt 0)

Is there anything that works with formmail to thwart the bots?

The empty field is easy to add, but I do not see a way to validate it in formmail.

beavis




msg:3235315
 1:33 am on Jan 29, 2007 (gmt 0)

Cameraman,

Thanks for your suggestion. I don't completely understand your idea, but my consultant does and we may implement it.

Unfortunately, the other suggestions... hidden field, blue sky, etc. would only stop future attacks. The current bot has already figured out how to call my form processing script directly, thus bypassing these approaches.

rocknbil




msg:3235606
 11:07 am on Jan 29, 2007 (gmt 0)

beavis - so for this one time, you change the field names in your form and change the ones you process in your script. email becomes eMail, name becomes aName, like that. The bot will try to send the old values but your script shouldn't process it. Normally the bot will be back the next day to get the new values in your form, but this time they'll populate one they are not supposed to.
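
Roughly, the processing side of the renamed-field idea could look like this (Python used only as an illustration). The field names in both sets are examples, and accept_submission is a hypothetical name.

```python
# Example names only: the new names the form now uses, and the old ones.
EXPECTED_FIELDS = {"aName", "eMail", "phone", "comments"}
RETIRED_FIELDS = {"name", "email"}

def accept_submission(fields: dict) -> bool:
    """Reject posts that use retired names or names the form never had."""
    names = set(fields)
    if names & RETIRED_FIELDS:
        return False  # the sender is replaying the old form
    return names <= EXPECTED_FIELDS  # nothing outside the current form
```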

AmericanBulldog wrote: "The empty field is easy to add, but I do not see a way to validate it in formmail"

Probably not without altering the script to check for a field that is supposed to be blank. Most scripts only check for blank fields that aren't supposed to be blank. But it's really easy if you can open the script and edit it.

if ($data{'dummy_hidden_field'} ne '') { &some_error_output; }

'ne' being the Perl string 'not equal to' operator; in other languages it's != or <>.

If you add this to your php or perl or asp script, they can try but nothing will go through. Well, until they figure out they are supposed to leave that empty.
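
The same test as the Perl line above, sketched in Python for comparison. The field name is the one from the Perl example; is_bot is a hypothetical name. Note that a post which omits the field altogether passes this check, just as it would pass the Perl 'ne' test on an undefined value.

```python
HONEYPOT = "dummy_hidden_field"  # field name taken from the Perl example

def is_bot(fields: dict) -> bool:
    """Return True if the field that should stay empty came back filled in."""
    return fields.get(HONEYPOT, "") != ""
```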

onlineleben




msg:3236458
 10:09 pm on Jan 29, 2007 (gmt 0)

I had a similar problem, although my script already checked against a list of forbidden words. Since updating the list with the most recent keywords from spam mails received, the attacks on my form are nearly gone.
Every one that still makes it into my DB gets screened for new words to add to the list.
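
A word-list check of this sort might be sketched as follows (Python for illustration). The two entries are examples and not the actual list, and contains_forbidden is a hypothetical name.

```python
import re

FORBIDDEN_WORDS = {"pharmacy", "viagra"}  # example entries only

def contains_forbidden(text: str) -> bool:
    """Return True if any whole word in text is on the forbidden list."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return not words.isdisjoint(FORBIDDEN_WORDS)
```

Matching whole words keeps ordinary text that merely contains a listed word as a substring from being blocked, at the cost of missing deliberately misspelled variants.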

frances




msg:3236665
 1:50 am on Jan 30, 2007 (gmt 0)

I also don't understand how they do it.

My asp script checks if bcc and multipartmime fields exist in the response, and blocks any post that has them. The spammers somehow get through even though they are using both.

If they can do that, they could probably also override some of the ideas in this thread.

Does anyone know how they are doing this?

rocknbil




msg:3237681
 9:42 pm on Jan 30, 2007 (gmt 0)

Do you also check for encoded versions of those patterns, and attempts to munge your pattern match with spaces, etc., and case-insensitivity?
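
To illustrate what checking for encoded and munged versions could mean, here is a Python sketch that undoes URL-encoding, lower-cases the value and strips whitespace before matching. The function names are hypothetical, and removing all whitespace is an aggressive assumption that can flag innocent text, so treat it as a starting point only.

```python
import re
from urllib.parse import unquote

def normalise(value: str) -> str:
    """Decode, lower-case and strip whitespace so disguised patterns line up."""
    previous = None
    while previous != value:  # decode repeatedly: %2562cc -> %62cc -> bcc
        previous = value
        value = unquote(value)
    return re.sub(r"\s+", "", value.lower())  # "b c c :" -> "bcc:"

def has_header_injection(value: str) -> bool:
    """Return True if the normalised value contains a Bcc or MIME header."""
    flat = normalise(value)
    return "bcc:" in flat or "content-type:" in flat
```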

beavis




msg:3237884
 2:58 am on Jan 31, 2007 (gmt 0)

Another question,

How likely is it that an actual person is visiting my site to set up the spam bot to fill out the right forms? Is this entirely automated or is a person visiting my site? Of course, nobody can tell for sure, just a guess.

rocknbil




msg:3238741
 8:35 pm on Jan 31, 2007 (gmt 0)

beavis - yes, you can tell for sure, by logging all input from your forms.

In my previous position at an ISP - I watched this stuff over a period of time and when we started getting spam bots, I began logging all data input. All means ALL. If it's a publicly viewable form - LOG IT, and check the log often. It doesn't have to be an elaborate scheme, just write input to a file. This is so much more valuable than your server logs on this issue.

Basically just parse your data, print the date, time, remote address, and all the key/value pairs to your file BEFORE cleansing the input. This reveals some remarkable things.
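
Something along these lines would do, shown in Python for illustration. The one-JSON-record-per-line layout and the name log_submission are assumptions; any format works as long as the raw values are written before cleansing.

```python
import json
from datetime import datetime, timezone

def log_submission(path: str, remote_addr: str, fields: dict) -> None:
    """Append one line per submission: time, remote address, raw key/value pairs."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "remote_addr": remote_addr,
        "fields": fields,  # raw values, before any cleansing
    }
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
```

Because json.dumps escapes line breaks, an injected newline shows up in the log as a visible \n instead of splitting the record.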

When you catch a newly arrived bot, it will make several queries within a few short minutes, throwing various data at all the fields. What it's doing is submitting the data and reading the response. Once it has collected the fields for the processor at this URL, it will go away for a while, presumably to make you think it's gone for good, but shortly thereafter it will return and start hammering your script, populating the fields with links.

This is an automated process directly to your script processor. This is why the method in the last post will work. It will read your response, then re-read the originating form page to get the form fields, and try again - submitting data where it should be blank.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved