
Methods and Procedures for Flood Control in Forms

How do you go about designing a system?


trillianjedi

12:20 pm on Feb 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm interested in having a discussion on how to put in place flood-control measures for HTTP forms, against both bots and humans.

I currently have a series of bots hitting a form of mine, at a rate of about 10 "posts" per minute. I've left it running, as the form is of little significance and it seemed a good opportunity to use the problem to test solutions.

A "captcha" graphic system would probably fix the issue, but I'm really interested in that as a last line of defence rather than a first. It seems to me that a lot could be done to stop basic bot activity before the captcha is even checked. I like multiple lines of defence.

Considerations so far:-

Cookies

Not much good as a defence against bots, of course, but as a first line of defence it would at least stop the majority of user problems (clicking submit multiple times, for example).

Questions and considerations:-

  • What would be a good cookie lifetime for this - are we looking at seconds, minutes, or longer?
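A minimal sketch of the repost-throttle idea, assuming PHP (the cookie name and the 60-second lifetime are placeholder choices, not recommendations):

<?php
// In the script that processes the form submission.
// setcookie() must run before any output is sent.
if (isset($_COOKIE['just_posted'])) {
    // Cookie still live: treat this as a rapid repost
    die("You've already submitted this form. Please wait a moment.");
}
setcookie("just_posted", "1", time() + 60);   // expires in 60 seconds

// ... process the form as normal ...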

Tracking by IP

Can be spoofed, of course, but it's an option. I don't think it would be a good idea to ban by IP, although that's something to consider - an automated insert into .htaccess of some kind, perhaps?

Questions and considerations:-

  • If this is driven by a database, it could get big pretty quickly.
  • Is some form of automated process that clears out entries older than datetime X easy to implement, and a good idea? In other words, the database would "revolve", keeping the entries to a minimum (see the sketch after this list).
  • Would the use of resources, say with a MySQL DB, be significant? Could the form script (and the HTTP thread that processes it) somehow be made to run at low priority?
  • If you implement an auto-ban into the .htaccess sin bin, what is the likelihood of taking out innocents in the crossfire?
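On the revolving-database point, a rough sketch using PHP 4/5-era mysql_* calls (the form_hits table, windows, and threshold are all made-up placeholders):

<?php
// Assumes a table like:
//   CREATE TABLE form_hits (ip VARCHAR(15), hit_time DATETIME, INDEX (ip, hit_time));
$link = mysql_connect("localhost", "user", "pass");
mysql_select_db("mydb", $link);

$ip = mysql_real_escape_string($_SERVER['REMOTE_ADDR'], $link);

// Keep the table "revolving": drop entries older than 24 hours
mysql_query("DELETE FROM form_hits
             WHERE hit_time < DATE_SUB(NOW(), INTERVAL 24 HOUR)", $link);

// Log this hit
mysql_query("INSERT INTO form_hits (ip, hit_time) VALUES ('$ip', NOW())", $link);

// Count recent hits from this IP over an arbitrary 10-minute window
$res = mysql_query("SELECT COUNT(*) FROM form_hits
                    WHERE ip = '$ip'
                      AND hit_time > DATE_SUB(NOW(), INTERVAL 10 MINUTE)", $link);
$row = mysql_fetch_row($res);

if ($row[0] > 5) {   // arbitrary threshold
    die("Too many submissions from your address. Please try again later.");
}

Running the DELETE on every request keeps things simple; moving it to a nightly cron job would shave a query off each submission.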

Captcha Systems

There's some useful information on the Captcha Project website:-

[captcha.net...]

A challenge was set there for gimpy-r, which resulted in a team building a bot that could breach the system with 78% accuracy. Pretty good (and with a search you can find many other examples of successful captcha hacking).

I found a number of PHP-based Captcha scripts on Wikipedia:-

[en.wikipedia.org...]

Quite a number of them are open-source/GPL.

What are the current ones like, and is Captcha overall a system that requires constant updating to keep up with the hackers?
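For reference, a bare-bones PHP/GD captcha image script. It's deliberately minimal - no distortion or noise, which real scripts add precisely to slow the bot-builders down - and the file name is a placeholder:

<?php
// captcha.php -- emits a PNG and stores the answer in the session
session_start();

$code = substr(md5(mt_rand()), 0, 5);   // 5 random hex characters
$_SESSION['captcha_code'] = $code;

$img = imagecreatetruecolor(120, 40);
$bg  = imagecolorallocate($img, 255, 255, 255);
$fg  = imagecolorallocate($img, 0, 0, 0);
imagefill($img, 0, 0, $bg);
imagestring($img, 5, 30, 12, $code, $fg);

header("Content-type: image/png");
imagepng($img);
imagedestroy($img);

The form handler then just compares $_POST['captcha'] against $_SESSION['captcha_code'] and rejects on a mismatch.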

In terms of a full defence, combining what I've considered above, that would give:-

1. Set a cookie
2. Track IPs for multiple hits
3. Implement a Captcha graphic

Is there anything else that I've missed?

Thanks,

TJ

jatar_k

6:45 pm on Feb 24, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Cookies

>> What would be a good cookie lifetime for this - are we looking at seconds, or minutes or longer?

Well, if you are using this to guard against reposting, it depends on what type of form it is and how often a normal user would repost.

The other option you have with cookies is to set one on the page where the form is and check if it exists when they post. This would get rid of a lot of bots, at least the ones that don't accept cookies.
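A minimal sketch of that check, assuming PHP (the cookie name is arbitrary):

<?php
// On the page that displays the form (before any output):
setcookie("form_token", "1");

// In the script that receives the POST:
if (!isset($_COOKIE['form_token'])) {
    // No cookie came back: likely a bot that hit the action URL directly
    die("Form rejected.");
}

A fixed value like "1" is trivially guessable, so a random token stored server-side and compared on post would be a stronger variant of the same idea.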

Tracking by IP

Good option. I would use some pruning and, every 24 hours or so, remove all entries except the ones that are obvious bots.

>> use of resources

Well, again, that depends on how important this form is and how much traffic you have. Based on those two things, you'd have to figure out whether it's a good use of resources, or whether the form is even needed.

>> If you implement an auto-ban into the .htaccess sin bin, what is the liklihood of taking out innocents in the cross-fire?

There is always a line somewhere. Again, you need to assess your site and decide which side of the line is right for you. If the number one priority is to stop bots, then who cares about the few innocents that get taken out. If getting the highest percentage of users through is the goal, then a few bots won't hurt you.

Captcha Systems

Captcha works; it shouldn't really need updating very often, though you can use different fonts and the like. I wouldn't worry too much about captcha breakers; those are higher-end bots, and it depends on your site and what motivation there is to get past your defences.

If we're just talking spammers, they're a lazy bunch; if you make it significantly difficult, they will probably go somewhere else.

Truly malicious people will get past whatever you do, so you need to give yourself every opportunity to catch them before they do any damage.

inveni0

8:22 pm on Feb 24, 2006 (gmt 0)

10+ Year Member



If I'm not mistaken, there is code out there (JavaScript, I believe) that prevents bots from reading the entire web page. You might look into that.

Another option is to have JavaScript or PHP display the submit button (or form action) only once the necessary fields have been used (i.e. clicked on, filled in, etc.).

SeanW

8:28 pm on Feb 24, 2006 (gmt 0)

10+ Year Member



There's a WordPress plugin called WP Hashcash that tries to solve this problem:


WP Hashcash is Wordpress’ strongest antispam plugin, boasting 100% efficiency. Over the last 6 months, not a single automated spammer has been able to break through. It works by using client side javascript to compute a special value which is sent to the server for verification. Since robots don’t have javascript, it’s unlikely that they will ever send the correct hidden value.

It's transparent to the user. I'd imagine it wouldn't be too hard to modify it to work with your form.
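For illustration only - this is not the plugin's actual code - here's the general shape of the technique, sketched in PHP with one line of inline JavaScript. The server embeds a seed, the browser computes an answer, and the handler rejects posts without it (the arithmetic and field names are invented):

<?php
// form.php -- embed a seed and let client-side JS compute the answer
session_start();
$seed = mt_rand(1000, 9999);
$_SESSION['js_answer'] = $seed * 7 + 3;   // arbitrary computation
?>
<form action="handler.php" method="post">
  <input type="hidden" name="js_check" id="js_check" value="">
  <script type="text/javascript">
    // Only a JS-capable client will fill this in
    document.getElementById('js_check').value = <?php echo $seed; ?> * 7 + 3;
  </script>
  <!-- ...the rest of the form... -->
</form>

<?php
// handler.php -- reject posts where the computed value is missing or wrong
session_start();
if (!isset($_POST['js_check'])
    || (int)$_POST['js_check'] !== (int)$_SESSION['js_answer']) {
    die("Form rejected.");
}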

Sean

jatar_k

9:00 pm on Feb 24, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



>> boasting 100% efficiency

That doesn't take into account users who couldn't post. Anyone with JS turned off is stopped.

This goes back to walking the line, and whether you want zero bots or zero false positives.

trillianjedi

8:36 am on Feb 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Adam.

24 hours on IP logging seems quite reasonable and would, I think, keep the DB size down significantly.

I'll pop back with some code based on these three lines of defence.

inveni0/Sean - many thanks, but I really want to avoid using any form of JS. I believe that there's plenty that can be done server-side.

TJ

SeanW

3:46 pm on Feb 25, 2006 (gmt 0)

10+ Year Member



OK, if you don't like the JS method... :)

I'd go with the logging of IPs. A simple DBM file should suffice if you don't want to use SQL. I'd stay away from .htaccess and just have the script figure it out.
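A minimal sketch of the DBM route, using PHP's dba functions with the flatfile handler (the path, window, and threshold are placeholder choices):

<?php
$db  = dba_open("/tmp/form_hits.db", "c", "flatfile");
$ip  = $_SERVER['REMOTE_ADDR'];
$now = time();

// Stored value per IP: "count:timestamp_of_first_hit_in_window"
$record = dba_exists($ip, $db) ? dba_fetch($ip, $db) : "0:$now";
list($count, $first) = explode(":", $record);

if ($now - $first > 3600) {   // hourly window expired, start over
    $count = 0;
    $first = $now;
}
$count++;

dba_replace($ip, "$count:$first", $db);
dba_close($db);

if ($count > 5) {   // arbitrary limit: 5 posts an hour
    die("Too many submissions from your address. Please try again later.");
}

One thing to watch: the file never shrinks on its own, so stale keys need the occasional prune, much like the SQL version.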

Sean

trillianjedi

10:33 am on Mar 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a (possibly) interesting idea.

I assume that most automated systems (bots) just call a raw URL rather than re-crawling the page in real time?

If so, it would be quite easy to set up a cron job to alter the name of the form script being called.

E.g., consider this rough PHP:-

// The previous value would be stored somewhere persistent;
// "current_name.txt" and "formpage.html" are placeholder names.
$old = trim(file_get_contents("current_name.txt"));
$i = mt_rand();

// Copy the script to a file with a new name
copy("myscript" . $old . ".php", "myscript" . $i . ".php");

// Rebuild the HTML page with the form action under the new name
$header = file_get_contents("aboveURL.html");
$footer = file_get_contents("belowURL.html");

$handle = fopen("formpage.html", 'w');
fwrite($handle, $header);
fwrite($handle, "<form action=\"myscript" . $i . ".php\" method=\"post\">");
fwrite($handle, $footer);
fclose($handle);

// Remove the old script and remember the new name
unlink("myscript" . $old . ".php");
file_put_contents("current_name.txt", $i);

If you were to run that once an hour it would rotate the form script URL, rendering the old one useless.

TJ

SeanW

2:26 pm on Mar 6, 2006 (gmt 0)

10+ Year Member



You'd also want to change the link text that links to it.

I wrote a bot a while back that enters me in contests (all I've won is one cookbook, but I digress). One of the things I did was let the submission take place over several pages, so I could point it at the front page and let it crawl to the form, picking up cookies and referrers along the way. It turned out to be easier to do that than to have to configure it :)

Sean

trillianjedi

2:36 pm on Mar 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK thanks Sean - so you're saying that bots do sometimes crawl properly rather than just call a raw URL.

As you say, the link text could also be switched, although that complicates things a little - it still needs to make sense for humans. The link could always be a graphic, though?