Forum Moderators: coopster
I have a current situation with a series of bots hitting a form of mine. Currently at the rate of about 10 "posts" per minute. I've left this running as the form is of little significance and it seemed to me to be a good opportunity to use the problem to test solutions.
A "captcha" graphic system would probably fix the issue, but I'm really interested in that as a last line of defence rather than a first line of defence. It seems to me that there are a lot of things which could be done to prevent basic bot activity before the captcha is even checked. I like multiple lines of defence.
Considerations so far:-
Cookies
Not a lot of good as a defence against bots of course, but as a first line of defence it would at least stop the majority of user problems (clicking submit multiple times for example).
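As a sketch of that first line (the cookie name and 60-second lifetime here are my own guesses, not anything standard):

```php
<?php
// Reject the POST if our marker cookie is already set, i.e. this
// browser submitted within the last 60 seconds.
if (isset($_COOKIE['form_submitted'])) {
    exit('You have already submitted this form. Please wait a moment.');
}

// First submission: set the marker and carry on processing.
// setcookie() must be called before any other output is sent.
setcookie('form_submitted', '1', time() + 60, '/');
// ... normal form processing continues here ...
```

A bot that ignores cookies sails straight past this, which is why it's only useful as a first line against accidental double-submits.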
Questions and considerations:-
Tracking by IP
Can be spoofed of course but it's an option. I don't think it would be a good idea to ban by IP, although that's something to consider - an auto insert into .htaccess of some kind perhaps?
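If you did go the auto-insert route, it could be as simple as appending deny rules (file path assumed; the rule format is Apache's 2.2-style "Deny from"):

```php
<?php
// Append a deny rule for an offending IP to .htaccess.
// Assumes the file already contains an "Order Allow,Deny" block
// that these lines will extend.
function ban_ip($ip, $htaccess = '.htaccess')
{
    // Validate so a spoofed header can never inject arbitrary text
    if (!filter_var($ip, FILTER_VALIDATE_IP)) {
        return false;
    }
    return (bool) file_put_contents($htaccess, "Deny from $ip\n", FILE_APPEND | LOCK_EX);
}
```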
Captcha Systems
There's some useful information on the Captcha Project website:-
[captcha.net...]
A challenge was set there for gimpy-r, which resulted in a team building a bot that could breach the system with 78% accuracy. Pretty good (and with a search you can find many other examples of successful captcha hacking).
I found a number of PHP based Captcha scripts on Wikipedia:-
[en.wikipedia.org...]
Quite a number of them are open-source/GPL.
What are the current ones like, and is Captcha overall a system that requires constant updating to keep up with the hackers?
In terms of full-defence, and what I've so far considered above, that would give:-
1. Set a cookie
2. Track IPs for multiple hits
3. Implement a Captcha graphic
Is there anything else that I've missed?
Thanks,
TJ
>> What would be a good cookie lifetime for this - are we looking at seconds, or minutes or longer?
Well, if you are using this to stop reposting, it depends on what type of form it is and how often a normal user would repost.
The other option you have with cookies is to set one on the page where the form is and check if it exists when they post. This would get rid of a lot of bots - at least the ones that don't accept cookies.
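That check might look like this (cookie name is arbitrary):

```php
<?php
// On the page that displays the form: plant a marker cookie.
setcookie('form_seen', '1', time() + 3600, '/');

// --- then, in the script that receives the POST ---
// A client that actually fetched the form page and accepts cookies
// will send the marker back; a bot posting to a raw URL won't.
if (!isset($_COOKIE['form_seen'])) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
```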
Tracking by IP
Good option, I would use some pruning and remove all except the ones that are obvious bots every 24 hours or something.
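A minimal sketch of that pruning over an in-memory log (field names are made up; in practice this would be a DELETE against your DB table):

```php
<?php
// Prune an IP hit log: keep confirmed bots (for the ban list) and
// anything seen in the last 24 hours; drop the rest.
function prune_ip_log(array $log, $now)
{
    $cutoff = $now - 24 * 3600;
    $keep = array_filter($log, function ($row) use ($cutoff) {
        return $row['banned'] || $row['last_hit'] >= $cutoff;
    });
    return array_values($keep);
}
```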
>> use of resources
Well, again that depends on how important this form is and how much traffic you have. Depending on those two things you would have to figure out if it is a good use of resources, or if the form is even needed.
>> If you implement an auto-ban into the .htaccess sin bin, what is the likelihood of taking out innocents in the cross-fire?
There is always a line somewhere. Again, you need to assess your site and decide which side of the line is right for you. If the number one priority is to stop bots, then who cares about the few innocents that get taken out. If getting the highest percentage of users through is the priority, then a few bots won't hurt you.
Captcha Systems
Captcha works, and it shouldn't really need updating very often, though you can vary the fonts and the like. I wouldn't worry too much about captcha breakers - those are higher-end bots, and it depends on your site and what motivation there is to get past your defences.
If we are just talking spammers they are a lazy bunch, if you make it significantly difficult then they will probably go somewhere else.
Truly malicious people will get by whatever you do so you need to give yourself all opportunities to catch them before they do any damage.
Another option is to have javascript or PHP that will only display the submit button (or form action) when the necessary fields have been used (i.e. been clicked on, etc.).
WP Hashcash is Wordpress’ strongest antispam plugin, boasting 100% efficiency. Over the last 6 months, not a single automated spammer has been able to break through. It works by using client side javascript to compute a special value which is sent to the server for verification. Since robots don’t have javascript, it’s unlikely that they will ever send the correct hidden value.
It's transparent to the user. I'd imagine it wouldn't be too hard to modify to use your form.
Sean
24 hours on IP logging seems quite reasonable and would, I think, keep the DB size down significantly.
I'll pop back with some code based on these three lines of defence.
inveni0/Sean - many thanks, but I really want to avoid using any form of JS. I believe that there's plenty that can be done server-side.
TJ
I assume that most automated systems (bots) just call a raw URL rather than keep crawling a page in real-time?
If so, it would be quite easy to set up a CRON job to alter the name of the form script that's being called.
EG, consider (rough PHP - the file names are placeholders):-
$i = mt_rand(); // a random integer
$old = trim(file_get_contents("suffix.txt")); // suffix currently in use
//Copy the script to a file with a new name
copy("myscript$old.php", "myscript$i.php");
//Rebuild the HTML page with the script reference under the new name
$header = file_get_contents("aboveURL.html");
$footer = file_get_contents("belowURL.html");
$handle = fopen("form.html", 'w');
fwrite($handle, $header);
fwrite($handle, "<form action=\"myscript$i.php\" method=\"post\">");
fwrite($handle, $footer);
fclose($handle);
//Remove the old script and remember the new suffix
unlink("myscript$old.php");
file_put_contents("suffix.txt", $i);
If you were to run that once an hour it would rotate the form script URL, rendering the old one useless.
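The "once an hour" part is just a crontab entry along these lines (the script path is hypothetical):

```shell
# m h dom mon dow - run the rotation script at the top of every hour
0 * * * * /usr/bin/php /var/www/scripts/rotate_form.php
```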
TJ
I wrote a bot a while back that enters me in contests (all I've won is one cookbook, but I digress). One of the things I did in it was let the submission take place over several pages, so that I could point it at the front page, let it crawl to the form so it picks up cookies and referrers. Turned out to be easier to do that than to have to configure it :)
Sean