
Blocking the bad boys

keeping bots, breakers and bugs out


superclown2

9:20 pm on Apr 16, 2008 (gmt 0)



The only visitors I want to my sites are genuine surfers from recognised ISPs and useful search engine spiders. I want to ban everything else because I can't see any reason why I should keep up a constant battle against the spoilers. Do lists of the good guys exist which I can put into my firewall scripts?

incrediBILL

8:42 am on Apr 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's a tall order.

You can whitelist, i.e. allow just Google, Yahoo & MSN, which keeps out all the unwanted bots that honestly identify themselves. Everything else is harder: there are tons of bots that don't identify themselves and instead spoof browser user agents, and it gets really complicated at that point.
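A minimal sketch of that whitelist idea, assuming the well-known 2008-era crawler tokens (the function name and token list are illustrative, not a standard API; a spoofed UA would still pass this check, which is Bill's point):

```python
# Whitelist check: allow only user agents claiming to be one of the
# "big three" crawlers. Tokens: Googlebot (Google), Slurp (Yahoo),
# msnbot (MSN/Live Search).
ALLOWED_BOT_TOKENS = ("Googlebot", "Slurp", "msnbot")

def is_whitelisted_bot(user_agent: str) -> bool:
    """True if the UA string claims to be a whitelisted crawler.

    Note: this only checks the *claimed* identity. A bot spoofing a
    browser UA sails right past it, and a bot spoofing Googlebot would
    need a reverse-DNS check on top of this to be caught.
    """
    return any(token in user_agent for token in ALLOWED_BOT_TOKENS)
```

Anything that identifies as a bot but fails this check can be denied outright; verifying the ones that pass (reverse DNS to googlebot.com etc.) is the extra step that catches the spoofers.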

You can also block known web hosting data centers and published proxy IP lists, which stops a lot of the noise, but that won't stop bots operating from residential IPs.
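The data-center blocking amounts to checking each visitor IP against a list of CIDR ranges. A small sketch, assuming a hypothetical range list (the two ranges below are documentation-only examples; a real list would come from hosting-provider/ASN data):

```python
import ipaddress

# Hypothetical data-center / proxy ranges to deny. In practice this list
# is built from published hosting-provider and ASN allocation data.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]

def is_blocked_ip(ip: str) -> bool:
    """True if the address falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

The same ranges can just as easily be dropped at the firewall; doing it in the application layer only makes sense if you want to serve a block page instead of silently refusing the connection.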

FWIW, you can do a reasonable job but you can't stop everything.

keyplyr

9:13 am on Apr 24, 2008 (gmt 0)




Sorry to say, but Bill is right on. Running an online business nowadays is a defensive endeavor. I check server logs every hour of every day, and more often than not there is something going on that I need to act on.

Hobbs

11:55 am on Apr 24, 2008 (gmt 0)




Nowadays I manually look into my stats every 15 to 30 minutes, 20 hours a day, and block hundreds of IPs daily.
I have yet to look into my bot trap or error logs without finding two or three IPs or IP ranges to block.
It's like looking at your pillow with a microscope: sometimes it's best not to look too closely, or you won't be sleeping at all.

As Bill said, blocking user agents and known hosting data centers only gets you so far. What we all need is a commercial package that intelligently analyzes traffic logs and headers and presents a CAPTCHA challenge to verify human activity. I'm still waiting for that package to come along!
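The bot trap mentioned above is usually just a URL that is disallowed in robots.txt and hidden from humans; anything that requests it anyway gets flagged. A minimal sketch, assuming a made-up trap path and a toy request handler (not any real framework's API):

```python
# Simple bot trap sketch: TRAP_PATH is disallowed in robots.txt and only
# linked invisibly, so well-behaved crawlers and humans never fetch it.
# Any IP that does fetch it is added to the block list.
TRAP_PATH = "/do-not-follow/"  # hypothetical path

trapped_ips: set[str] = set()

def handle_request(path: str, ip: str) -> str:
    """Return an HTTP status line for a request, trapping bad bots."""
    if path == TRAP_PATH:
        trapped_ips.add(ip)   # flag the offender for all future requests
        return "403 Forbidden"
    if ip in trapped_ips:
        return "403 Forbidden"
    return "200 OK"
```

In a real deployment the flagged IPs would be written out to the firewall or .htaccess rather than kept in memory, but the trigger logic is this simple.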