
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

How to block Unwanted Traffic Generators?
Traffic jumped 10 fold in 2 months -- I need to tackle this :|

 10:51 am on Nov 22, 2010 (gmt 0)

I read some good posts, in particular the sticky: Quick primer on identifying bot activity [webmasterworld.com]...

My "useless" traffic has increased tenfold and needs to be reined in... and I am wondering how to get rid of the bandwidth thieves.

I still use ASP and am wondering how to tackle this problem logically and structurally. E.g., where do I let any filters and loggers do their work? What should I log?

On an IIS server I have global.asa, then 404 and 500 handling.

I saw some silly form-posting attempts and have just implemented encrypted time stamps and a CSS-hidden field. 500 and 404 events are logged separately. The "logging" is becoming a mess, being in different places, and at some stage I can't tell which code kicks in when. (Programming is not my profession.)
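
The time-stamped form token and the hidden field described above can be pictured roughly as follows. This is a minimal sketch in Python rather than ASP, and it signs the timestamp with an HMAC instead of encrypting it, which is a swapped-in technique and not necessarily what the poster built. The secret, the field name `website` and the age limits are all assumptions.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"   # assumption: a server-side secret, never sent to the client
MIN_AGE = 3             # assumption: a form posted in under 3 s is treated as a bot
MAX_AGE = 3600          # assumption: tokens expire after one hour


def _sign(ts):
    """HMAC-SHA256 signature of the timestamp string."""
    return hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()


def make_token(now=None):
    """Return 'timestamp:signature' to embed in a hidden form field."""
    ts = str(int(time.time() if now is None else now))
    return ts + ":" + _sign(ts)


def token_is_valid(token, now=None):
    """True only if the signature matches and the token age is plausible."""
    ts, sep, sig = token.partition(":")
    if not sep:
        return False
    # Constant-time comparison; a forged or altered timestamp fails here.
    if not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return False
    age = (time.time() if now is None else now) - int(ts)
    return MIN_AGE <= age <= MAX_AGE


def honeypot_is_clean(form):
    """The CSS-hidden field (hypothetical name 'website') must stay empty."""
    return form.get("website", "") == ""
```

A page would call `make_token()` when rendering the form and `token_is_valid()` plus `honeypot_is_clean()` when the form comes back; what happens on failure (logging, banning the IP) is a separate decision.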

Is there a best-practice design or approach to catch the nonsense?
1. check incoming IP against blocked IPs; if listed, deny
2. check if URL contains characters not used on the website; e.g. a percent sign in the URL triggers "put on IP block list"
3. if form is posted with expired token > onto block list
4. if form has honeypot filled in > block list
5. ?
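
The order of checks in the list above could be wired together along these lines. This is only a sketch in Python, not ASP; the allowed-character pattern, the in-memory set and the honeypot field name `website` are assumptions, and the token check is passed in as a ready-made flag.

```python
import re

# Assumption: the only characters this site ever uses in paths and query strings.
ALLOWED_URL = re.compile(r"[A-Za-z0-9/._?=&-]*")

# Assumption: an in-memory set; a real site would persist this in a file or database.
blocked_ips = set()


def check_request(ip, url, form=None, token_ok=True):
    """Return 'deny' or 'allow'; any tripped trap puts the IP on the block list."""
    # 1. Cheapest test first: already-blocked IPs are refused immediately.
    if ip in blocked_ips:
        return "deny"
    # 2. Characters the site never uses (for example '%') trigger a ban.
    if not ALLOWED_URL.fullmatch(url):
        blocked_ips.add(ip)
        return "deny"
    if form is not None:
        # 3. Expired or forged form token.
        if not token_ok:
            blocked_ips.add(ip)
            return "deny"
        # 4. Honeypot field filled in.
        if form.get("website"):
            blocked_ips.add(ip)
            return "deny"
    return "allow"
```

Putting the block-list lookup first means a banned client costs almost nothing on later requests, whichever trap caught it originally.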

Reading that UAs change or can be anything, is there any value in checking them? Why? What for?

I read about X-Forwarded-For being captured and reverse DNS lookups being done, but when? What triggers them, and what information should be stored about them?
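
On the reverse DNS question, one common arrangement (offered as an assumption, not a rule from this thread) is to run the lookup only when a visitor's user agent claims to be a search engine, and to cache the verdict per IP. The sketch below, in Python, shows a forward-confirmed reverse DNS check; the trusted hostname suffixes are illustrative assumptions, and the two resolver functions are parameters so the logic can be exercised without network access.

```python
import socket

# Assumption: hostname suffixes under which the crawlers you trust resolve.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> hostname -> IP must round-trip."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup
    except OSError:
        return False                       # no reverse record at all
    if not hostname.lower().endswith(TRUSTED_SUFFIXES):
        return False                       # resolves, but not to a trusted domain
    try:
        addresses = forward(hostname)[2]   # A-record lookup of that hostname
    except OSError:
        return False
    # Anyone can fake a PTR record; the forward lookup must point back to the IP.
    return ip in addresses
```

Storing the IP, the resolved hostname, the verdict and the time of the lookup would be enough to avoid repeating the DNS queries on every request.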

Are these sensible questions to ask, or would the answer give too much away?

The more I read, the less I know, and the more daunting the task :(

And there are Anti-hot-linking mechanisms, etc. needing to be included as well?!
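
For the anti-hotlinking part, the usual idea is to compare the Referer header of an image request with the site's own hostnames. A minimal Python sketch, assuming the hypothetical hostnames below and assuming that requests without a Referer are let through:

```python
from urllib.parse import urlparse

# Assumption: hostnames that are allowed to embed this site's images.
OWN_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer):
    """True when an image request carries a Referer from a foreign host."""
    if not referer:
        # Some browsers and proxies omit the header; refusing these
        # requests would also hit real visitors, so they pass here.
        return False
    host = urlparse(referer).hostname
    return host not in OWN_HOSTS
```

The Referer header is trivially forged, so this only stops casual embedding on other sites, not a determined scraper.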

Any hints or pointers to literature on how to build successful anti-bot, anti-nonsense solutions would be appreciated.
Or which methods need to be employed, in which order and at which location, to build such a solution?



 11:05 am on Nov 22, 2010 (gmt 0)

The simplest answer to your problem is: WHITELIST

You only allow in the bots you like, GOOGLE/YAHOO/BING and browsers that pass normal filtering.

Everything else bounces to the curb, for the most part.

It takes specialty scripts to catch and bounce the noise beyond this point, but you'll make great strides by whitelisting alone.
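
As a rough illustration of whitelisting by user agent, here is a Python sketch. The patterns are assumptions chosen for the example, not a maintained list, and a user agent that merely claims to be a search engine still needs to be verified by IP or DNS before it is trusted.

```python
import re

# Assumption: illustrative patterns only; a real whitelist needs upkeep.
GOOD_BOTS = re.compile(r"Googlebot|bingbot|Yahoo! Slurp", re.IGNORECASE)
BROWSERS = re.compile(r"(Mozilla|Opera)/\d+\.\d+ \(")


def classify_user_agent(ua):
    """Return 'claimed-bot', 'browser' or 'reject' for a User-Agent string."""
    if not ua:
        return "reject"              # blank user agents are never on the list
    if GOOD_BOTS.search(ua):
        return "claimed-bot"         # still needs IP / reverse DNS verification
    if BROWSERS.match(ua):
        return "browser"             # looks like a browser; other filters apply
    return "reject"                  # everything else bounces
```

The point of the whitelist is the final `return "reject"`: unknown libraries and tools are refused by default instead of being chased one by one.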


 2:36 pm on Nov 22, 2010 (gmt 0)

The following may help you explore the possibilities:
IIS Banned IP [webmasterworld.com]

I have very few references saved for IIS, but these two may prove useful:
1) WebKnight Filter
2) IIRF rewrite engine


 10:37 pm on Nov 22, 2010 (gmt 0)

I built a MySQL database of blocked IPs (mostly server farms) with an ASP script to run "before" every page of every site (a few dozen on a single server). It took a bit of building, even based on a previous one with text file IP blocking, and I still add a few new server farms daily. UA blocking code is fairly static now, though.
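
Blocking whole server farms usually means matching an address against network ranges rather than single IPs. The Python sketch below shows the lookup only; the ranges are reserved documentation networks standing in for real entries, and refusing unparseable addresses is a design choice of this example, not something stated in the post.

```python
import ipaddress

# Assumption: documentation-only ranges standing in for real server farms.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/25")
]


def ip_is_blocked(ip):
    """True if the address falls inside any blocked range."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return True   # design choice: refuse anything that is not a valid address
    return any(addr in net for net in BLOCKED_RANGES)
```

With a database behind it, the same test is typically done by storing each range as a numeric start and end and querying for the row that contains the visitor's address.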

It's possible to do it without the MySQL or text file but even so there is a fair amount of management needed.

I have often bewailed the fact that, many years ago, I listened to a man who said, "MS IIS will do that..." I should have stuck to Linux, within which it is easier to block things. On the other hand, if you can install rewrite software on IIS you stand a better chance of getting an off-the-shelf solution (I think the rewrite software is already there on Windows 7).


 12:44 pm on Nov 25, 2010 (gmt 0)

Thanks for the replies!

Bill, I like your stuff; I read your blog post from a few years ago... can understand the frustration.
I have implemented the form token and the resulting IP blocking, works! :)
... also an IP block for malformed URLs and query strings and for file extensions I do not have -- works too :)

White List?!
Do I regex-match known UAs and let them pass, and block all else?

Also thinking of blocking any IP that causes more than, say, 3 HTTP 500 errors?!
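
The idea of banning after a few HTTP 500 errors can be sketched as a small per-IP counter. This is Python for illustration; the threshold of 3 comes from the post, while the ten-minute window and the in-memory storage are assumptions.

```python
import time
from collections import defaultdict, deque

MAX_ERRORS = 3   # from the post: block on more than 3 errors
WINDOW = 600     # assumption: only errors in the last ten minutes count

# Assumption: kept in memory; timestamps of recent 500s per IP.
_errors = defaultdict(deque)


def record_500(ip, now=None):
    """Record one HTTP 500 for this IP; return True when it should be blocked."""
    now = time.time() if now is None else now
    hits = _errors[ip]
    hits.append(now)
    # Forget errors that have fallen out of the time window.
    while hits and now - hits[0] > WINDOW:
        hits.popleft()
    return len(hits) > MAX_ERRORS
```

The time window matters because a genuine visitor can hit the odd server error over a long session, whereas a probing script tends to produce a burst.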


 12:11 am on Nov 26, 2010 (gmt 0)

Hmm, after further thinking I realised that I have 'blocking' code all over the shop: in global.asa, the default page, the 404 and planning on 500... which led me to explore where to best intercept and check for 'unwanted' traffic / behaviour.

Looking at the information flow:
a) a request comes in for the server
b) it checks, whether it is a DIR or a file
c) if DIR, it checks if browsing is allowed; if so, display DIR (or default page), else 404
d) if file exists, process file, else 404
e) if query string or code does not compute > throw err 500 page

I hope this is how a web server handles it...

Is there one single point where I can intercept any call to the web server, decide what type of traffic it is, and then continue with any 400 or 500 handling or, more likely, normal page serving?

I am running my sites on an IIS 6 server, mostly using show.asp?topic=n (with n being a table row index) URL structures.

I am assuming the single point exists; if not, I can imagine using an include that does the "checking/blocking" at the beginning of the default page, the 404 and the 500?
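
The "one include at the top of every page" idea amounts to routing every handler through the same guard. The Python sketch below shows the shape of it with a decorator; the handler names and the 403 response are hypothetical. In classic ASP the equivalent would be a server-side include at the start of the default page and of the 404 and 500 handlers.

```python
import functools


def guard(ip, url, blocked_ips):
    """Shared check: return an HTTP status to send, or None to carry on."""
    if ip in blocked_ips:
        return 403
    return None


def guarded(handler):
    """Wrap a page handler so the shared guard always runs first."""
    @functools.wraps(handler)
    def wrapper(ip, url, blocked_ips):
        status = guard(ip, url, blocked_ips)
        if status is not None:
            return status, "Forbidden"
        return handler(ip, url)
    return wrapper


@guarded
def default_page(ip, url):
    return 200, "content for " + url


@guarded
def not_found_page(ip, url):
    return 404, "Not found"
```

Keeping all checks and all logging inside the one `guard` function is what removes the "blocking code all over the shop" problem: each page only knows that it must call it first.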

Any pointers would be appreciated.


 1:20 am on Jan 3, 2011 (gmt 0)

Well, I can now state: I won the battle!

Thank you to all on WebMasterWorld with your wisdom and advice!

I have been able to reduce my traffic to 10% of what it was.
Yes, 90% of it was FORM SPAM and image hotlinking!

Thanks again!

I have employed 16 methods:
The best are honey URL / hidden, and encrypted token.
Any trap will ban the IP
I have GeoIP-blocked RU,UA,CN,IL,LV after monitoring my measures for a fortnight and identifying these countries as the worst bad traffic generators.

Thanks again!


 7:44 am on Jan 3, 2011 (gmt 0)

Welcome to the Magnificent Obsession that is bot and botnet blocking:)


 7:32 am on Jan 19, 2011 (gmt 0)

Hi MaxGrenk, how did you do it please?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved