
Forum Moderators: phranque


Detecting Hard-hitting Bots via Live Stats

How to identify bots hitting your site



8:56 pm on Nov 7, 2005 (gmt 0)

10+ Year Member

We have an issue with bots hitting our site hard and slowing it down. What strategies do people use to quickly identify bots and block them?

We use a Microsoft ISA server / Windows IIS server config.

JAB Creations

4:58 am on Nov 8, 2005 (gmt 0)

WebmasterWorld Senior Member jab_creations is a WebmasterWorld Top Contributor of All Time 10+ Year Member

You will want to let Google/MSN/Yahoo in, of course; most other search engines use those indexes rather than running their own bots.

You can use robots.txt to keep some or all bots from accessing certain files. For example, if all your images are in www.example.com/images, then you could simply deny bots access to that directory via robots.txt if you have no interest in being found via Google Images.
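For example, a minimal robots.txt along those lines might look like this (the directory and bot name are just illustrations):

```
# robots.txt at www.example.com/robots.txt
# Keep all well-behaved crawlers out of the images directory:
User-agent: *
Disallow: /images/

# Or shut a single named bot out of the whole site:
User-agent: BecomeBot
Disallow: /
```

Note that robots.txt only works on bots that choose to honor it; badly behaved bots have to be blocked at the server.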

I use a version of awstats that I've personally modified to detect an extensive number of browsers and robots, though even by default it is half decent. So far this month...

Yahoo - 2080+242 - 17.60 MB
MSN - 1400+46 - 30.11 MB
Google - 520+13 - 4.79 MB
(+242 = 242 hits on robots.txt)

Those are the three biggies... MSN tends to be kind of a bandwidth whore, so I would suggest finding out where MSN is crawling that may be costing more bandwidth than you desire. Here are links to the major three's bot pages...


You should of course know how to work with robots.txt...

If you wanted to play with awstats... (the install is kind of hard, though)

There are occasional bots that will do a moderately hard crawl (I'm not concerned about bandwidth right now, thankfully), but they fluctuate to the point where if one doesn't hit, another does. Here are the totals for the lesser-known bots that have hit my site the hardest so far this year (including the big three)...

Yahoo - 88310+9433 - 596.00 MB
MSN - 53551+1526 - 1.03 GB
Googlebot - 37079+647 - 262.85 MB
WISENutbot - 7525+88 - 65.23 MB
Kolinka - 6033+604 - 139.67 MB (Forum spider)
BecomeBot - 6341+269 - 45.87 MB (Google ties?)
Ichiro - 6070+20 - 166.37 MB (Japan)
Grub - 4283+17 - 75.18 MB
ConveraCrawler - 2885+20 - 33.32 MB
Ask Jeeves - 2303+424 - 85.03 MB
LmCrawler - 2155+30 - 16.86 MB
psbot - 2026+115 - 17.63 MB (pic search)
Texas A&M IRLbot - 1745+303 - 7.45 MB
Alexa - 1201+403 - 56.44 MB
Asterias - 875+3 - 27.11 MB (Singingfish Spider)
Accoona - 857+7 - 6.05 MB

The rest of the bots stay at about 6 MB or less.
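A table like the one above can be produced straight from the raw access log. Here's a minimal sketch in Python, assuming NCSA combined log format (awstats itself is Perl, and the bot substrings and sample lines here are illustrative, not a complete list), that sums bytes transferred per bot user-agent:

```python
# Sketch: sum bytes transferred per bot from an access log,
# assuming NCSA combined log format with user-agent in the last field.
import re
from collections import defaultdict

LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
# Illustrative substrings to match in the User-Agent header:
BOTS = ["Googlebot", "msnbot", "Yahoo! Slurp", "BecomeBot"]

def bytes_per_bot(lines):
    totals = defaultdict(int)
    for line in lines:
        m = LINE.match(line)
        if not m or m.group("bytes") == "-":
            continue  # malformed line or no body sent
        for bot in BOTS:
            if bot in m.group("agent"):
                totals[bot] += int(m.group("bytes"))
                break
    return dict(totals)

sample = [
    '66.249.66.1 - - [08/Nov/2005:20:00:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [08/Nov/2005:20:00:05 +0000] "GET /a HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
    '207.46.98.1 - - [08/Nov/2005:20:00:09 +0000] "GET / HTTP/1.1" 200 2048 "-" "msnbot/1.0"',
]
print(bytes_per_bot(sample))  # {'Googlebot': 6144, 'msnbot': 2048}
```

IIS logs use W3C extended format rather than NCSA, so the regex would need adjusting for the original poster's setup, but the tallying idea is the same.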

Keep in mind that bots will sometimes hit my site years after they first roamed it, and vice versa, so in effect you may have to detect unknown bots that I am not currently aware of.

I'm not sure, but if the size of a file can be determined by a HEAD request, then it would (if I'm correct) make better sense to HEAD files (specifically images) to reduce bandwidth.
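It can, when the server reports it: a HEAD request returns only the headers, and most servers include a Content-Length header giving the body size in bytes. A sketch in Python (the URL is hypothetical, and not every server sends Content-Length):

```python
# Sketch: fetch only the headers of a resource with a HEAD request,
# so the body (e.g. an image) is never downloaded.
import urllib.request

def content_length(url):
    # method="HEAD" asks the server for headers only, no body.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        value = resp.headers.get("Content-Length")
        # Some servers omit Content-Length; return None in that case.
        return int(value) if value is not None else None

# Usage (hypothetical URL, requires network):
# size = content_length("http://www.example.com/images/photo.jpg")
```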

I have also saved bandwidth by blocking spammers using various methods (very effective if you have a high level of abuse), though I won't discuss those methods right now.

Anyway I hope this helps some...

- John


5:27 pm on Nov 8, 2005 (gmt 0)

10+ Year Member

Thanks for your detailed reply.

Does awstats give me a report of the most active IPs making requests in the last 10 minutes? That's really the data that would be helpful in detecting bots.

Also, does it process the log files on the fly?

JAB Creations

8:46 pm on Nov 8, 2005 (gmt 0)

WebmasterWorld Senior Member jab_creations is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Well, I can see a list (1,000?) of the highest-bandwidth IPs. There are tons of options (DNS etc.). Someone from the school I attend burned 1.3 GB of bandwidth this weekend (not really concerned, but I still notice such things as I fly below the radar, bwahahaha).

You can also configure awstats to allow updates from the browser (or set a regular interval).
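For the "most active IPs in the last 10 minutes" view asked about above, awstats works off log files anyway, so a quick tally over the tail of the log does the job. A sketch in Python, assuming NCSA-style timestamps (the sample lines are illustrative; `now` is passed in so the window is explicit):

```python
# Sketch: count requests per IP within the last 10 minutes of a log,
# assuming NCSA-style timestamps like [08/Nov/2005:20:46:00 +0000].
from collections import Counter
from datetime import datetime, timedelta

def top_ips(lines, now, window=timedelta(minutes=10)):
    counts = Counter()
    for line in lines:
        ip = line.split(" ", 1)[0]
        # Timestamp sits between the first pair of square brackets.
        stamp = line[line.index("[") + 1 : line.index("]")]
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if now - when <= window:
            counts[ip] += 1
    return counts.most_common()  # busiest IPs first

now = datetime.strptime("08/Nov/2005:20:50:00 +0000", "%d/%b/%Y:%H:%M:%S %z")
sample = [
    '10.0.0.1 - - [08/Nov/2005:20:45:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.1 - - [08/Nov/2005:20:49:00 +0000] "GET /a HTTP/1.1" 200 512',
    '10.0.0.2 - - [08/Nov/2005:20:10:00 +0000] "GET / HTTP/1.1" 200 512',
]
print(top_ips(sample, now))  # [('10.0.0.1', 2)]
```

An IP hitting hundreds of pages in a ten-minute window, with a user-agent you don't recognize, is a strong bot candidate. IIS W3C logs put the date and time in separate leading fields, so the parsing would need adjusting for that format.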

