Forum Moderators: Ocean10000 & incrediBILL & keyplyr

Announcing find_spiders.pl

Wow.

     
11:12 pm on Feb 21, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2001
posts: 2548
votes: 0


I've been using a script I threw together a long time ago to get a quick view of which spiders are hitting my sites. It's nothing fancy, and I kind of cringe when I look at some of the code, but it has no dependencies. The nice thing is that it does host lookups (once per IP), so you no longer have to check by hand whether an IP really belongs to Google. It also runs nicely from cron; you can have it email you a report.

GitHub makes it so easy to put code out there that I figured why not, so here it is:

[github.com...]

NAME

find_spiders.pl - A script to find spiders in Apache web logs and report on them.

DESCRIPTION

This is old code, but it has generally worked like a charm for me. You can put it in cron and get daily emails about who's hammering your site. It's also helpful for forensic analysis after some jerk crawler takes your server down. One nice feature is that it does hostname lookups on the bots' IPs (once per IP), so it's easier to tell whether a bot is actually from Google or another legitimate search engine.
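
For anyone curious what the once-per-IP lookup amounts to, here is a minimal sketch in Python. It is not the script's actual Perl, which isn't shown in this thread. It assumes the log is in Apache "combined" format, and the names `summarize`, `reverse_dns`, and `report` are made up for illustration.

```python
import re
import socket
from collections import Counter

# Assumed log layout: Apache "combined" format, where the client IP is the
# first field and the user agent is the last double-quoted field.
LOG_LINE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def reverse_dns(ip):
    """Return the PTR hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def summarize(lines, resolver=reverse_dns):
    """Count hits per (ip, hostname, user agent), resolving each IP only once."""
    hostnames = {}  # ip -> hostname (or None); this is the once-per-IP cache
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        ip, agent = match.groups()
        if ip not in hostnames:
            hostnames[ip] = resolver(ip)
        hits[(ip, hostnames[ip], agent)] += 1
    return hits


def report(hits):
    """Format the counts as text lines, busiest client first."""
    return [
        f"{count:6d}  {ip}  {host or 'no reverse DNS'}  {agent}"
        for (ip, host, agent), count in hits.most_common()
    ]
```

Passing `collections.deque(open(path), maxlen=1000)` as `lines` would look at only the last 1000 lines of a log, roughly in the spirit of the `-l 1000` option. A user agent that claims to be Googlebot but shows no reverse DNS, or a hostname outside Google's domains, is the kind of entry worth a second look.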

USAGE EXAMPLE

Analyze and report on the last 1000 lines of your domain's Apache log.

./find_spiders.pl -f /home/domlogs/YOURDOMAIN.com -l 1000
5:24 am on Feb 22, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 13, 2003
posts:698
votes: 0


Good man!

Good ol'perl!
5:51 am on Feb 22, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


Got a few massive log files to try this out on and see if it finds anything I missed.

Thanks for sharing!
8:49 am on Feb 22, 2014 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6671
votes: 131


Thanks!
4:14 pm on Feb 23, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2001
posts: 2548
votes: 0


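Thanks, guys. If you're not a git user, here's the short course on how to download this from the command line:


git clone git@github.com:physicsdude/FindSpiders.git

That SSH-style address needs a GitHub account with an SSH key set up. Without one, the HTTPS form of the same address should work:

git clone https://github.com/physicsdude/FindSpiders.git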