Announcing find_spiders.pl

Wow.

physics

11:12 pm on Feb 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been using a script I threw together a long time ago to get a quick view of which spiders are hitting my sites. It's nothing fancy, and I kind of cringe when I look at some of the code, but it has no dependencies, and the nice thing is that it does host lookups (once per IP), so you no longer have to check by hand whether an IP really belongs to Google. It also works well from cron, where it can email you a report.

GitHub makes it so easy to put code out there that I figured why not, so here it is:

[github.com...]

NAME

find_spiders.pl - A script to find spiders in Apache web logs and report on them.

DESCRIPTION

This is old code, but it has worked reliably for me. You can run it from cron and get daily emails about who's hammering your site. It's also helpful for forensic analysis after some jerk crawler takes your server down. One nice feature is that it does hostname lookups on the bots' IPs (once per IP), so it's easier to tell whether a bot is actually from Google or another legitimate search engine.
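To illustrate the once-per-IP lookup idea, here is a minimal Python sketch. It is not the script's actual Perl code; the names `reverse_lookup`, `make_cached_resolver`, and `looks_like_google`, as well as the list of accepted domain suffixes, are assumptions made for this example.

```python
import socket


def reverse_lookup(ip):
    """Return the reverse-DNS (PTR) hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def make_cached_resolver(lookup=reverse_lookup):
    """Wrap a lookup function so each IP is resolved at most once."""
    cache = {}

    def resolve(ip):
        # Failed lookups (None) are cached too, so a dead IP is not retried.
        if ip not in cache:
            cache[ip] = lookup(ip)
        return cache[ip]

    return resolve


def looks_like_google(hostname, suffixes=(".googlebot.com", ".google.com")):
    """Check whether a hostname ends in one of the given suffixes.

    The default suffix list is an assumption for illustration only.
    """
    if hostname is None:
        return False
    return hostname.lower().rstrip(".").endswith(suffixes)
```

A reverse-DNS name on its own can be forged by whoever controls the IP's reverse zone, so a stricter check would also resolve the hostname forward and confirm it maps back to the same IP.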

USAGE EXAMPLE

Analyze and report on the last 1000 lines of your domain's Apache log.

./find_spiders.pl -f /home/domlogs/YOURDOMAIN.com -l 1000
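For anyone curious what "the last N lines" style of analysis looks like in principle, here is a rough Python sketch. It is not how find_spiders.pl is implemented; the helper names `tail_lines` and `count_hits_by_ip` are hypothetical, and it assumes the client IP is the first whitespace-separated field on each line, as in Apache's common and combined log formats.

```python
from collections import Counter, deque


def tail_lines(path, n):
    """Return the last n lines of a text file, keeping at most n lines in memory."""
    with open(path, "r", errors="replace") as fh:
        return list(deque(fh, maxlen=n))


def count_hits_by_ip(lines):
    """Count requests per client, taking the first field of each line as the IP."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts
```

Something like `count_hits_by_ip(tail_lines("/home/domlogs/YOURDOMAIN.com", 1000)).most_common(10)` would then list the ten busiest clients in that window.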

Angonasec

5:24 am on Feb 22, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good man!

Good ol'perl!

incrediBILL

5:51 am on Feb 22, 2014 (gmt 0)

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



Got a few massive log files to try this out on and see if it finds anything I missed.

Thanks for sharing!

keyplyr

8:49 am on Feb 22, 2014 (gmt 0)

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



Thanks!

physics

4:14 pm on Feb 23, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks guys. If you're not a git user, here's the short course on how to download this from the command line:


git clone https://github.com/physicsdude/FindSpiders.git
 
