Bots by Hosting Company

     
12:40 am on Jul 15, 2012 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Has anyone actually done a "Bots by Hosting Company" report to show what's crawling from The Planet, AWS, etc.?

I think it would be interesting to see how these things are clustered.
1:24 am on July 15, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5805
votes: 64



I agree. Let me know when you get it all compiled :)
7:52 pm on July 15, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


Unless activity for any given IP reaches epidemic proportions, I ignore (e.g.) ThePlanet, AWS etc. - in fact, any server farm I discover. :)

I have large chunks of AWS in the IIS firewall so do not see stuff within those ranges. Other stuff seems to be spread pretty much across the world, although much of it from USA, Russia and Ukraine with a fairly high proportion from China. And, for some reason, "ain't I clever" UK idiots trying it on for no obvious motivation other than self-gratification.
12:47 pm on Aug 15, 2012 (gmt 0)

New User

5+ Year Member

joined:June 30, 2010
posts: 36
votes: 0


I'm working on something like this right now, allowing you to associate user agents with IP ranges. I've been collecting data since 2009. I have one site which gets hit pretty hard with scrapers (as well as legitimate SE bots) so it's almost like a honeypot for finding new and obscure user-agents.

I'm hoping to work the data in such a way that it will be possible to determine, based on past empirical data of activity on various sites, whether a load from a particular IP range is likely to be a bot. As the OP hints at, you wouldn't expect many interactive browser sessions to be coming from AWS or ThePlanet IPs, unless the servers were proxying for humans... I'm going to be trying to determine that automatically.
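
A minimal sketch of the kind of grouping described above, assuming access-log hits are available as (IP, user agent) pairs. The host names and CIDR blocks are hypothetical placeholders (reserved documentation ranges, not real AWS or ThePlanet allocations), and `hosting_company` and `bots_by_host` are illustrative names only:

```python
import ipaddress
from collections import Counter, defaultdict

# Hypothetical host-to-CIDR table. These are reserved documentation ranges,
# NOT real AWS or ThePlanet allocations; substitute ranges you have verified.
HOSTING_RANGES = {
    "ExampleHostA": ["192.0.2.0/24"],
    "ExampleHostB": ["198.51.100.0/24"],
}

# Pre-parse the CIDR strings once so each lookup is a simple membership test.
NETWORKS = [
    (ipaddress.ip_network(cidr), host)
    for host, cidrs in HOSTING_RANGES.items()
    for cidr in cidrs
]


def hosting_company(ip):
    """Return the hosting company whose range contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, host in NETWORKS:
        if addr in network:
            return host
    return None


def bots_by_host(hits):
    """Count user agents per hosting company from (ip, user_agent) pairs."""
    report = defaultdict(Counter)
    for ip, user_agent in hits:
        host = hosting_company(ip)
        if host is not None:  # skip IPs outside every known hosting range
            report[host][user_agent] += 1
    return report


# Made-up sample hits, for illustration only.
hits = [
    ("192.0.2.10", "ExampleBot/1.0"),
    ("192.0.2.11", "ExampleBot/1.0"),
    ("198.51.100.7", "Mozilla/5.0"),
    ("203.0.113.5", "Mozilla/5.0"),
]
report = bots_by_host(hits)
for host, agents in report.items():
    print(host, dict(agents))
```

A hit from a hosting range is only a hint that it is a bot, since a server there could be proxying for humans, so a report like this is a starting point for review rather than a block list.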
12:16 am on Aug 16, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


Rowan194, there is already a site that collects this sort of information: [botsvsbrowsers.com...]
You can search by either user agent or IP address.
2:51 am on Aug 16, 2012 (gmt 0)

New User

5+ Year Member

joined:June 30, 2010
posts: 36
votes: 0


Wouldn't be the first time I've reinvented the wheel.