Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Bots by Hosting Company
incrediBILL




msg:4475725
 12:40 am on Jul 15, 2012 (gmt 0)

Has anyone actually done a "Bots by Hosting Company" report to show what's crawling from The Planet, AWS, etc.?

I think it would be interesting to see how these things are clustered.
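
As a rough illustration of what such a report might look like, here is a minimal Python sketch that groups log hits by hosting company and counts the user agents seen from each. Everything in it is an assumption for illustration: the host names `ExampleHostA` and `ExampleHostB` are hypothetical, the CIDR blocks are RFC 5737 documentation ranges rather than real allocations for The Planet or AWS, and the input is assumed to be already parsed into `(ip, user_agent)` pairs.

```python
import ipaddress
from collections import Counter, defaultdict

# Placeholder data: hypothetical host names mapped to RFC 5737
# documentation ranges. These are NOT real hosting-company allocations.
HOST_RANGES = {
    "ExampleHostA": ["192.0.2.0/24"],
    "ExampleHostB": ["198.51.100.0/24"],
}


def host_for_ip(ip, host_ranges=HOST_RANGES):
    """Return the name of the host whose range contains ip, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for host, cidrs in host_ranges.items():
        if any(addr in ipaddress.ip_network(cidr) for cidr in cidrs):
            return host
    return "unknown"


def bots_by_host(hits, host_ranges=HOST_RANGES):
    """Group (ip, user_agent) pairs into per-host user-agent counts."""
    report = defaultdict(Counter)
    for ip, user_agent in hits:
        report[host_for_ip(ip, host_ranges)][user_agent] += 1
    return report


if __name__ == "__main__":
    # Made-up sample hits, standing in for parsed access-log lines.
    sample_hits = [
        ("192.0.2.10", "BotA/1.0"),
        ("192.0.2.11", "BotA/1.0"),
        ("198.51.100.5", "BotB/2.0"),
        ("203.0.113.9", "Mozilla/5.0"),
    ]
    for host, agents in bots_by_host(sample_hits).items():
        print(host, dict(agents))
```

A linear scan over the ranges is fine for a handful of hosts; with thousands of ranges a sorted list or prefix tree would probably be a better fit.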

 

keyplyr




msg:4475737
 1:24 am on Jul 15, 2012 (gmt 0)


I agree. Let me know when you get it all compiled :)

dstiles




msg:4475892
 7:52 pm on Jul 15, 2012 (gmt 0)

Unless activity for any given IP reaches epidemic proportions, I ignore (e.g.) ThePlanet, AWS etc. - in fact, any server farm I discover. :)

I have large chunks of AWS in the IIS firewall, so I do not see stuff within those ranges. Other stuff seems to be spread pretty much across the world, although much of it is from the USA, Russia and Ukraine, with a fairly high proportion from China. And, for some reason, "ain't I clever" UK idiots trying it on for no obvious motivation other than self-gratification.

rowan194




msg:4485019
 12:47 pm on Aug 15, 2012 (gmt 0)

I'm working on something like this right now, allowing you to associate user agents with IP ranges. I've been collecting data since 2009. I have one site which gets hit pretty hard with scrapers (as well as legitimate SE bots) so it's almost like a honeypot for finding new and obscure user-agents.

I'm hoping to work the data in such a way that it will be possible to determine, based on past empirical data of activity on various sites, whether a load from a particular IP range is likely to be a bot. As the OP hints at, you wouldn't expect many interactive browser sessions to be coming from AWS or ThePlanet IPs, unless the servers were proxying for humans... I'm going to be trying to determine that automatically.
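
One way the idea above could be sketched, purely as an assumption and not a description of the tool being built: keep a running tally of bot versus total hits per IP range, then report the historical bot fraction for the range a new IP falls into. The class name `RangeHistory`, its methods, and the `was_bot` label are hypothetical, and the ranges in the usage example are RFC 5737 documentation blocks, not real hosting ranges. How a past hit gets labelled as a bot in the first place is left open here.

```python
import ipaddress
from collections import defaultdict


class RangeHistory:
    """Tracks how often past hits from each known IP range were bots."""

    def __init__(self, cidrs):
        self.networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
        # network -> [bot_hits, total_hits]
        self.counts = defaultdict(lambda: [0, 0])

    def _network_for(self, ip):
        """Return the first known network containing ip, or None."""
        addr = ipaddress.ip_address(ip)
        for net in self.networks:
            if addr in net:
                return net
        return None

    def record(self, ip, was_bot):
        """Record one past hit; hits outside known ranges are skipped."""
        net = self._network_for(ip)
        if net is not None:
            self.counts[net][0] += int(was_bot)
            self.counts[net][1] += 1

    def bot_likelihood(self, ip):
        """Fraction of past hits from ip's range that were bots.

        Returns None when the range is unknown or has no history yet.
        """
        net = self._network_for(ip)
        if net is None or net not in self.counts:
            return None
        bot_hits, total_hits = self.counts[net]
        return bot_hits / total_hits


if __name__ == "__main__":
    history = RangeHistory(["192.0.2.0/24", "198.51.100.0/24"])
    for ip, was_bot in [("192.0.2.1", True), ("192.0.2.2", True),
                        ("192.0.2.3", True), ("192.0.2.4", False)]:
        history.record(ip, was_bot)
    print(history.bot_likelihood("192.0.2.200"))
```

A raw fraction like this says nothing about servers that proxy for humans; telling those apart would need extra signals beyond the IP range.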

Dijkgraaf




msg:4485252
 12:16 am on Aug 16, 2012 (gmt 0)

Rowan194, there is already a site that collects this sort of information: [botsvsbrowsers.com...]
You can search either by user agent or by IP address.

rowan194




msg:4485280
 2:51 am on Aug 16, 2012 (gmt 0)

Wouldn't be the first time I've reinvented the wheel.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved