
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
w3who.net's hidden crawler, crouching user agent
incrediBILL




msg:4383970
 1:57 am on Nov 6, 2011 (gmt 0)

Who is w3who.net? Another domain intel site.

Appears to use 3rd party data, maybe with some direct updates of its own.

IP: 178.79.129.38 (host and crawler)

USER AGENT: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.1 (KHTML, like Gecko) Chrome/6.0.428.0 Safari/534.1

Definitely trying to fly under the radar, too bad.

HOST: linode.com

IP RANGE: 178.79.128.0 - 178.79.135.255
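
For anyone who wants to check a logged address against a range like this one, here is a minimal sketch using Python's standard ipaddress module. The IP and range come from the post above; the variable names are only illustrative.

```python
import ipaddress

# Values copied from the post above.
crawler_ip = ipaddress.ip_address("178.79.129.38")
range_first = ipaddress.ip_address("178.79.128.0")
range_last = ipaddress.ip_address("178.79.135.255")

# Address objects are orderable, so an inclusive range check
# is a plain chained comparison.
in_range = range_first <= crawler_ip <= range_last

print(f"{crawler_ip} inside {range_first} - {range_last}: {in_range}")
```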

 

keyplyr




msg:4384000
 6:24 am on Nov 6, 2011 (gmt 0)

These are the Linode ranges I block. Ever since they started their Cloud Servers there's been increasing trouble from every one of these ranges.

69.164.192.0 - 69.164.223.255
69.164.192.0/19

74.207.224.0 - 74.207.255.255
74.207.224.0/19

97.107.128.0 - 97.107.143.255
97.107.128.0/20

173.255.192.0 - 173.255.255.255
173.255.192.0/18

178.79.128.0 - 178.79.135.255
178.79.128.0/21
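
The list above gives each range twice, once as first/last addresses and once in CIDR form. A rough sketch of how the two forms can be cross-checked with Python's standard ipaddress module follows; the helper name range_to_cidrs is made up for this example.

```python
import ipaddress

# (first, last, cidr) triples copied from the post above.
LISTED = [
    ("69.164.192.0", "69.164.223.255", "69.164.192.0/19"),
    ("74.207.224.0", "74.207.255.255", "74.207.224.0/19"),
    ("97.107.128.0", "97.107.143.255", "97.107.128.0/20"),
    ("173.255.192.0", "173.255.255.255", "173.255.192.0/18"),
    ("178.79.128.0", "178.79.135.255", "178.79.128.0/21"),
]


def range_to_cidrs(first, last):
    """Return the minimal list of CIDR networks covering first..last inclusive."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


for first, last, cidr in LISTED:
    computed = range_to_cidrs(first, last)
    status = "ok" if computed == [ipaddress.ip_network(cidr)] else "MISMATCH"
    print(f"{first} - {last} -> {computed[0]} ({status})")
```

A range that does not fall on a clean boundary would come back as several networks, which is why the helper returns a list rather than a single value.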

incrediBILL




msg:4384002
 6:42 am on Nov 6, 2011 (gmt 0)

What we need is a nice big list of IPs and IP range lists to pin to the top of the forum for everyone to share.

Problem is, I can't use hard firewall blocks; otherwise I'd never figure out where these crawlers are using the data.

Someone has to make the sacrifice! ;)

keyplyr




msg:4384009
 7:17 am on Nov 6, 2011 (gmt 0)

But it [big list of IPs] would likely end up just like the big lists of UAs that used to get posted a couple years ago. Trouble is, what's bad for one web site is welcomed at another.

Example: Many of us block all the social parasites (twitter bots, linkedin, facebook, et al) but some webmasters thrive on their data being syndicated with the help of these harvesters.

[edited by: keyplyr at 7:33 am (utc) on Nov 6, 2011]

incrediBILL




msg:4384011
 7:30 am on Nov 6, 2011 (gmt 0)

keyplyr wrote: "Example: Many of us block all the social parasites (twitter bots, linkedin, facebook, et al) but some webmasters thrive on their data being syndicated with the help of these harvesters."


I really don't have a problem with how people use the list either way.

My issue is validation: as long as it can be validated, it's better for everyone.

Those user agent lists without IP context are useless, and those were blacklists, again useless IMO unless you use them to build whitelists! hehe.

FWIW, I actually let facebook into my site because their security system does validate that links are OK for facebook users to visit, which IMO is a good thing.
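
One way to read the validation point above is that a user agent claim only counts when the request also comes from an address range known to belong to that crawler. The sketch below is one possible shape for such a whitelist check, not anyone's actual setup. The token "examplebot" and the network 192.0.2.0/24 (a reserved documentation range) are placeholders, and the function name classify is hypothetical.

```python
import ipaddress

# Hypothetical whitelist: UA token -> networks that crawler is expected to use.
# "examplebot" and 192.0.2.0/24 are placeholders, not real crawler data.
WHITELIST = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify(user_agent, ip):
    """Return 'validated', 'spoofed', or 'unlisted' for a UA/IP pair."""
    addr = ipaddress.ip_address(ip)
    ua = user_agent.lower()
    for token, networks in WHITELIST.items():
        if token in ua:
            # The UA claims to be a whitelisted crawler, so the IP must agree.
            if any(addr in net for net in networks):
                return "validated"
            return "spoofed"
    # No whitelisted token in the UA; the whitelist says nothing about it.
    return "unlisted"


print(classify("ExampleBot/1.0", "192.0.2.10"))
print(classify("ExampleBot/1.0", "178.79.129.38"))
```

A browser-like user agent such as the one in the first post lands in "unlisted", so it would still need an IP range list to be caught.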

dstiles




msg:4384213
 10:15 pm on Nov 6, 2011 (gmt 0)

My linode list is:

66.228.32.0-66.228.63.255
69.164.192.0-69.164.223.255
72.14.176.0-72.14.191.255
74.207.224.0-74.207.255.255
96.126.96.0-96.126.127.255
97.107.128.0-97.107.143.255
109.74.192.0-109.74.207.255 (GB/UK)
173.230.128.0-173.230.159.255
173.255.192.0-173.255.255.255
178.79.128.0-178.79.191.255 (GB/UK)
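
A list in this first-last form can be turned into a lookup without much work. The following is a minimal sketch using Python's standard ipaddress module; the ranges are copied from the post above with the country notes dropped, and the names parse_range and is_listed are invented for the example.

```python
import ipaddress

# Ranges copied from the post above (country annotations dropped).
RANGES = """
66.228.32.0-66.228.63.255
69.164.192.0-69.164.223.255
72.14.176.0-72.14.191.255
74.207.224.0-74.207.255.255
96.126.96.0-96.126.127.255
97.107.128.0-97.107.143.255
109.74.192.0-109.74.207.255
173.230.128.0-173.230.159.255
173.255.192.0-173.255.255.255
178.79.128.0-178.79.191.255
""".split()


def parse_range(text):
    """Convert 'first-last' into the CIDR networks that cover it."""
    first, last = (ipaddress.ip_address(part) for part in text.split("-"))
    return list(ipaddress.summarize_address_range(first, last))


NETWORKS = [net for line in RANGES for net in parse_range(line)]


def is_listed(ip):
    """True if the address falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)


# The crawler IP from the first post in this thread.
print(is_listed("178.79.129.38"))
```

With a few thousand ranges, a linear scan like this is still fine for log analysis; a sorted structure or a firewall ipset would be the usual choice for per-request checks.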

dstiles




msg:4384217
 10:21 pm on Nov 6, 2011 (gmt 0)

Bill - I have a list of over 3800 "server" ranges of which probably 2500 or so are true server farms. Others are either invasive statics or me misinterpreting the signs or even, in the case of RU and UA, applying a vindictiveness I probably shouldn't after a particularly bad batch of /21, /22, /23 and /24 ranges.

I would offer the list but I feel it's too misleading and, in part, down to personal prejudice.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved