Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
WordTracker Attempts Crawling My Site!
Trying to be low key at 180 pages in 14 days
incrediBILL




msg:3796217
 8:04 pm on Nov 28, 2008 (gmt 0)

For 14 days now WordTracker has been attempting a slow motion crawl against one of my sites and all they've been getting is the same error page as a "200 OK" telling them they've been blocked for behaving badly.

At a minimum, someone is going to get a report that shows a bunch of the same high density keywords ;)

66.132.220.* "POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)"
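For anyone wanting to try the same kind of block, here is a rough sketch, assuming a Python WSGI application. The names `BLOCKED_NET`, `BLOCKED_UA_PREFIX`, `BLOCK_PAGE`, `is_blocked` and `app` are made up for illustration and are not taken from any real configuration. It hands the same block page, with a "200 OK" status, to anything coming from 66.132.220.* or sending that user agent:

```python
import ipaddress

# Assumption: 66.132.220.* is written here as the /24 network it corresponds to.
BLOCKED_NET = ipaddress.ip_network("66.132.220.0/24")
BLOCKED_UA_PREFIX = "POE-Component-Client-HTTP/"
# Hypothetical block page text.
BLOCK_PAGE = b"<html><body>You have been blocked for behaving badly.</body></html>"


def is_blocked(remote_addr, user_agent):
    """Return True if the client matches the blocked range or user agent."""
    try:
        in_range = ipaddress.ip_address(remote_addr) in BLOCKED_NET
    except ValueError:
        # Missing or malformed address: fall back to the user-agent check only.
        in_range = False
    return in_range or user_agent.startswith(BLOCKED_UA_PREFIX)


def app(environ, start_response):
    """WSGI app: blocked clients get the block page, still served as 200 OK."""
    remote_addr = environ.get("REMOTE_ADDR", "")
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if is_blocked(remote_addr, user_agent):
        body = BLOCK_PAGE
    else:
        body = b"<html><body>Normal page</body></html>"
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Matching on either the range or the user agent is a design choice for the sketch; requiring both would be stricter but would miss the same client if it moved to another IP.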

They have a few IPs at Peer1, the website is on a different IP.

They used to use other Peer1 IPs in the past.

You can easily thwart them by blocking Peer1:

OrgName: Peer 1 Dedicated Hosting
NetRange: 69.0.128.0 - 69.0.255.255
CIDR: 69.0.128.0/17
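As a quick sanity check that the NetRange and the CIDR above describe the same block, here is a small sketch using Python's standard `ipaddress` module. The test address 69.0.200.10 is just an arbitrary example inside the range, not an address seen in any log:

```python
import ipaddress

first = ipaddress.ip_address("69.0.128.0")
last = ipaddress.ip_address("69.0.255.255")

# summarize_address_range yields the CIDR blocks that exactly cover first..last.
cidrs = list(ipaddress.summarize_address_range(first, last))
print(cidrs)                  # [IPv4Network('69.0.128.0/17')]

network = ipaddress.ip_network("69.0.128.0/17")
print(network.num_addresses)  # 32768

# Arbitrary example address inside the range.
print(ipaddress.ip_address("69.0.200.10") in network)  # True
```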

[edited by: incrediBILL at 8:32 pm (utc) on Nov. 28, 2008]

 

caribguy




msg:3796223
 8:06 pm on Nov 28, 2008 (gmt 0)

Seen 3 times on 2 sites since Nov 22 - thanks!

phred




msg:3796395
 3:03 am on Nov 29, 2008 (gmt 0)

You can easily thwart them by blocking Peer1:

OrgName: Peer 1 Dedicated Hosting
NetRange: 69.0.128.0 - 69.0.255.255
CIDR: 69.0.128.0/17

Bill,

Any reason to not block all of Peer 1?

64.29.16.0/20
64.45.0.0/18
64.65.0.0/18
64.77.0.0/17
64.224.0.0/14
64.239.0.0/17

69.0.128.0/17
66.33.0.0/17
66.36.96.0/20
66.111.64.0/19
66.132.128.0/17
66.148.0.0/18
66.223.0.0/17
66.234.0.0/20

207.21.192.0/18
207.159.128.0/19
207.198.64.0/18

209.15.0.0/16
209.25.128.0/17
209.35.0.0/16
209.95.96.0/19
209.196.128.0/18
209.203.224.0/19
209.213.96.0/19

216.25.0.0/17
216.65.0.0/17
216.87.0.0/19
216.87.208.0/20
216.122.0.0/16
216.150.0.0/19
216.152.128.0/20
216.157.0.0/18
216.157.64.0/19
216.157.96.0/20
216.247.0.0/16
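To check a visitor address against a list like this one, here is a short sketch using Python's standard `ipaddress` module. Only a handful of the ranges above are repeated to keep it short, and `PEER1_RANGES` and `matching_range` are names invented for the example:

```python
import ipaddress

# A subset of the ranges listed above; extend with the rest as needed.
PEER1_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "64.65.0.0/18",
    "66.132.128.0/17",
    "69.0.128.0/17",
    "209.15.0.0/16",
    "216.157.64.0/19",
)]


def matching_range(addr):
    """Return the first listed network containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    for net in PEER1_RANGES:
        if ip in net:
            return net
    return None


# 66.132.220.1 stands in for any address in 66.132.220.*
print(matching_range("66.132.220.1"))  # 66.132.128.0/17
# 192.0.2.1 is a documentation-only address, so it matches nothing.
print(matching_range("192.0.2.1"))     # None
```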

incrediBILL




msg:3796397
 3:06 am on Nov 29, 2008 (gmt 0)

Any reason to not block all of Peer 1?

Considering I host on Peer1/ServerBeach, I have to tread lightly with that.

phred




msg:3796407
 3:20 am on Nov 29, 2008 (gmt 0)

Ooops! Of course I didn't mean to put you in an awkward situation!

When blocking a server IP range from a server hosting organization, I tend to block all similarly named ranges from that organization, on the basis that they are probably also used for servers.

Cheers,
Phred

incrediBILL




msg:3796426
 4:34 am on Nov 29, 2008 (gmt 0)

I tend to block all similarly named ranges from that organization

Same here in most cases.

No need to leave gaping holes in the fence.

incrediBILL




msg:3799737
 5:07 am on Dec 4, 2008 (gmt 0)

Follow up...

Got a message from someone at WordTracker saying they don't crawl. They claim it's a lateral search tool that looks for keywords on all of the pages returned from the original search.

Sounds like quibbling over semantics about what constitutes a crawl. Allowing a SE to crawl a site doesn't authorize any other automated task to access the pages resulting from that crawl and subsequent search, and then crawl those pages yet again without permission.

But that's a different argument for a different day.

Anyway, they claim if you write to them they'll remove your site from their searches.

IMO, honoring robots.txt would certainly be a lot simpler for all involved.
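For illustration, this is roughly what honoring robots.txt looks like on the client side, sketched with Python's standard `urllib.robotparser`. The robots.txt content and the agent name "ExampleKeywordBot" are hypothetical; no agent token that WordTracker would answer to is assumed here:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "ExampleKeywordBot" is a made-up agent name.
ROBOTS_TXT = """\
User-agent: ExampleKeywordBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url):
    """Return True if the parsed robots.txt allows this agent to fetch url."""
    return parser.can_fetch(agent, url)


print(may_fetch("ExampleKeywordBot", "http://www.domain.tld/page.html"))  # False
print(may_fetch("SomeOtherBot", "http://www.domain.tld/page.html"))       # True
print(may_fetch("SomeOtherBot", "http://www.domain.tld/private/x.html"))  # False
```

A tool that ran this check before every request would need no opt-out emails at all.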

blend27




msg:3800157
 7:03 pm on Dec 4, 2008 (gmt 0)

We had a similar situation on one of our sites a few months ago and wrote to WordTracker. They replied that their customer was doing research using their services and that they had no control over it. A few of the requests were made to a URI that contained no "www." and contained "/..." as well. The only place that URI was ever referenced was in an MSN SERP: "host.tld/dir/page.h....". Attempts like that date back to April of 2007. Another IP they have used on several occasions is 64.65.13.36.

REQUEST HEADERS from 66.132.220.238:
Referer: http://www.domain.tld
Connection: close
Host: www.domain.tld
User-Agent: POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)

------------------------
request_method: GET
server_protocol: HTTP/1.0

Notice that there is no trailing forward slash on the referer.
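Those three traits (the POE user agent, HTTP/1.0, and a referer with no path) can be combined into a simple check. This is only a heuristic sketch in Python; `referer_lacks_path` and `matches_fingerprint` are hypothetical helper names, and it assumes the headers are available as a plain dict:

```python
from urllib.parse import urlsplit


def referer_lacks_path(referer):
    """True for a referer like 'http://www.domain.tld' that has a host but no path."""
    parts = urlsplit(referer)
    return bool(parts.netloc) and parts.path == ""


def matches_fingerprint(headers, server_protocol):
    """Heuristic match on the three traits seen in the logged request."""
    user_agent = headers.get("User-Agent", "")
    return (
        user_agent.startswith("POE-Component-Client-HTTP/")
        and server_protocol == "HTTP/1.0"
        and referer_lacks_path(headers.get("Referer", ""))
    )


# The request headers logged above.
logged = {
    "Referer": "http://www.domain.tld",
    "Connection": "close",
    "Host": "www.domain.tld",
    "User-Agent": "POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)",
}
print(matches_fingerprint(logged, "HTTP/1.0"))  # True
```

The referer test works because a referer written with the trailing slash ("http://www.domain.tld/") parses with a path of "/", while the one logged here parses with an empty path.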

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved