
Forum Moderators: Ocean10000 & incrediBILL


WordTracker Attempts Crawling My Site!

Trying to be low key at 180 pages in 14 days

     
8:04 pm on Nov 28, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


For 14 days now WordTracker has been attempting a slow motion crawl against one of my sites and all they've been getting is the same error page as a "200 OK" telling them they've been blocked for behaving badly.

At a minimum, someone is going to get a report that shows a bunch of the same high density keywords ;)

66.132.220.* "POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)"

They have a few IPs at Peer1; the website is on a different IP.

They used to use other Peer1 IPs in the past.

You can easily thwart them by blocking Peer1:

OrgName: Peer 1 Dedicated Hosting
NetRange: 69.0.128.0 - 69.0.255.255
CIDR: 69.0.128.0/17

[edited by: incrediBILL at 8:32 pm (utc) on Nov. 28, 2008]

8:06 pm on Nov 28, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Feb 16, 2007
posts:846
votes: 0


Seen 3 times on 2 sites since Nov 22 - thanks!

3:03 am on Nov 29, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:May 11, 2008
posts:55
votes: 0


You can easily thwart them by blocking Peer1:

OrgName: Peer 1 Dedicated Hosting
NetRange: 69.0.128.0 - 69.0.255.255
CIDR: 69.0.128.0/17

Bill,

Any reason to not block all of Peer 1?

64.29.16.0/20
64.45.0.0/18
64.65.0.0/18
64.77.0.0/17
64.224.0.0/14
64.239.0.0/17

69.0.128.0/17
66.33.0.0/17
66.36.96.0/20
66.111.64.0/19
66.132.128.0/17
66.148.0.0/18
66.223.0.0/17
66.234.0.0/20

207.21.192.0/18
207.159.128.0/19
207.198.64.0/18

209.15.0.0/16
209.25.128.0/17
209.35.0.0/16
209.95.96.0/19
209.196.128.0/18
209.203.224.0/19
209.213.96.0/19

216.25.0.0/17
216.65.0.0/17
216.87.0.0/19
216.87.208.0/20
216.122.0.0/16
216.150.0.0/19
216.152.128.0/20
216.157.0.0/18
216.157.64.0/19
216.157.96.0/20
216.247.0.0/16
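
For anyone who wants to check an address from their logs against ranges like these before adding a block rule, here is a minimal sketch using Python's standard `ipaddress` module. The list is only a subset of the ranges above, and the helper name `matching_ranges` is made up for illustration; this is not how any particular firewall or server does its matching.

```python
import ipaddress

# A subset of the CIDR ranges listed above; extend as needed.
PEER1_RANGES = [
    "64.65.0.0/18",
    "66.132.128.0/17",
    "69.0.128.0/17",
    "209.15.0.0/16",
    "216.122.0.0/16",
]

# Parse once so repeated lookups stay cheap.
NETWORKS = [ipaddress.ip_network(cidr) for cidr in PEER1_RANGES]


def matching_ranges(addr):
    """Return the listed ranges (as strings) that contain addr."""
    ip = ipaddress.ip_address(addr)
    return [str(net) for net in NETWORKS if ip in net]


if __name__ == "__main__":
    # Addresses reported in this thread.
    for addr in ("66.132.220.238", "64.65.13.36"):
        print(addr, matching_ranges(addr))
```

An empty list means none of the ranges you loaded covers the address, so a block on those ranges alone would not stop it.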

3:06 am on Nov 29, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Any reason to not block all of Peer 1?

Considering I host on Peer1/ServerBeach, I have to tread lightly with that.

3:20 am on Nov 29, 2008 (gmt 0)

Junior Member

5+ Year Member

joined:May 11, 2008
posts:55
votes: 0


Oops! Of course I didn't mean to put you in an awkward situation!

When blocking a server IP range from a server hosting organization, I tend to block all similarly named ranges from that organization, on the basis that they are also probably used for servers.

Cheers,
Phred

4:34 am on Nov 29, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


I tend to block all similarly named ranges from that organization

Same here in most cases.

No need to leave gaping holes in the fence.

5:07 am on Dec 4, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14624
votes: 88


Follow up...

Got a message from someone at WordTracker saying they don't crawl. They claim it's a lateral search tool that looks for keywords on all of the pages returned from the original search.

Sounds like quibbling over semantics about what constitutes a crawl. Allowing a search engine to crawl a site doesn't authorize some other automated tool to take the pages returned by a subsequent search and then fetch those pages yet again without permission.

But that's a different argument for a different day.

Anyway, they claim if you write to them they'll remove your site from their searches.

IMO, honoring robots.txt would certainly be a lot simpler for all involved.
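
To illustrate how little it takes for a client to honor robots.txt, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `may_fetch` helper are hypothetical, and using `POE-Component-Client-HTTP` as the robots.txt token is an assumption based on the user agent string seen in the logs; nothing here says how WordTracker's tool actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out the POE-based client, allow everyone else.
ROBOTS_TXT = """\
User-agent: POE-Component-Client-HTTP
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.domain.tld/page.html"
    print(may_fetch("POE-Component-Client-HTTP/0.65", url))
    print(may_fetch("Mozilla/5.0", url))
```

A well-behaved tool would run a check like this before every request and skip any URL that comes back disallowed.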

7:03 pm on Dec 4, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1666
votes: 35


We had a similar situation on one of our sites a few months ago and wrote to WordTracker. They replied that a customer of theirs was doing research using their services and that they had no control over it. A few of the requests were made to a URI that contained no "www." and contained "/..." as well. The only place that URI was ever referenced was in an MSN SERP: "host.tld/dir/page.h....". Attempts like that date back to April 2007. Another IP they have used on several occasions is 64.65.13.36.

REQUEST HEADERS from 66.132.220.238:
Referer: http://www.domain.tld
Connection: close
Host: www.domain.tld
User-Agent: POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)

------------------------
request_method: GET
server_protocol: HTTP/1.0

Notice that there is no trailing forward slash on the referer.
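
The oddities in those headers can be turned into a simple check. The following is only a sketch: the function name `suspicious_signals` and the signal labels are invented for illustration, and each signal by itself is weak evidence (plenty of legitimate clients still speak HTTP/1.0), so treat the result as a hint rather than grounds for a block.

```python
from urllib.parse import urlsplit


def suspicious_signals(headers, protocol):
    """Return a list of labels for the header oddities described above."""
    # Header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    signals = []

    # User agent of the Perl POE HTTP client seen in the logs.
    if h.get("user-agent", "").startswith("POE-Component-Client-HTTP"):
        signals.append("poe-user-agent")

    # Referer that is a bare host with no trailing slash (empty path).
    referer = h.get("referer", "")
    if referer:
        parts = urlsplit(referer)
        if parts.netloc and parts.path == "":
            signals.append("bare-host-referer")

    # Request made over HTTP/1.0.
    if protocol == "HTTP/1.0":
        signals.append("http-1.0")

    return signals


if __name__ == "__main__":
    logged = {
        "Referer": "http://www.domain.tld",
        "Connection": "close",
        "Host": "www.domain.tld",
        "User-Agent": "POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)",
    }
    print(suspicious_signals(logged, "HTTP/1.0"))
```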

 
