Forum Moderators: Ocean10000 & incrediBILL

WordTracker Attempts Crawling My Site!

Trying to be low key at 180 pages in 14 days

   
8:04 pm on Nov 28, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



For 14 days now WordTracker has been attempting a slow motion crawl against one of my sites and all they've been getting is the same error page as a "200 OK" telling them they've been blocked for behaving badly.

At a minimum, someone is going to get a report that shows a bunch of the same high density keywords ;)

66.132.220.* "POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)"

They have a few IPs at Peer1; the website is on a different IP.

They have used other Peer1 IPs in the past.

You can easily thwart them by blocking Peer1:

OrgName: Peer 1 Dedicated Hosting
NetRange: 69.0.128.0 - 69.0.255.255
CIDR: 69.0.128.0/17
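
For anyone who wants to double-check a WHOIS NetRange against its CIDR before pasting it into a block list, here is a minimal Python sketch using only the standard library. The variable names are illustrative and not from the thread.

```python
import ipaddress

# First and last address of the NetRange quoted above.
first = ipaddress.ip_address("69.0.128.0")
last = ipaddress.ip_address("69.0.255.255")

# summarize_address_range yields the smallest set of CIDR blocks
# that covers exactly the addresses from first to last.
cidrs = [str(net) for net in ipaddress.summarize_address_range(first, last)]
print(cidrs)
```

If the range collapses to a single block, the list has one entry; a range that does not align on a power-of-two boundary comes back as several smaller blocks.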

[edited by: incrediBILL at 8:32 pm (utc) on Nov. 28, 2008]

8:06 pm on Nov 28, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Seen 3 times on 2 sites since Nov 22 - thanks!

3:03 am on Nov 29, 2008 (gmt 0)

5+ Year Member



You can easily thwart them by blocking Peer1:

OrgName: Peer 1 Dedicated Hosting
NetRange: 69.0.128.0 - 69.0.255.255
CIDR: 69.0.128.0/17

Bill,

Any reason to not block all of Peer 1?

64.29.16.0/20
64.45.0.0/18
64.65.0.0/18
64.77.0.0/17
64.224.0.0/14
64.239.0.0/17

69.0.128.0/17
66.33.0.0/17
66.36.96.0/20
66.111.64.0/19
66.132.128.0/17
66.148.0.0/18
66.223.0.0/17
66.234.0.0/20

207.21.192.0/18
207.159.128.0/19
207.198.64.0/18

209.15.0.0/16
209.25.128.0/17
209.35.0.0/16
209.95.96.0/19
209.196.128.0/18
209.203.224.0/19
209.213.96.0/19

216.25.0.0/17
216.65.0.0/17
216.87.0.0/19
216.87.208.0/20
216.122.0.0/16
216.150.0.0/19
216.152.128.0/20
216.157.0.0/18
216.157.64.0/19
216.157.96.0/20
216.247.0.0/16
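
A rough sketch of how a list like this could be checked in a script, assuming Python and its standard `ipaddress` module. Only a few of the ranges above are included to keep it short, and `is_blocked`, `BLOCKED_RANGES`, and the sample addresses are hypothetical names and values chosen for illustration.

```python
import ipaddress

# A subset of the ranges listed above; extend with the rest as needed.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("64.65.0.0/18", "66.132.128.0/17", "69.0.128.0/17")
]

def is_blocked(ip, networks=BLOCKED_RANGES):
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

print(is_blocked("66.132.220.1"))  # inside 66.132.128.0/17
print(is_blocked("192.0.2.1"))     # documentation address, not listed
```

In practice the same list would more likely go into the web server or firewall configuration; the script form is mainly handy for testing which logged IPs a proposed list would catch.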

3:06 am on Nov 29, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Any reason to not block all of Peer 1?

Considering I host on Peer1/ServerBeach, I have to tread lightly with that.

3:20 am on Nov 29, 2008 (gmt 0)

5+ Year Member



Ooops! Of course I didn't mean to put you in an awkward situation!

When blocking a server IP range from a server hosting organization I tend to block all similarly named ranges from that organization, on the basis that they are also probably used for servers.

Cheers,
Phred

4:34 am on Nov 29, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I tend to block all similarly named ranges from that organization

Same here in most cases.

No need to leave gaping holes in the fence.

5:07 am on Dec 4, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Follow up...

Got a message from someone at WordTracker saying they don't crawl. They claim it's a lateral search tool that looks for keywords on all of the pages returned from the original search.

Sounds like quibbling over semantics about what constitutes a crawl. Allowing a SE to crawl a site doesn't mean giving authorization for any other automated tool to take the pages returned by a search on that SE and then fetch those pages yet again without permission.

But that's a different argument for a different day.

Anyway, they claim if you write to them they'll remove your site from their searches.

IMO, honoring robots.txt would certainly be a lot simpler for all involved.
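
As a sketch of what honoring robots.txt looks like from the client side, Python's standard `urllib.robotparser` can answer the question before any page is requested. The robots.txt content and the `ExampleKeywordBot` user-agent token below are made up for illustration; nothing here reflects how WordTracker's tool actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleKeywordBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory rules, no network fetch

def may_fetch(user_agent, url):
    """Return True if the parsed rules allow user_agent to request url."""
    return parser.can_fetch(user_agent, url)

print(may_fetch("ExampleKeywordBot/1.0", "http://www.domain.tld/page.html"))
print(may_fetch("SomeOtherBot/2.0", "http://www.domain.tld/page.html"))
```

A well-behaved tool would run this check once per host and skip any URL the site owner has disallowed, which saves everyone the email exchange.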

7:03 pm on Dec 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We had a similar situation on one of our sites a few months ago and wrote to WordTracker. They replied that their customer was doing research using their services and they had no control over it. A few of the requests were made to a URI that contained no "www." and contained "/..." as well. The only place that URI was ever referenced was in an MSN SERP: "host.tld/dir/page.h....". Attempts like that date back to April of 2007. Another IP they have used on several occasions is 64.65.13.36.

REQUEST HEADERS from 66.132.220.238:
Referer: http://www.domain.tld
Connection: close
Host: www.domain.tld
User-Agent: POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)

------------------------
request_method: GET
server_protocol: HTTP/1.0

Notice that there is no trailing forward slash on the referer.
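
The header pattern above could be matched with something like the following Python sketch. It is only a heuristic: the function name and the plain-dict header format are assumptions, and POE-Component-Client-HTTP is a general-purpose Perl HTTP client, so other tools may send the same User-Agent.

```python
def looks_like_reported_client(headers):
    """Heuristic match for the request pattern shown above.

    headers is assumed to be a plain dict of request header names to values.
    """
    user_agent = headers.get("User-Agent", "")
    referer = headers.get("Referer", "")
    host = headers.get("Host", "")

    # The reported User-Agent starts with the POE client name and version.
    ua_match = user_agent.startswith("POE-Component-Client-HTTP/")
    # The referer is the bare site root with no trailing slash.
    bare_referer = bool(host) and referer == "http://" + host
    return ua_match and bare_referer

sample = {
    "Referer": "http://www.domain.tld",
    "Connection": "close",
    "Host": "www.domain.tld",
    "User-Agent": "POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)",
}
print(looks_like_reported_client(sample))
```

Requiring both conditions keeps the check narrow; matching on the User-Agent alone would be broader and would also catch unrelated scripts built on the same library.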