Msg#: 3796215 posted 8:04 pm on Nov 28, 2008 (gmt 0)
For 14 days now WordTracker has been attempting a slow-motion crawl against one of my sites, and all they've been getting is the same error page, served as a "200 OK", telling them they've been blocked for behaving badly.
At a minimum, someone is going to get a report that shows a bunch of the same high density keywords ;)
Msg#: 3796215 posted 5:07 am on Dec 4, 2008 (gmt 0)
Got a message from someone at WordTracker saying they don't crawl. They claim it's a lateral search tool that looks for keywords on all of the pages returned from the original search.
Sounds like quibbling over semantics about what constitutes a crawl. Allowing a SE to crawl a site doesn't mean authorizing some other automated tool to take the pages returned by a search of that crawl and then crawl those pages yet again without permission.
But that's a different argument for a different day.
Anyway, they claim if you write to them they'll remove your site from their searches.
IMO, honoring robots.txt would certainly be a lot simpler for all involved.
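For anyone wondering what "honoring robots.txt" amounts to on the client side, here is a minimal sketch using Python's standard urllib.robotparser. The bot name ExampleKeywordBot and the robots.txt rules are made up for illustration; nothing here is WordTracker's actual user agent or code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: one named bot is
# shut out entirely, everyone else is kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleKeywordBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text already in hand."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = make_parser(ROBOTS_TXT)
```

A tool that ran a check like may_fetch() before every request would simply skip a site that disallows it, with no email to the operator needed.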
Msg#: 3796215 posted 7:03 pm on Dec 4, 2008 (gmt 0)
We had a similar situation on one of our sites a few months ago and wrote to WordTracker. They replied that their customer was doing research using their services and that they had no control over it. A few of the requests were made to a URI that contained no "www." and contained "/..." as well. The only place that URI was ever referenced was in an MSN SERP: "host.tld/dir/page.h....". Attempts like that date back to April of 2007. Another IP they have used on several occasions is 18.104.22.168.
REQUEST HEADERS from 22.214.171.124:
Referer: http://www.domain.tld
Connection: close
Host: www.domain.tld
User-Agent: POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)
------------------------
request_method: GET
server_protocol: HTTP/1.0
Notice that there is no trailing forward slash on the referer.
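If you want to flag requests with this fingerprint in your own logs, a rough sketch follows. It assumes you already have the request headers as a dict and the protocol string as logged; the function names are my own, and requiring all three traits together (POE user agent, referer with no path, HTTP/1.0) is an assumption rather than a tested rule, so expect to tune it.

```python
from urllib.parse import urlsplit


def referer_lacks_path(referer: str) -> bool:
    """True for a referer like http://www.domain.tld (host, no trailing slash)."""
    parts = urlsplit(referer)
    return bool(parts.netloc) and parts.path == ""


def looks_like_reported_client(headers: dict, server_protocol: str) -> bool:
    """Match the combination of traits seen in the headers quoted above.

    Assumption: all three traits must be present. Loosen as needed.
    """
    user_agent = headers.get("User-Agent", "")
    referer = headers.get("Referer", "")
    return (
        user_agent.startswith("POE-Component-Client-HTTP/")
        and referer_lacks_path(referer)
        and server_protocol == "HTTP/1.0"
    )


# The request from the post, as a dict.
sample_headers = {
    "Referer": "http://www.domain.tld",
    "Connection": "close",
    "Host": "www.domain.tld",
    "User-Agent": "POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)",
}
```

Ordinary browsers normally send a referer with at least a "/" path, which is why the bare-host referer is a handy tell, but none of these headers is hard to fake.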