| 4:48 pm on Jul 30, 2012 (gmt 0)|
This is a cached page of http://www.example.com/index.html from Blekko's web crawl.
Error: No content
end of quote
Course I deny just about everything.
| 6:27 pm on Jul 30, 2012 (gmt 0)|
The question was... what range?
| 6:41 pm on Jul 30, 2012 (gmt 0)|
220.127.116.11 - - [01/May/2012:08:26:26 +0100] "GET /robots.txt HTTP/1.1" 200 2642 "-" "Mozilla/5.0 (compatible; Blekkobot; ScoutJet; +http://blekko.com/about/blekkobot)"
18.104.22.168 - - [01/May/2012:08:30:44 +0100] "GET /robots.txt HTTP/1.1" 200 2642 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
| 8:12 pm on Jul 30, 2012 (gmt 0)|
I have ranges:
22.214.171.124 - 126.96.36.199
188.8.131.52 - 184.108.40.206
220.127.116.11 - 18.104.22.168
If you go to the link in the bot's UA, blekko gives you their crawling ranges. One of the better engines in that respect.
199.187.122.nn does not appear to be associated with blekko? Could be a coincidence, or someone scraping blekko for links?
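For anyone wanting to test a logged IP against a list of published crawl ranges, here is a minimal Python sketch. The ranges in it are hypothetical placeholders taken from the RFC 5737 documentation blocks, not blekko's actual ranges, and the name in_ranges is just for illustration; swap in whatever the bot's info page lists.

```python
import ipaddress

# Hypothetical placeholder ranges (RFC 5737 documentation addresses).
# These are NOT blekko's real crawl ranges; replace with the published ones.
CRAWLER_RANGES = [
    ("192.0.2.0", "192.0.2.255"),
    ("198.51.100.16", "198.51.100.31"),
]


def in_ranges(ip, ranges=CRAWLER_RANGES):
    """Return True if ip falls inside any inclusive (start, end) range."""
    addr = ipaddress.ip_address(ip)
    return any(
        ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)
        for start, end in ranges
    )


if __name__ == "__main__":
    for candidate in ("192.0.2.77", "203.0.113.5"):
        print(candidate, in_ranges(candidate))
```

An IP that claims nothing in its UA and falls outside every listed range is, on this check alone, simply "not shown to be the crawler"; it says nothing about who the visitor actually is.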
| 8:22 pm on Jul 30, 2012 (gmt 0)|
22.214.171.124 - - [01/May/2012:08:26:26 +0100]
126.96.36.199 - - [01/May/2012:08:30:44 +0100]
Note the times?
This is approximately 3AM EST, and a very slow time for my websites, thus the log entries were consecutive.
Most North American widget folks are sleeping at that time.
The mid and western Europeans that I allow access to are just beginning their days (these widget folks tend to be more active in the late afternoon and evenings), with an approximate 5-6 hour difference.
Too much of a coincidence for me to dis-associate the two, however everybody knows I'm paranoid ;)
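If you want to put a number on how close two hits are, a rough Python sketch along these lines pulls the bracketed timestamp out of each combined-format log line and reports the gap. The function names are made up for illustration, the IPs are left as placeholders because only the timestamp is read, and it assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp in a combined-format access log line.
STAMP = re.compile(r"\[([^\]]+)\]")


def hit_time(line):
    """Parse the [dd/Mon/yyyy:HH:MM:SS +zzzz] field into an aware datetime."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no timestamp found in log line")
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def gap_seconds(first, second):
    """Absolute time difference between two log lines, in seconds."""
    return abs((hit_time(second) - hit_time(first)).total_seconds())


if __name__ == "__main__":
    # Timestamps copied from the two entries above; IPs are placeholders.
    a = 'x.x.x.x - - [01/May/2012:08:26:26 +0100] "GET /robots.txt HTTP/1.1" 200 2642'
    b = 'y.y.y.y - - [01/May/2012:08:30:44 +0100] "GET /robots.txt HTTP/1.1" 200 2642'
    print(gap_seconds(a, b))
```

A small gap only tells you the two requests were close in time; whether that means the two visitors are related is still a judgment call.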
| 8:29 pm on Jul 30, 2012 (gmt 0)|
Thanks dstiles. Didn't have the 188.8.131.52 - 184.108.40.206
I had the Silicon Valley Colo as a larger block:
220.127.116.11 - 18.104.22.168
And wilderness, I'll keep an eye on 199.187.122.* Thanks.
| 8:09 pm on Jul 31, 2012 (gmt 0)|
wilderness - my point was that the 199.187.122.nn hit was probably not blekko itself but either someone using blekko as a search source (as is common with google and other SEs) or someone running an automated scrape, which could as easily come at that time as at any other.
Given your assertion re: access activity times, I would opt for the former: using blekko as a scraper search source.
Either way, I don't see blekko itself being the culprit, although I could be wrong. I'd be interested to see any other evidence of association between blekko and databasebydesign.
| 8:16 pm on Jul 31, 2012 (gmt 0)|
|I'd be interested to see any other evidence of association between blekko and databasebydesign. |
Unfortunately, in most instances, once I have denied a range I no longer continue accumulating references.
| 4:21 am on Aug 4, 2012 (gmt 0)|
Since I denied Blekkobot/ScoutJet, it now shows up every single day, requesting robots.txt 20 to 30 times.
The good news is, while my pages are still listed there, all the cached copies are now gone.
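To check a "20 to 30 times a day" figure against your own logs, something like the following Python sketch would do. It assumes the combined log format shown earlier in the thread; the regex, the function name robots_hits_per_day and the ua_token parameter are my own illustrative choices, not anything blekko or the server provides.

```python
import re
from collections import Counter

# Combined log format: [date:time zone] "GET path proto" status bytes "referer" "UA"
LINE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "GET (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def robots_hits_per_day(lines, ua_token="Blekkobot"):
    """Count GET /robots.txt requests per day for UAs containing ua_token."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match and match.group(2) == "/robots.txt" and ua_token in match.group(3):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python robots_hits.py < access.log
    for day, hits in robots_hits_per_day(sys.stdin).items():
        print(day, hits)
```

It matches on the status code loosely, so denied (403) requests are counted the same as served ones.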