BrandProtect

File under:
Which part of "Disallow:" did you not understand?

Short version:

158.106.67.181 - - [30/Jul/2014:13:12:57 -0700] "GET /robots.txt HTTP/1.1" 200 885 "http://www.bdbrandprotect.com" "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)" 
158.106.67.181 - - [30/Jul/2014:13:12:57 -0700] "GET /piwik/piwik.php?idsite=3&rec=1 HTTP/1.1" 200 247 "-" "BPImageWalker/2.0 (www.bdbrandprotect.com)"

That's from my personal site, where the piwik files live. The formulation is
<noscript>
<img src="http://www.example.com/piwik/piwik.php?idsite=3&rec=1"
et cetera on all pages, hence the single request.

robots.txt on this site says in part:

User-Agent: *
...
Disallow: /piwik
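
For anyone who wants to double-check that the piwik request really falls under that rule, here is a minimal sketch using Python's standard-library robots.txt parser. It feeds in only the two lines quoted above (the rest of the file is elided and not guessed at), and the helper name `is_allowed` is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Only the two rules quoted in the post; the real file has more lines
# (elided as "..."), which are deliberately not reconstructed here.
ROBOTS_LINES = [
    "User-Agent: *",
    "Disallow: /piwik",
]


def is_allowed(user_agent: str, path: str) -> bool:
    """Hypothetical helper: would a compliant robot be allowed to fetch path?"""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(user_agent, path)


# The image robot's request from the log excerpt.
piwik_allowed = is_allowed(
    "BPImageWalker/2.0 (www.bdbrandprotect.com)",
    "/piwik/piwik.php?idsite=3&rec=1",
)

# The front page is not covered by the quoted rule.
front_allowed = is_allowed("LinkWalker/3.0 (http://www.brandprotect.com)", "/")

print(piwik_allowed, front_allowed)
```

Under the quoted rules the piwik path comes back as not allowed, so a robot that had read the file should never have made that request.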

Long version:
robots.txt plus 620 image requests-- including the entire contents of three roboted-out subdirectories-- from my main site.

The MSIE UA was used only for requesting robots.txt. (I serve the same file to everyone.) All other requests-- i.e. the 620 + 1 image requests-- are

BPImageWalker/2.0 (www.bdbrandprotect.com)

IP range for the full visit was
158.106.67.128-200 (really).
BrandProtect as a whole is 158.106.64.0/18; the robots stuck to the narrower range.
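
A quick way to sort log IPs into "the narrower crawl range", "elsewhere in the /18", or "not BrandProtect at all" is sketched below with Python's `ipaddress` module. The two ranges are the ones given above; the function name `classify` and its labels are my own placeholders.

```python
import ipaddress

# Ranges as stated in the post.
NETBLOCK = ipaddress.ip_network("158.106.64.0/18")
FIRST = ipaddress.ip_address("158.106.67.128")
LAST = ipaddress.ip_address("158.106.67.200")


def classify(ip_text: str) -> str:
    """Hypothetical helper: label an address relative to the observed ranges."""
    ip = ipaddress.ip_address(ip_text)
    if FIRST <= ip <= LAST:
        return "crawler range"
    if ip in NETBLOCK:
        return "netblock only"
    return "outside"


# The observed range does not fall on a single CIDR boundary, so express it
# as the smallest list of CIDR blocks that covers exactly 128-200.
observed_cidrs = [str(net) for net in ipaddress.summarize_address_range(FIRST, LAST)]

print(classify("158.106.67.181"))
print(observed_cidrs)
```

The CIDR list is handy if you ever decide to block only the range the robots actually used rather than the whole /18.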

There is a sister robot called LinkWalker

LinkWalker/3.0 (http://www.brandprotect.com)

that crawls pages. It did its stuff about 2 1/2 hours earlier. Mysteriously this one does seem to honor robots.txt, barring the common initial pattern of

robots.txt 301
/ 301
robots.txt 200
/ 200

meaning that it requested the front page before it had actually seen robots.txt. Apart from that, though, it behaved itself. It did not ask for any css or js.
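
To spot that "asked before it looked" pattern in a longer visit, something along these lines would do. It is only a sketch: it assumes the requests have already been pulled out of the log as (path, status) pairs in time order, that the paths carry a leading slash, and the function name `fetched_before_robots` is invented for the example.

```python
def fetched_before_robots(requests):
    """Return the non-robots.txt requests made before robots.txt was first
    received with a 200, i.e. before the robot could have read the rules."""
    early = []
    for path, status in requests:
        if path == "/robots.txt" and status == 200:
            break  # from here on the robot has seen the rules
        if path != "/robots.txt":
            early.append((path, status))
    return early


# The four-request pattern described above, as (path, status) pairs.
pattern = [
    ("/robots.txt", 301),
    ("/", 301),
    ("/robots.txt", 200),
    ("/", 200),
]

early_requests = fetched_before_robots(pattern)
print(early_requests)
```

For the pattern above, the only early request is the redirected front-page fetch; everything after the 200 on robots.txt counts as made with the rules in hand.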

Since it began its crawl on the front page and I've never met the range before, I don't know what prompted its interest. If I'm only going to see it once in three years, it may not be worth blocking ;)

