BacklinkCrawler

         

lucy24

8:16 pm on Apr 22, 2018 (gmt 0)


IP: 5.9.65.19 (given in full because they have used this exact IP for a very long time)
UA: BacklinkCrawler (http://www.backlinktest.com/crawler.html)
robots.txt: yes, and may be compliant
headers: fully humanoid

I met this today (which is to say in yesterday's logs) and it drove me bonkers because I could swear I'd seen the name before, but I couldn't find it anywhere--not in my robots.txt, or the hole-poking section of access controls, or my Header Access checklist.

Turns out its most recent visit was in March 2015, on my old site (now reduced to my personal site), shortly before I changed over to header-based access controls. In the past--going back to 2011, the date of my oldest saved logs--it used two different exact IPs in 46.4 and one exact IP in 144.76. I kinda think they're all Hetzner, which I was blocking by IP range, so almost all those earlier visits (robots.txt, sitemap, front page) were blocked. Now, thanks to humanoid headers, it slipped right in.
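
For anyone who wants to check an address against the ranges mentioned here, a minimal sketch follows. The /16 networks are an assumption built from the leading octets in this post (5.9, 46.4, 144.76); whether each whole /16 belongs to a single host is not confirmed here, and the function name is hypothetical.

```python
import ipaddress

# Assumption: each leading-octet pair from the post is treated as a full /16.
# Verify the real allocations before blocking anything on this basis.
SUSPECT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("5.9.0.0/16", "46.4.0.0/16", "144.76.0.0/16")
]


def in_suspect_range(ip):
    """Return True if the address falls inside any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SUSPECT_RANGES)


print(in_suspect_range("5.9.65.19"))  # True
```

A range check like this only reflects the IP side of access control; it says nothing about the request headers.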

It doesn't seem to like URLs ending in pagename.html: all it requested were directories at various depths. In particular, it did not ask for anything in the /boilerplate/ directory, which is roboted-out even though its pages are linked from everywhere. That's why I say, with some hesitation, “may be compliant”.
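
The compliance check described above can be sketched with Python's standard robots.txt parser. The robots.txt text and the request paths below are hypothetical stand-ins; only the /boilerplate/ directory and the UA name come from this post. An empty result is merely consistent with compliance, since the crawler may simply never have tried those URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; only the /boilerplate/ directory is taken from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /boilerplate/
"""


def disallowed_requests(robots_txt, user_agent, requested_paths):
    """Return the requested paths that robots.txt forbids for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in requested_paths if not parser.can_fetch(user_agent, p)]


# Hypothetical paths pulled from one day's log entries for the crawler.
paths = ["/", "/ebooks/", "/ebooks/fun/"]
violations = disallowed_requests(ROBOTS_TXT, "BacklinkCrawler", paths)
print(violations)  # [] means no forbidden path was requested
```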

Oh, and the URL in the UA leads to a “Seite Nicht Gefunden” (page not found) page in two languages. I first wondered if someone else had stolen the UA, but the IP matched earlier visits. I'll see if anyone reads the site's Kontakt form.

keyplyr

9:28 pm on Apr 22, 2018 (gmt 0)


I seldom see it; the last time was over 6 months ago. IMO it's just another data scraper that offers no info that would earn it access.