Cloaks its crawling; requests robots.txt, then ignores it; badly coded UA (note the space before the last slash); hits dynamic files:
THIS WEEK:
node-176-9-31-202.cluster.eu.webcrawler.pixray.com
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1 /Nutch-1.2
robots.txt? Yes BUT immediately ignored.
LAST WEEK:
static.202.31.9.176.clients.your-server.de
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1 /Nutch-1.2
robots.txt? Yes BUT immediately ignored.
IP for BOTH of the above = 176.9.31.202 [projecthoneypot.org...]
(Surprise, surprise: 176.9.0.0/16 = HETZNER)
Plenty of auto-block triggers in the preceding info for most of us, but some may be deceived by the robots.txt request. Also note that, per the PHP data, Nutch isn't the only UA:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101
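
For anyone who wants to script the triggers mentioned above, here is a rough Python sketch, not anybody's production rule set. The function name block_reasons, the reason labels, and the one-entry network list are all made up for illustration; the only data taken from this post are the 176.9.0.0/16 range and the two UA strings. It assumes you already have the client IP and UA as strings from your own logs or PHP data.

```python
import ipaddress
import re

# Assumption: the only range listed is the one named in this post.
BLOCKED_NETWORKS = [ipaddress.ip_network("176.9.0.0/16")]

# Whitespace directly before a slash, as in "Firefox/4.0.1 /Nutch-1.2".
MALFORMED_TOKEN = re.compile(r"\s/\S")

# Crawler name showing up inside an otherwise browser-looking UA.
CRAWLER_NAME = re.compile(r"nutch", re.IGNORECASE)


def block_reasons(ip, user_agent):
    """Return a list of hypothetical trigger labels that fire for this request."""
    reasons = []
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        reasons.append("hosting range")
    if CRAWLER_NAME.search(user_agent):
        reasons.append("crawler name in UA")
    if MALFORMED_TOKEN.search(user_agent):
        reasons.append("malformed UA token")
    return reasons
```

Note that the second UA (no Nutch token, no stray space) trips neither of the UA checks, so in this sketch it would be caught by the IP range alone. That is a good argument for not relying on UA matching by itself.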