dstiles - 8:25 pm on Apr 30, 2013 (gmt 0)
This has been in force since 6th April this year. I first saw the crawler about seven days ago (ie around the 24th).
IP range: 18.104.22.168 - 22.214.171.124
Bot IPs seen so far fall within the range 126.96.36.199 - 188.8.131.52, but that will no doubt be extended.
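If you want to flag hits from a start-end range like the ones above in your own log scripts, a minimal sketch using Python's standard `ipaddress` module follows. The addresses are placeholders from the RFC 5737 documentation block, not the crawler's real range, so substitute whatever start and end addresses turn up in your logs. Because the range will probably grow, it also converts the range to CIDR blocks, which is the form most firewall and server deny rules expect.

```python
import ipaddress

# Placeholder range (RFC 5737 documentation addresses), NOT the bot's real IPs;
# replace with the start and end addresses seen in your own logs.
RANGE_START = ipaddress.ip_address("192.0.2.0")
RANGE_END = ipaddress.ip_address("192.0.2.63")


def in_range(ip: str, start=RANGE_START, end=RANGE_END) -> bool:
    """Return True if ip lies within the inclusive range start..end."""
    addr = ipaddress.ip_address(ip)
    # IPv4 and IPv6 addresses cannot be compared, so treat a version
    # mismatch as "outside the range".
    if addr.version != start.version:
        return False
    return start <= addr <= end


def range_as_networks(start=RANGE_START, end=RANGE_END) -> list:
    """Express start..end as the smallest list of CIDR blocks."""
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


if __name__ == "__main__":
    print(in_range("192.0.2.17"))   # inside the placeholder range
    print(range_as_networks())      # CIDR form for a deny rule
```

When the crawler turns up on a new address, widening the range just means changing the two constants.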
Today's UA: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.6.0 Safari/534.34
I do not think that is the genuine bot IP; possibly someone looking to see why the bot is blocked. An earlier UA was:
bl.uk_lddc_bot/3.1.1 (+http :// www.bl.uk / aboutus / legaldeposit / websites / websites / faqswebmaster / index.html)
(link broken up by me)
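To separate the two UAs above when scanning logs, a rough classifier could key on the product tokens. This is a sketch under assumptions: it matches `bl.uk_lddc_bot/` at the start of the string and `PhantomJS/` anywhere in it, and lets the version numbers vary. A UA match on its own proves nothing, because any client can send any User-Agent string. That is why it is worth checking the IP as well before deciding a hit is the genuine crawler.

```python
import re

# Crawler token as it appears at the start of the bl.uk UA quoted above;
# the version part is allowed to change in later releases.
BL_BOT_RE = re.compile(r"^bl\.uk_lddc_bot/(\d+(?:\.\d+)*)")
# Headless-browser token from the PhantomJS UA quoted above.
PHANTOM_RE = re.compile(r"PhantomJS/(\d+(?:\.\d+)*)")


def classify_ua(ua: str):
    """Return (label, version) for a User-Agent string, version None if unknown."""
    m = BL_BOT_RE.match(ua)
    if m:
        return ("bl.uk_lddc_bot", m.group(1))
    m = PHANTOM_RE.search(ua)
    if m:
        return ("phantomjs", m.group(1))
    return ("other", None)


if __name__ == "__main__":
    print(classify_ua("bl.uk_lddc_bot/3.1.1 (+http://www.bl.uk/)"))
```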
It's worth reading the legal web page. It claims the RIGHT to harvest ALL UK-based web content. This has annoyed one of my clients who, although hosting in the UK, was specifically told about 15 years ago that he should not trade with UK citizens.
There is an option to block through robots.txt, but if that's obeyed, surely it negates their mandate? They also say we can block by IP. Hmm. But then, this is UK bureaucracy, which hasn't yet caught up with modern technology (ie anything later than 1950).
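For the robots.txt route, a quick way to check that a record does what you intend is Python's standard `urllib.robotparser`. The sketch assumes the crawler matches its robots.txt record on the `bl.uk_lddc_bot` token from its UA. That is an assumption, so check the BL webmaster FAQ for the token it actually honours. It also only tells you what a *compliant* crawler would do. robots.txt is voluntary, so anything that ignores it (or fakes its UA, like the PhantomJS visitor) still has to be blocked by IP or at the server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts out the bl.uk crawler (assuming it matches
# on this token) and leaves every other agent unrestricted.
ROBOTS_TXT = """\
User-agent: bl.uk_lddc_bot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return whether a robots.txt-compliant client with this UA may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("bl.uk_lddc_bot/3.1.1", "http://example.com/page.html"))
```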
Currently blocked, but I've canvassed clients on what they want done, though I suspect we will have to comply. :(