
Forum Moderators: Ocean10000 & incrediBILL


British Library bot

UK law now requires British Library to harvest UK web sites.

     

dstiles

8:25 pm on Apr 30, 2013 (gmt 0)




This has applied since 6th April this year. I first saw the crawler about seven days ago (i.e. around the 24th).

IP range: 194.66.224.0 - 194.66.239.255
Bot IPs seen so far are in the range: 194.66.232.84 - 194.66.232.93 but that will no doubt be extended.
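For anyone wanting to test log entries against that range, here is a minimal sketch using Python's standard `ipaddress` module. The range is the one quoted above; the helper name `in_bl_range` is just an illustrative choice, and the range itself may well change over time.

```python
import ipaddress

# Range quoted in the post: 194.66.224.0 - 194.66.239.255
FIRST = ipaddress.ip_address("194.66.224.0")
LAST = ipaddress.ip_address("194.66.239.255")

# Collapse the start/end pair into CIDR blocks (here it is a single /20).
BL_NETWORKS = list(ipaddress.summarize_address_range(FIRST, LAST))


def in_bl_range(addr: str) -> bool:
    """Return True if addr falls inside the quoted British Library range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BL_NETWORKS)


if __name__ == "__main__":
    print([str(net) for net in BL_NETWORKS])
    print(in_bl_range("194.66.232.84"))  # one of the bot IPs seen so far
```

The same CIDR form can be pasted into a firewall or server deny/allow rule, whichever way the decision goes.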

Today's UA: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.6.0 Safari/534.34

I do not think that is the genuine bot IP; possibly someone looking to see why the bot is blocked. An earlier UA was:

bl.uk_lddc_bot/3.1.1 (+http :// www.bl.uk / aboutus / legaldeposit / websites / websites / faqswebmaster / index.html)

(link broken up by me)

It's worth reading the legal web page. It claims the RIGHT to harvest ALL UK-based web content, which has annoyed one of my clients who, although hosting in the UK, was specifically told about 15 years ago that he should not trade with UK citizens.

There is an option to block through robots.txt, but if that's obeyed then surely it negates their mandate? They also say we can block by IP. Hmm. But then, this is UK bureaucracy, which hasn't yet caught up with modern technology - i.e. anything later than 1950.
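As a rough way to check what a robots.txt block would look like to a compliant crawler, here is a sketch using Python's standard `urllib.robotparser`. The `User-agent` token `bl.uk_lddc_bot` is an assumption taken from the UA string quoted above; the token the crawler actually honours should be confirmed on the BL's webmaster FAQ. `example.com` and the helper name `is_allowed` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the token is assumed from the UA seen in the logs.
ROBOTS_TXT = """\
User-agent: bl.uk_lddc_bot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    print(is_allowed("bl.uk_lddc_bot/3.1.1", page))
    print(is_allowed("SomeOtherBot/1.0", page))
```

This only shows how a well-behaved parser reads the rule; whether the crawler itself obeys it is a separate question.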

Currently blocked, but I have canvassed clients as to what they want done, though I suspect we will have to comply. :(

jmccormac

7:21 am on May 1, 2013 (gmt 0)




This is going to be fun. Does the BL have the resources to spider large (>100M pages) websites?

Regards...jmcc

dstiles

1:56 pm on May 1, 2013 (gmt 0)




It has been pointed out to me...

I do not think that is the genuine bot IP

should read

I do not think that is the genuine bot UA

Thanks, Lucy. :)
 
