British Library bot

UK law now requires British Library to harvest UK web sites.

8:25 pm on Apr 30, 2013 (gmt 0)

dstiles (Senior Member from GB)


This is as of 6th April this year. I first saw the crawler about seven days ago (i.e. around the 24th).

IP range: 194.66.224.0 - 194.66.239.255
Bot IPs seen so far are in the range: 194.66.232.84 - 194.66.232.93 but that will no doubt be extended.
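
For anyone who wants to test log entries against that range, here is a minimal sketch using Python's standard ipaddress module. The only facts taken from the post are the start and end addresses; the names BL_FIRST, BL_LAST, BL_NETWORKS and in_bl_range are made up for illustration, and the range itself may well change.

```python
import ipaddress

# Start and end of the range quoted in the post; the names are illustrative.
BL_FIRST = ipaddress.IPv4Address("194.66.224.0")
BL_LAST = ipaddress.IPv4Address("194.66.239.255")

# Collapse the start-end range into the smallest set of CIDR blocks.
BL_NETWORKS = list(ipaddress.summarize_address_range(BL_FIRST, BL_LAST))


def in_bl_range(ip: str) -> bool:
    """Return True if ip falls inside the range quoted in the post."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BL_NETWORKS)


if __name__ == "__main__":
    print([str(net) for net in BL_NETWORKS])
    for ip in ("194.66.232.84", "194.66.232.93", "194.66.240.1"):
        print(ip, in_bl_range(ip))
```

The range collapses to a single /20, which is handy if your firewall or server config wants CIDR notation rather than a start and end address.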

Today's UA: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.6.0 Safari/534.34

I do not think that is the genuine bot IP; possibly someone looking to see why the bot is blocked. An earlier UA was:

bl.uk_lddc_bot/3.1.1 (+http :// www.bl.uk / aboutus / legaldeposit / websites / websites / faqswebmaster / index.html)

(link broken up by me)

It's worth reading the legal web page. It claims the RIGHT to harvest ALL UK-based web content. Which has annoyed one of my clients who, although hosting in the UK, was specifically told, about 15 years ago, he should not trade with UK citizens.

There is an option to block through robots.txt but if that's obeyed then surely it negates their mandate? They also say we can block by IP. Hmm. But then, this is UK bureaucracy, which hasn't yet caught up with modern technology - ie later than 1950.
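
A rough sketch of what the robots.txt route would look like, checked with Python's standard urllib.robotparser. The user-agent token in the rule is an assumption taken from the earlier UA string; whether the crawler actually matches on that token is something only the legal deposit page can confirm. ROBOTS_TXT, BL_UA, PHANTOM_UA and is_allowed are illustrative names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the token is assumed from the UA string seen earlier.
ROBOTS_TXT = """\
User-agent: bl.uk_lddc_bot
Disallow: /
"""

# Product token and version from the earlier UA (URL part left out here).
BL_UA = "bl.uk_lddc_bot/3.1.1"
# The PhantomJS UA seen from the same IP range.
PHANTOM_UA = (
    "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 "
    "(KHTML, like Gecko) PhantomJS/1.6.0 Safari/534.34"
)


def is_allowed(user_agent: str, path: str = "/") -> bool:
    """Return True if the sample robots.txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("BL bot allowed:", is_allowed(BL_UA))
    print("PhantomJS allowed:", is_allowed(PHANTOM_UA))
```

Note that the rule does nothing about the PhantomJS UA, and robots.txt is only advisory in any case, so an IP block is the only thing that covers both.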

Currently blocked but clients canvassed as to what they want done; though I suspect we will have to comply. :(

7:21 am on May 1, 2013 (gmt 0)

Senior Member



This is going to be fun. Does the BL have the resources to spider large (>100M pages) websites?

Regards...jmcc

1:56 pm on May 1, 2013 (gmt 0)

dstiles (Senior Member from GB)


It has been pointed out to me...

I do not think that is the genuine bot IP

should read

I do not think that is the genuine bot UA

Thanks, Lucy. :)