lucy24 - 1:34 am on Sep 26, 2011 (gmt 0)
67.228.nnn.nn - - [24/Sep/2011:17:52:25 -0700] "GET /rats/images/Yummy.jpg HTTP/1.1" 200 22814 "-" "Mozilla/5.0+(compatible;+PiplBot;++http://www.pipl.com/bot/)"
They've got a good line:
"PiplBot is Pipl's web-indexing robot. PiplBot crawler collects documents from the Web to build a searchable index for our People Search engine.
"Unlike a typical search-engine robots, PiplBot is designed to retrieve information from the deep web [pipl.com]; our robots are set to interact with searchable databases and not only follow links from other websites.
"As part of the crawling, PiplBot takes robots.txt standards into account to ensure we do not crawl and index content from those pages whose content you do not want included in Pipl Search."
I found this paragraph a little obscure, since their bot did not even go through the motions of consulting robots.txt before heading straight for a roboted-out directory.
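For anyone who wants to run the same check on their own logs, here is a rough sketch in Python. It assumes the usual Apache combined log format (like the line above) with entries in chronological order; the function name and the regex are my own, not anything Pipl or Apache supplies. It only reports whether the first request from a matching user-agent was for /robots.txt. A bot could have cached the file from an earlier visit, so a False is a hint, not proof.

```python
import re

# Assumed layout (Apache combined log format):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def fetched_robots_first(log_lines, agent_substring):
    """Hypothetical helper: was the first hit from this agent a robots.txt fetch?

    Returns True or False for the first matching request, or None if no
    line from that agent was found.
    """
    wanted = agent_substring.lower()
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m or wanted not in m.group("agent").lower():
            continue
        # Ignore any query string when comparing the path.
        return m.group("path").split("?")[0] == "/robots.txt"
    return None

sample = [
    '67.228.nnn.nn - - [24/Sep/2011:17:52:25 -0700] '
    '"GET /rats/images/Yummy.jpg HTTP/1.1" 200 22814 "-" '
    '"Mozilla/5.0+(compatible;+PiplBot;++http://www.pipl.com/bot/)"',
]
print(fetched_robots_first(sample, "PiplBot"))  # prints False
```

In practice you would pass an open log file instead of the sample list, and probably look back over several days of logs before drawing conclusions.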
"the term 'deep web' refers to a vast repository of underlying content, such as documents in online databases that general-purpose web crawlers cannot reach. The deep web content is estimated at 500 times that of the surface web, yet has remained mostly untapped due to the limitations of traditional search engines."
That would be, like, the tedious formality of reading and obeying robot-exclusion rules?
It's an awful shame you're not allowed to post personal links. Something tells me I'm going to lie awake nights wondering who out there on the Internet stopped short at the picture of Miranda, Malcolm and Nelly and cried "That looks just like cousin Maisie!" before running off to People Search with this promising lead.
Be kind to your four-footed friends
Any rat may be somebody's long-lost relative
Hm. Doesn't quite scan, does it?