FreshNotes crawler

         

keyplyr

11:20 pm on Mar 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



72.3.225.37 - - [06/Mar/2006:14:20:10 -0800] "GET / HTTP/1.0" 200 10773 "http://www.referrer.com" "FreshNotes crawler, report problems to crawler-at-freshnotes-dot-com"

Did not request robots.txt. Blocked until they do, sent feedback.
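For anyone who wants to run the same check against their own logs, here is a rough sketch, assuming the log is in the Combined Log Format shown above. The function name and the regular expression are my own illustration, not anything taken from the poster's setup.

```python
import re

# Combined Log Format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def agents_skipping_robots(lines):
    """Return user agents that made requests but never asked for /robots.txt."""
    seen = set()
    fetched_robots = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        agent = m.group("agent")
        seen.add(agent)
        # Ignore any query string when comparing the path.
        if m.group("path").split("?", 1)[0] == "/robots.txt":
            fetched_robots.add(agent)
    return seen - fetched_robots
```

Pass it the lines of an access log (for example an open file object). Keep in mind it only matches on the exact User-Agent string, so a bot that fetched robots.txt days earlier, or under another name, will still show up in the result.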

BrianMLima

11:16 pm on Mar 28, 2006 (gmt 0)



Hello,

FreshNotes R&D checked this out very thoroughly as it has been the only complaint we have received while crawling. At FreshNotes we are extremely concerned with any possibility of our crawler behaving in an unethical manner. We received a message from this Webmaster and immediately tested our system to ensure it strictly adhered to the robots.txt exclusion standard. We found that the system does adhere to the robots.txt agreement and have not received any other notices that lead us to believe otherwise.

We take any unethical or non-nice behavior of our systems very seriously. If anyone has questions or concerns about our crawler's behavior, please send any pertinent information to crawler-at-freshnotes-dot-com. Please include any logs to help us track down problems.

Best regards,

Brian M. Lima
Director of Research & Development
FreshNotes LLC: Why Search? Discover.

keyplyr

9:32 am on Mar 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello Brian. I did receive a response from you, if memory serves; thank you. I've been in the middle of moving sites to new servers and may not have replied.

Since I found you are building a directory that's topic-specific to my site, I did unblock your bot. I still never saw it request robots.txt; however, that may have gone unnoticed if the request didn't fall within a reasonable time frame of the crawl.

If you do a bit of reading in these forums you'll find a strong opinion among webmasters in favor of bot owners including a link to a webpage that explains the reason for information gathering, rather than just an email address.

Dijkgraaf

11:07 pm on Mar 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Possibly it is checking robots.txt with a different User Agent and possibly even from a different IP address? I've come across that a few times.
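A quick way to test that theory is to look for robots.txt fetches from the same network block rather than the same User Agent. This is only a sketch: it assumes the log has already been parsed into (ip, path, agent) tuples, and the /24 grouping is an arbitrary choice of mine for IPv4 addresses.

```python
from collections import defaultdict
import ipaddress

def robots_agents_by_network(requests, prefix=24):
    """Map each network to the set of agents that fetched /robots.txt from it."""
    by_net = defaultdict(set)
    for ip, path, agent in requests:
        if path == "/robots.txt":
            # strict=False lets us pass a host address and get its network.
            net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
            by_net[net].add(agent)
    return dict(by_net)

def neighbours_that_fetched_robots(ip, requests, prefix=24):
    """Return agents that fetched /robots.txt from the same network as ip."""
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return robots_agents_by_network(requests, prefix).get(net, set())
```

If this returns an unfamiliar agent for the crawler's address, that agent may be the crawler's robots.txt fetcher, although a shared hosting range could just as easily produce a false match.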

One way to make sure that bots are obeying robots.txt is the good old robot trap. See the Blocking Badly Behaved Bots [webmasterworld.com] thread.
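For readers who haven't seen one, the idea of a robot trap can be sketched roughly like this. It is not the method from the linked thread, just a minimal illustration: the /bot-trap/ path and the function names are made up, and the requests are assumed to be already parsed into (ip, path, agent) tuples. You disallow a directory in robots.txt, link to it somewhere people won't click, and treat anything that requests it as ignoring robots.txt.

```python
from urllib.robotparser import RobotFileParser

TRAP_PREFIX = "/bot-trap/"  # hypothetical path; pick your own

# The trap directory must be disallowed, otherwise polite bots get caught too.
ROBOTS_TXT = "User-agent: *\nDisallow: /bot-trap/\n"

def trap_is_disallowed(robots_txt=ROBOTS_TXT, trap_prefix=TRAP_PREFIX):
    """Check that robots.txt really forbids the trap path for all agents."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", trap_prefix + "index.html")

def trap_offenders(requests, trap_prefix=TRAP_PREFIX):
    """Return the IPs of clients that requested anything under the trap path."""
    return {ip for ip, path, _agent in requests if path.startswith(trap_prefix)}
```

What you do with the offending IPs (block, log, or just watch) is up to you. It is worth checking trap_is_disallowed() first, so that a typo in robots.txt doesn't end up punishing well-behaved bots.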