JemmaTheTourist

ActiveTourist checks robots.txt

         

pendanticist

12:11 am on Mar 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



81.154.39.137 - - [03/Mar/2005:15:06:54 -0800] "GET /robots.txt HTTP/1.1" 200 2004 "-" "Mozilla/4.0 (JemmaTheTourist;http://www.activtourist.com)"
81.154.39.137 - - [03/Mar/2005:15:06:56 -0800] "GET /Blah NOT_related_to_tourism.html HTTP/1.1" 200 11269 "-" "Mozilla/4.0 (JemmaTheTourist;http://www.activtourist.com)"

Jemma is a web crawler that automatically crawls the web looking to add tourist information to our search index. We do this by looking for links within web sites that we can follow and index. Not every page we crawl is indexed, so below we have supplied some recommendations to ensure your pages are indexed.

pendanticist

2:41 am on Mar 9, 2005 (gmt 0)




Well, you just earned yourself a big fat 403 for NOT obeying robots.txt.

Bye-bye. Don't go away mad, just go away.

dsinay

8:34 am on Mar 13, 2005 (gmt 0)



Hi
I am the Program Manager of the JemmaTheTourist crawler system, and we have verified that it obeys robots.txt files.
Could you please provide more information about why you think it is not obeying robots.txt?

If you can send me the URL of a robots.txt file that the JemmaTheTourist crawler is not obeying, that would be great.

Thank you in advance.
Damian
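
One way to settle a dispute like this is to replay the logged requests against the site's robots.txt rules. The following is only a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the sample log lines, the IP address, and the `disallowed_requests` helper are all hypothetical, since the actual robots.txt was never posted in this thread. Note that the bare token `JemmaTheTourist` is passed to the parser rather than the full `Mozilla/4.0 (...)` string, because the parser matches on the name before the first `/`.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file from the thread was never posted.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Matches the request portion of a combined-format access log line.
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (.+?) HTTP/[\d.]+"')


def disallowed_requests(robots_txt, agent_token, log_lines):
    """Return the requested paths that robots_txt disallows for agent_token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a recognizable request line
        path = match.group(1)
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is expected behavior
        if not parser.can_fetch(agent_token, path):
            violations.append(path)
    return violations


# Hypothetical log lines modeled on the format quoted earlier in the thread.
UA = '"Mozilla/4.0 (JemmaTheTourist;http://www.activtourist.com)"'
SAMPLE_LOG = [
    '192.0.2.1 - - [03/Mar/2005:15:06:54 -0800] "GET /robots.txt HTTP/1.1" 200 2004 "-" ' + UA,
    '192.0.2.1 - - [03/Mar/2005:15:06:56 -0800] "GET /private/page.html HTTP/1.1" 200 11269 "-" ' + UA,
    '192.0.2.1 - - [03/Mar/2005:15:06:58 -0800] "GET /public/page.html HTTP/1.1" 200 5120 "-" ' + UA,
]

if __name__ == "__main__":
    print(disallowed_requests(ROBOTS_TXT, "JemmaTheTourist", SAMPLE_LOG))
```

An empty result would suggest the crawler stayed within the rules for the sampled requests; any listed path is a concrete URL that could be sent along with the robots.txt in reply to a request like the one above.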