activetourist.free-online.co.uk - - [15/Jul/2006:11:29:47 +1000] "GET /robots.txt HTTP/1.1" 200 1930 "-" "Mozilla/4.0 (ActiveTouristBot V1.2 ;http://www.activetourist.com)"
BTW, when contacting suspected owners of bots I've found it helps to write an
emotionally neutral e-mail. Don't antagonize someone that you want help from.
I'm not accusing anyone of anything. It's just a hopefully helpful piece of advice.
########################
I'm the owner of the website and spent good money on getting the spider built. If there is a problem with the spider, it's normally good manners to inform the webmaster about the problem so that they can get it fixed. The site does have a contact form.
Thank you
George
... problem with the spider, it's normally good manners to inform the webmaster about the problem.
George,
Welcome to WebmasterWorld!
As a first-time bot owner you may not know what we're dealing with as webmasters, as opposed to just your site and your lone spider.
I get hit by over 500 spiders a day, and the total number of spiders I've encountered is in the thousands, so it's just not possible or practical to write to each spider owner. Besides, it's not our job to tell you that your spider doesn't work properly. There are many bad spiders out there, so we just block the bad ones and forget about them.
Speaking of manners, it's normally considered very good manners to test your spider before unleashing it on the world. I'll give you credit for attempting to honor robots.txt, as that's honorable. However, that spider of yours asked for robots.txt as many times as it asked for web pages, or more, which is very bad.
What's reported in this thread isn't as bad as what happened on my website.
Sorry to be so blunt, but if you spent good money on that spider I'd ask for a refund as there are completely free open source spiders available that do an excellent job.
[edited by: incrediBILL at 8:23 pm (utc) on Aug. 18, 2006]
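For anyone who wants to check their own logs for the pattern described above (a bot asking for robots.txt as often as, or more often than, it asks for pages), here is a rough Python sketch. It assumes the access log is in the combined log format, like the log line at the top of this thread. The function names and the `min_robots` threshold are made up for illustration and are not taken from any particular tool.

```python
import re
from collections import defaultdict

# Combined log format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_ratio(lines):
    """Return {user_agent: (robots_txt_requests, other_requests)}."""
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        index = 0 if match.group("path") == "/robots.txt" else 1
        counts[match.group("agent")][index] += 1
    return {agent: tuple(pair) for agent, pair in counts.items()}


def suspicious_agents(lines, min_robots=2):
    """List agents that fetched robots.txt at least min_robots times and
    at least as often as they fetched everything else.

    min_robots is an arbitrary illustrative threshold, so that a bot that
    only ever asked for robots.txt once is not flagged.
    """
    return sorted(
        agent
        for agent, (robots, pages) in robots_ratio(lines).items()
        if robots >= min_robots and robots >= pages
    )
```

Feeding it the lines of an access log, for example `suspicious_agents(open("access.log"))` with whatever path your server uses, returns the user-agent strings worth a closer look. It is only a first filter: it says nothing about whether the bot actually obeys the rules it downloaded.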
"Besides, it's not our job to tell you that your spider doesn't work properly" point taken.
Thanks for the reply. If possible, could you pop across to my site, or post here, the section of the robots.txt file that the bot is not obeying? Then I can forward it to the bot designers.
I have no idea why it's not obeying robots.txt files, as it was tested before it was launched.
My apologies for any problems the bot caused on your site.
George
The problem wasn't that it didn't obey robots.txt. The problem was that it asked for robots.txt repeatedly, when it should only ask for it once per crawl.
It was crazy stuff like this...
GET /robots.txt
GET /robots.txt
GET /robots.txt
GET /robots.txt
GET /somepage.html
GET /someotherpage.html
GET /robots.txt
GET /robots.txt
GET /robots.txt
GET /robots.txt
GET /anotherpage.html
When a robot is supposed to do this:
GET /robots.txt
GET /somepage.html
GET /someotherpage.html
GET /anotherpage.html
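The "once per crawl" behaviour shown above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not how ActiveTouristBot or any other real spider is built: the `PoliteCrawler` class and its `fetch` callable are hypothetical names, and the real network code is left out. The only library calls used are `RobotFileParser.parse()` and `RobotFileParser.can_fetch()` from the standard `urllib.robotparser` module.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class PoliteCrawler:
    """Fetches robots.txt once per site per crawl, then reuses the rules."""

    def __init__(self, user_agent, fetch):
        self.user_agent = user_agent
        # Hypothetical callable: takes a URL, returns the response body as text.
        self.fetch = fetch
        # Parsed robots.txt rules, keyed by scheme://host, kept for the whole crawl.
        self._robots = {}

    def _rules_for(self, url):
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        if origin not in self._robots:
            # First request to this site: ask for robots.txt exactly once.
            parser = RobotFileParser()
            parser.parse(self.fetch(origin + "/robots.txt").splitlines())
            self._robots[origin] = parser
        return self._robots[origin]

    def get(self, url):
        """Return the page body, or None if robots.txt disallows the URL."""
        if not self._rules_for(url).can_fetch(self.user_agent, url):
            return None
        return self.fetch(url)
```

The point is simply that the parsed rules live in a dictionary for the length of the crawl, so every page request after the first one on a site is checked against the cached copy instead of triggering another GET /robots.txt. A real crawler would also need error handling for a missing or unreachable robots.txt and some delay between requests, which this sketch leaves out.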
... it's normally good manners to inform the webmaster about the problem so that they can get the problem fixed, the site does have a contact form.
Perhaps you missed the bit in my original message which said:
... website is a disaster, all links from the index page return a custom 404 error.
Your contact form was unavailable and there was no way of knowing whether it was deliberate or not.
I simply do not have the time or energy to spare trying to contact all the badly-behaving-bot owners - many of whom do not care a jot what havoc they are wreaking.
Thank you for being one of the few who have joined WebmasterWorld to respond to threads about their bots. Regrettably though, bot owners too often promise to fix their bot's behaviour and then nothing is ever done. So forgive us for being a tad cynical.
After your bot's initial performance I blocked it. If you would care to return here and inform us when it has been tweaked satisfactorily I will happily unblock it and give it another chance.
Thanks for the reply. As you stated in your 1st post: "... website is a disaster, all links from the index page return a custom 404 error."
I have no answer for the above. All pages are .aspx, so any errors should have displayed a custom error message on the page. Even the dreaded 404 has a redirect to take you to the custom error page informing users that the page has moved.
As for the bot, it is my intention to ensure that it behaves correctly. The bot started out as V1.0 and is now at V1.3; as problems are encountered they are fixed, and each upgrade is reflected in the version number.
The reason the website is included in the bot name is to allow any webmasters who encounter problems to contact me, so that it can be looked into. I do not include an email address, as that would give spammers a free meal if the bot visits their site.
So if any webmasters encounter problems with the bot, please visit the site and use the contact form. In the subject list there is a "spider" option; select that for your mail.
I hope this post answers any question people have about the bot.
Regards
George