ActiveTouristBot

         

Mokita

10:54 pm on Jul 29, 2006 (gmt 0)

10+ Year Member



It fetched robots.txt four times within 10 seconds, yet crawled only four pages out of more than 300. Their website is a disaster, all links from the index page return a custom 404 error.

activetourist.free-online.co.uk - - [15/Jul/2006:11:29:47 +1000] "GET /robots.txt HTTP/1.1" 200 1930 "-" "Mozilla/4.0 (ActiveTouristBot V1.2 ;http://www.activetourist.com)"
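For anyone who wants to check their own logs for the same pattern, here is a rough sketch that counts robots.txt requests against page requests for one bot. It assumes the log is in the combined format shown in the line above; the function name `robots_vs_pages` and the regular expression are illustrative, not taken from any particular log tool.

```python
import re
from collections import Counter

# Assumed layout: combined log format, as in the sample line above.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def robots_vs_pages(lines, agent_substring):
    """Count robots.txt requests and other requests made by one user agent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or agent_substring not in match.group("agent"):
            continue  # unparseable line, or a different visitor
        kind = "robots" if match.group("path") == "/robots.txt" else "pages"
        counts[kind] += 1
    return counts


# The only sample data here is the log line quoted in this post.
sample = [
    'activetourist.free-online.co.uk - - [15/Jul/2006:11:29:47 +1000] '
    '"GET /robots.txt HTTP/1.1" 200 1930 "-" '
    '"Mozilla/4.0 (ActiveTouristBot V1.2 ;http://www.activetourist.com)"'
]
counts = robots_vs_pages(sample, "ActiveTouristBot")
print(counts["robots"], counts["pages"])
```

Feeding it a whole access log (one line per list item) gives a quick ratio; a well-behaved bot should show far more page requests than robots.txt requests.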

incrediBILL

5:10 pm on Aug 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This bot is dumb as a pet rock and after asking for robots.txt a bazillion times, grabs a few pages and then asks for robots.txt a few more times, grabs a couple more pages, more robots.txt and on and on...

Must be a creation of Gump Software.

GaryK

6:08 pm on Aug 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please stop insulting pet rocks. They have far more intelligence than some of these bots. :)

[edited by: GaryK at 6:09 pm (utc) on Aug. 5, 2006]

JustSurfing

7:28 pm on Aug 18, 2006 (gmt 0)

10+ Year Member



Message from GaryK
#######################
Abusive IRLbot

BTW, when contacting suspected owners of bots I've found it helps to write an
emotionally neutral e-mail. Don't antagonize someone that you want help from.
I'm not accusing anyone of anything. It's just a hopefully helpful piece of advice.

########################

I'm the owner of the website and spent good money on getting the spider built. If there is a problem with the spider, it's normally good manners to inform the webmaster about the problem so that they can get the problem fixed. The site does have a contact form.

Thank you
George

incrediBILL

8:04 pm on Aug 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



... problem with the spider it's normally good manners to inform the webmaster about the problem.

George,

Welcome to WebmasterWorld!

As a first-time bot owner, you may not know what we're dealing with as webmasters, as opposed to just your site and your lone spider.

I get hit by over 500 spiders a day, and the total spiders I've encountered is in the thousands so it's just not possible or practical to write to each spider owner. Besides, it's not our job to tell you that your spider doesn't work properly as there are many bad spiders out there so we just block the bad ones and forget about them.

Speaking of manners, it's normally considered very good manners to test your spider before unleashing it on the world. I'll give you credit for attempting to honor robots.txt, as that's honorable. However, that spider of yours asked for robots.txt as many times as it asked for web pages, or more, which is very bad.

What's reported in this thread isn't as bad as what happened on my website.

Sorry to be so blunt, but if you spent good money on that spider I'd ask for a refund as there are completely free open source spiders available that do an excellent job.

[edited by: incrediBILL at 8:23 pm (utc) on Aug. 18, 2006]

JustSurfing

8:55 pm on Aug 18, 2006 (gmt 0)

10+ Year Member



Hi incrediBILL

"Besides, it's not our job to tell you that your spider doesn't work properly" point taken.

Thanks for the reply. If possible, could you pop across to my site or post here the section of the robots.txt file that the bot is not obeying? Then I can forward it to the bot designers.

I have no idea why it's not obeying the robots.txt files, as it was tested before it was launched.

My apologies for any problems the bot caused on your site.

George

incrediBILL

8:57 pm on Aug 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



George,

The problem wasn't that it didn't obey robots.txt. The problem was that it asked for it repeatedly, when it should only ask for it once per crawl.

It was crazy stuff like this...

GET /robots.txt
GET /robots.txt
GET /robots.txt
GET /robots.txt
GET /somepage.html
GET /someotherpage.html
GET /robots.txt
GET /robots.txt
GET /robots.txt
GET /robots.txt
GET /anotherpage.html

When a robot is supposed to do this:

GET /robots.txt
GET /somepage.html
GET /someotherpage.html
GET /anotherpage.html
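One way to get the second pattern is to cache the parsed robots.txt per host for the duration of a crawl. The following is only a minimal sketch of that idea using Python's standard `urllib.robotparser`; it is not the ActiveTouristBot's code. The class name `RobotsCache`, the `fetch_text` callable, and the `ExampleBot` user agent are all hypothetical, and a fake fetcher stands in for real HTTP so the request count is visible.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Fetch each host's robots.txt once per crawl and reuse the parsed rules."""

    def __init__(self, fetch_text, user_agent):
        self._fetch_text = fetch_text  # hypothetical callable: URL -> robots.txt body
        self._user_agent = user_agent
        self._parsers = {}             # (scheme, host) -> RobotFileParser

    def allowed(self, url):
        parts = urlsplit(url)
        key = (parts.scheme, parts.netloc)
        parser = self._parsers.get(key)
        if parser is None:
            # First URL seen on this host: the only robots.txt request we make.
            robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
            parser = RobotFileParser()
            parser.parse(self._fetch_text(robots_url).splitlines())
            self._parsers[key] = parser
        return parser.can_fetch(self._user_agent, url)


# Fake fetcher for illustration; a real crawler would do an HTTP GET here.
requests_made = []


def fake_fetch(url):
    requests_made.append(url)
    return "User-agent: *\nDisallow: /private/\n"


cache = RobotsCache(fake_fetch, "ExampleBot")
pages = [
    "http://example.com/somepage.html",
    "http://example.com/someotherpage.html",
    "http://example.com/private/x.html",
]
results = [cache.allowed(page) for page in pages]
print(results, requests_made)
```

Three URLs are checked but robots.txt is requested only once. A long-running crawler would probably also want to expire cached entries after some interval, which this sketch leaves out.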

JustSurfing

9:04 pm on Aug 18, 2006 (gmt 0)

10+ Year Member



Thanks incrediBILL

Will pass the info on.

Have a good weekend
George

Mokita

12:16 am on Aug 19, 2006 (gmt 0)

10+ Year Member



JustSurfing wrote:
... it's normally good manners to inform the webmaster about the problem so that they can get the problem fixed, the site does have a contact form.

Perhaps you missed the bit in my original message which said:

... website is a disaster, all links from the index page return a custom 404 error.

Your contact form was unavailable and there was no way of knowing whether it was deliberate or not.

I simply do not have the time or energy to spare trying to contact all the badly-behaving-bot owners - many of whom do not care a jot what havoc they are wreaking.

Thank you for being one of the few who have joined Webmasterworld to respond to threads about their bots. Regrettably though, after promising to fix their bot's behaviour, too often nothing is ever done. So forgive us for being a tad cynical.

After your bot's initial performance I blocked it. If you would care to return here and inform us when it has been tweaked satisfactorily I will happily unblock it and give it another chance.

JustSurfing

11:01 am on Aug 19, 2006 (gmt 0)

10+ Year Member



Hi Mokita

Thanks for the reply. As you stated in your 1st post: "... website is a disaster, all links from the index page return a custom 404 error."

I have no answer for the above, as all pages are .aspx, so any errors should have displayed a custom error message on the page. Even the dreaded 404 has a redirect to take you to the custom error page informing users that the page has moved.

As for the bot, it is my intention to ensure that it behaves correctly. The bot started out as V1.0 and is now at V1.3; as problems are encountered they are fixed, and each upgrade is reflected in the version number.

The reason the website is included in the bot name is to allow any webmasters who encounter problems to contact me, so that they can be looked into. I do not include an email address, as this would give spammers a free meal if the bot visits their site.

So if any webmasters encounter problems with the bot, please visit the site and use the contact form, in the subject list there is a "spider" option, select that for your mail.

I hope this post answers any question people have about the bot.

Regards
George

JustSurfing

9:26 am on Aug 25, 2006 (gmt 0)

10+ Year Member



Hi Mokita

Can you send me a URL as the upgrade to the bot has been completed.

Regards
George

Mokita

1:51 am on Aug 27, 2006 (gmt 0)

10+ Year Member



George wrote:
Can you send me a URL as the upgrade to the bot has been completed.

Sorry, I don't give out URLs - especially not to bot-owners.

I have unblocked your bot for now, and am happy to wait and see how well behaved it is next time it finds our site/s by normal channels.

JustSurfing

10:16 am on Aug 28, 2006 (gmt 0)

10+ Year Member



Hi Mokita

Thanks. Can you let me know via email when the bot comes round and how it behaves? The version will be V1.4.

Regards
George

JustSurfing

11:03 am on Sep 5, 2006 (gmt 0)

10+ Year Member



Hi GaryK

Can you let me know if you received my email, as the forum mail is not displaying as having been sent?

Regards
George

GaryK

6:54 pm on Sep 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry I didn't see your comments earlier George.

For the rest of you, George and I have been corresponding by Stickies this afternoon and on its first run V1.4 did a much better job. He just crawled my site again and I'm awaiting the results of that log analysis.

GaryK

12:08 am on Sep 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The crawl went perfectly. We've now got one more well-behaved bot on the internets. Surely that's a good thing? :)

[edited by: GaryK at 12:09 am (utc) on Sep. 7, 2006]