Inktomi Ignoring Robots.txt file

Is anyone else getting wrong pages spidered?

         

mpthink

12:46 pm on Oct 8, 2002 (gmt 0)

10+ Year Member



We are having an annoying problem with Inktomi. Its spider is ignoring our robots.txt file and crawling pages that it should not. As a result, pages that should not be in the index at all are being penalised.

We have checked our robots.txt file against the format recommended on the Inktomi site and everything seems to be in order. However, at present every one of our domains has pages listed on all Inktomi-powered search engines that shouldn't be there.

If the mistake is with Inktomi, as seems logical, then this problem must be affecting other people as well. Is anyone else having issues with this, or can anyone offer any explanations or solutions?

onlineleben

1:22 pm on Oct 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Had something similar last weekend. More than 200 requests for pages that I never had.

mayor

3:29 pm on Oct 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Shouldn't this do the job?

User-agent: Slurp
Disallow: /

Doesn't look like rocket science to me. But then maybe the 'bot has indigestion from visiting an old sushi site and that's making him a little myopic.
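For anyone who wants to confirm how a standards-following parser reads those two lines, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows what a compliant robot ought to do with the rule; it says nothing about what Slurp actually does. The `example.com` URLs and the `is_allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant robot with this agent name may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs; substitute pages from your own site.
    for url in ("http://www.example.com/", "http://www.example.com/page.html"):
        print(url, "Slurp allowed:", is_allowed(ROBOTS_TXT, "Slurp", url))
```

Pass the short robot name ("Slurp") rather than the full User-Agent header, since this parser matches on the name token.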

More seriously, do you suppose the bot may be holding on to a cached older copy of robots.txt that did allow it to run all over the site? I don't know if that's possible, but did you recently change the file?

My problem is different. Slurp comes in and reaches for robots.txt, finds none, then comes right back from a different IP address, hits the index page and goes away. I want Slurp to suck up the whole site but it won't.
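
Both the caching question and the fetch pattern described above can be checked against the server's access log: when did Slurp last request robots.txt, from which address, and what status did it get? The following is a rough sketch, assuming the log is in Apache's combined format and that the robot's User-Agent string contains "Slurp". The `robots_fetches` helper and the sample log line are hypothetical, not taken from anyone's real log.

```python
import re

# Assumed layout: Apache combined log format. Adjust if your server differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def robots_fetches(lines, agent_token="Slurp"):
    """Return (time, ip, status) for each robots.txt request by the agent."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent") or ""
        if agent_token.lower() not in agent.lower():
            continue
        if match.group("path") == "/robots.txt":
            hits.append(
                (match.group("time"), match.group("ip"), int(match.group("status")))
            )
    return hits


if __name__ == "__main__":
    # Made-up sample line for illustration only.
    sample = [
        '10.0.0.1 - - [08/Oct/2002:12:00:00 +0000] '
        '"GET /robots.txt HTTP/1.0" 404 210 "-" "ExampleBot Slurp"',
    ]
    for time, ip, status in robots_fetches(sample):
        print(time, ip, status)
```

A 404 on every robots.txt request would match the "finds none" case, while a last successful fetch dated before the file was changed would point to a stale copy.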