Hi guys, I paid for two URLs to be indexed via **** for Inktomi's spider. Everything was fine until I added a robots.txt I found on a thread here (very comprehensive), and now an error comes up when I check my account: Inktomi encountered the following error when attempting to crawl the following URL:
Welcome to WebmasterWorld [webmasterworld.com]! (Please read this and the Terms of Service below - we don't post URLs except in profiles)
However... Your robots.txt has several problems - likely "fatal errors" that may lead most 'bots to give up and fall back to the default User-agent * record, which tells them to stay out.
I recommend you rename that robots.txt file to robots.tst immediately to put it safely out of the robots' sight. Then use the robots.txt Validator [searchengineworld.com] at WebmasterWorld's sister site, Search Engine World. Note that you can tell it to validate robots.tst, so that you can work on the file without having a real 'bot choke on it. Get it fixed, then name it back to robots.txt.
The major problem with your robots.txt is that you have "extra" newlines in it. User-agent and Disallow directives may not span multiple lines, and a blank line is required between records (before each new User-agent, that is).
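For reference, a correctly formatted file along those lines looks like this - each directive on its own line, with a blank line separating the two records (the bot name here is just an example):

```
User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
```

The blank line is what marks the end of one record and the start of the next, which is why stray line breaks inside a directive confuse the parsers.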
Look at the last entry in your robots.txt file. Here it is...
User-agent: *
Disallow: /
I'd also take jdMorgan's advice and validate your file quickly. That last line in your robots.txt file is disallowing all spiders from indexing your site. You might as well get rid of the rest of them if you keep that last one. You could save on file size! ;)
P.S. You want to be very careful when handling the robots.txt file. If you are copying someone else's file, they may have entries in there that won't apply to your site. Where did you pick up that last one...
Thanks guys... yikes! :)) I got the robots.txt from Search Engine World (example 4) and thought it would do as it said. I have now fixed the wrapped lines and tidied up the file, and it validates just fine. With regard to the last record that disallows everything: won't it only be read after the records above it, so it will allow any of the listed bots but nothing else?
If it isn't right perhaps I need to alert the people at Search Engine World too.
Here's the way I understand the robot directives to work:
This is a general wildcard directive for all bots (an exclusion directive in this example):
User-agent: *
Disallow: /
But a specific directive can override the general one, just for the specified bot, while all non-specified bots remain subject to the general directive. So this record would override the above general exclusion and specifically allow (ie not exclude) scooter while all other bots remain excluded:
User-agent: scooter
Disallow:
Both records would have to appear in the file for this explanation to hold.
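If you want to double-check that reading of the precedence rules, Python's standard urllib.robotparser applies them the same way - a quick sketch (the example.com URL and the "otherbot" name are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# A file with both records: a specific record for scooter whose empty
# Disallow excludes nothing, and a catch-all record excluding everyone else.
robots_txt = """\
User-agent: scooter
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# scooter matches its own record, so it may fetch anything
print(rp.can_fetch("scooter", "http://example.com/index.html"))   # True
# any other bot falls through to the * record and is shut out
print(rp.can_fetch("otherbot", "http://example.com/index.html"))  # False
```

Drop either record and the behavior changes for the corresponding bots, which matches the point that both records must be present for the explanation to hold.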
Hello bebox, mayor is correct. I did not look at the file correctly last night and should be flogged for replying with my comments! ;)
The first thing I'd do is take care of the errors, which you've done. Check your Ink listings and see if the error is still there.
If so, I would drop your Ink provider an email and alert them to the issue. That last Disallow: line is fine; I did not see that you had an allow for Slurp/2.0. I need to make sure I am fully alert when hanging out around here!