
Sitemaps, Meta Data, and robots.txt Forum

    
Getting ERROR_DISALLOW with Inktomi after adding new robots.txt
bebox
5:49 am on Dec 28, 2002 (gmt 0)

Hi Guys,
I paid for two URLs to be indexed via **** for Inktomi's spider. Everything was fine until I added a robots.txt file I found on a thread here (a very comprehensive one), and now I am getting an error when I check my account:
Inktomi encountered the following error when attempting to crawl the following URL:

[example.com...]

ERROR_DISALLOW: Crawling Disallowed By robots.txt

The thing is, I have double-checked the bot name - Slurp/2.0 - and the robots.txt lists that exact bot and allows it everywhere on the site, so there should be no problem.

Can someone shed some light on this for me?

Many thanks
Bebox

[edited by: Marcia at 7:22 am (utc) on Dec. 28, 2002]
[edit reason] no specifics, please [/edit]

 

jdMorgan
6:07 am on Dec 28, 2002 (gmt 0)

bebox,

Welcome to WebmasterWorld [webmasterworld.com]!
(Please read this and the Terms of Service below - we don't post URLs except in profiles.)

However... your robots.txt has several problems - problems which are likely "fatal errors" and may lead most 'bots to give up and fall back to the default User-agent: * record, which tells them to stay out.

I recommend you rename that robots.txt file to robots.tst immediately to put it safely out of the robots' sight. Then use the robots.txt Validator [searchengineworld.com] at WebmasterWorld's sister site, Search Engine World. Note that you can tell it to validate robots.tst, so that you can work on the file without having a real 'bot choke on it. Get it fixed, then name it back to robots.txt.

The major problem with your robots.txt is that you have "extra" newlines in it. User-agent and Disallow directives may not span multiple lines, and a blank line is required between records (before each new User-agent, that is).
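Just to illustrate (the /cgi-bin/ path here is made up - substitute whatever you actually want to block, and "Slurp" is simply how Inktomi's bot is commonly addressed, so check the exact name your provider documents), a cleanly formatted file keeps each directive on a single line and puts exactly one blank line between records:

# Record for Inktomi's Slurp - an empty Disallow means nothing is off limits
User-agent: Slurp
Disallow:

# Default record for every other bot
User-agent: *
Disallow: /cgi-bin/

Each User-agent line starts a new record, every Disallow value stays on its own line, and the blank line is what separates one record from the next.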

HTH,
Jim

pageoneresults
7:05 am on Dec 28, 2002 (gmt 0)

Look at the last entry in your robots.txt file. Here it is...

User-agent: *
Disallow: /

I'd also take jdMorgan's advice and validate your file quickly. That last record in your robots.txt file disallows all spiders from crawling your site. You might as well get rid of the rest of the records if you keep that last one - you could save on file size! ;)

P.S. You want to be very careful when handling the robots.txt file. If you are copying someone else's file, they may have entries in there that won't apply to your site. Where did you pick up that last one...

User-agent: *
Disallow: /

bebox
8:44 am on Dec 28, 2002 (gmt 0)

Thanks guys...yikes! :))
I got the robots.txt from Search Engine World (example 4) and thought that it would do as it said. I have now fixed the wrapped lines and tidied the file up, and it validates just fine. As for the last record that doesn't allow anything - won't that only be read after the records above it, so it will allow the listed bots but nothing else?

If it isn't right perhaps I need to alert the people at Search Engine World too.

Thanks for your help; I will try to fix it now.

B

mayor
4:41 pm on Dec 28, 2002 (gmt 0)

Here's the way I understand the robot directives to work:

This is a general wildcard directive for all bots (an exclusion directive in this example):

User-agent: *
Disallow: /

But a specific directive can override the general one, just for the specified bot, while all non-specified bots remain subject to the general directive. So this record would override the above general exclusion and specifically allow (i.e. not exclude) scooter, while all other bots remain excluded:

User-agent: scooter
Disallow:

Both records would have to appear in the file for this explanation to hold.
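Put together - purely as a sketch, with scooter standing in for whichever bot you want to let through - the whole file would read:

# Specific record: scooter may crawl everything (an empty Disallow means no exclusions)
User-agent: scooter
Disallow:

# General record: any bot not matched above is excluded from the whole site
User-agent: *
Disallow: /

The * record is only the default for bots that don't match a more specific record, so the catch-all at the end doesn't shut out a bot that is named above it.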

pageoneresults
6:11 pm on Dec 28, 2002 (gmt 0)

Hello bebox, mayor is correct. I did not look at the file correctly last night and should be flogged for replying with my comments! ;)

First thing I'd do is take care of the errors, which you've done. Check your Ink listings and see if the error is still there.

If so, I would drop your Ink provider an email and alert them to the issue. That last Disallow: record is fine; I did not see that you had an allow for Slurp/2.0. I need to make sure I am fully alert when hanging out around here!

bebox
1:41 pm on Dec 29, 2002 (gmt 0)

No probs mate... appreciate the feedback. I have cleaned up the file, and it validates 100% now that I have gotten rid of the wrapped lines etc. :))

Inktomi (Slurp) now seems to have no problem with the file, and my URLs are being indexed again... yeh!

Thanks for all the help
B
