jdMorgan

msg:1527839 | 6:07 am on Dec 28, 2002 (gmt 0) |
bebox, Welcome to WebmasterWorld [webmasterworld.com]! (Please read this and the Terms of Service below - we don't post URLs except in profiles.)

However... Your robots.txt has several problems - problems which are likely "fatal errors" and may lead most 'bots to give up and take the default User-agent * record - which tells them to stay out.

I recommend you rename that robots.txt file to robots.tst immediately to put it safely out of the robots' sight. Then use the robots.txt Validator [searchengineworld.com] at WebmasterWorld's sister site, Search Engine World. Note that you can tell it to validate robots.tst, so that you can work on the file without having a real 'bot choke on it. Get it fixed, then rename it back to robots.txt.

The major problem with your robots.txt is that you have "extra" newlines in it. User-agent and Disallow directives may not span multiple lines, and a blank line is required between records (before each new User-agent, that is).

HTH, Jim
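To make the two formatting rules above concrete (one directive per line, a blank line between records), here is a minimal Python sketch of a checker. It is not the Search Engine World validator and does not claim to match its behavior; the function name `lint_robots` and the messages are hypothetical, and it assumes only the `User-agent` and `Disallow` fields are in use.

```python
# Hypothetical mini-linter for the two rules described in the post.
# Assumption: only User-agent and Disallow fields appear in the file.
KNOWN_FIELDS = {"user-agent", "disallow"}


def lint_robots(text):
    """Return a list of (line_number, message) for structural problems."""
    problems = []
    have_agent = False  # a User-agent line has been seen in this record
    have_rule = False   # a Disallow line has been seen in this record
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            # A blank line ends the current record.
            have_agent = have_rule = False
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # comment-only line
        field, sep, _value = line.partition(":")
        field = field.strip().lower()
        if not sep or field not in KNOWN_FIELDS:
            problems.append(
                (number, "not a 'Field: value' line; possibly a wrapped directive"))
        elif field == "user-agent":
            if have_rule:
                problems.append((number, "no blank line before this new record"))
            have_agent, have_rule = True, False
        else:  # disallow
            if not have_agent:
                problems.append(
                    (number, "Disallow with no User-agent above it in the same record"))
            have_rule = True
    return problems


# A file with a missing separator, a stray blank line, and a wrapped value.
broken = "User-agent: Slurp\nDisallow:\nUser-agent: *\n\nDisallow:\n/\n"
# The same records written correctly.
fixed = "User-agent: Slurp\nDisallow:\n\nUser-agent: *\nDisallow: /\n"

broken_lines = [number for number, _message in lint_robots(broken)]
fixed_problems = lint_robots(fixed)
```

Running it on a renamed copy such as robots.tst keeps the experiment out of the robots' sight, in the same spirit as the advice above.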
|
pageoneresults

msg:1527840 | 7:05 am on Dec 28, 2002 (gmt 0) |
Look at the last entry in your robots.txt file. Here it is...

User-agent: *
Disallow: /

I'd also take jdMorgan's advice and validate your file quickly. That last record in your robots.txt file is disallowing all spiders from indexing your site. You might as well get rid of the rest of them if you keep that last one. You could save on file size! ;)

P.S. You want to be very careful when handling the robots.txt file. If you are copying someone else's file, they may have entries in there that won't apply to your site. Where did you pick up that last one...

User-agent: *
Disallow: /
|
bebox

msg:1527841 | 8:44 am on Dec 28, 2002 (gmt 0) |
Thanks guys...yikes! :)) I got the robots.txt from Search Engine World (example 4) and thought that it would do as it said. I have now fixed the wrapped lines and tidied the file up and it validates just fine. With regard to the last record that doesn't allow anything, won't that only be read once the others above it have been read, so it will allow any of the listed bots but nothing else? If it isn't right, perhaps I need to alert the people at Search Engine World too. Thanks for your help and will try to fix now. B
|
mayor

msg:1527842 | 4:41 pm on Dec 28, 2002 (gmt 0) |
Here's the way I understand the robot directives to work. This is a general wildcard directive for all bots (an exclusion directive in this example):

User-agent: *
Disallow: /

But a specific directive can override the general one, just for the specified bot, while all non-specified bots remain subject to the general directive. So this record would override the above general exclusion and specifically allow (i.e. not exclude) scooter while all other bots remain excluded:

User-agent: scooter
Disallow:

Both records would have to appear in the file for this explanation to hold.
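The override behavior described in this post can be tried out with Python's standard-library `urllib.robotparser`. This is only a sketch of how that one parser reads the two records; real crawlers may differ in detail, and the bot name `somebot` and the path `/index.html` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Both records from the post: a general exclusion plus a specific
# record whose empty Disallow means "nothing is off limits".
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: scooter
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# scooter matches its own record, so the wildcard record is not applied.
scooter_allowed = parser.can_fetch("scooter", "/index.html")

# "somebot" (hypothetical) has no record of its own and falls back to "*".
other_allowed = parser.can_fetch("somebot", "/index.html")

print("scooter allowed:", scooter_allowed)
print("somebot allowed:", other_allowed)
```

With both records present, this parser reports the specific bot as allowed and the unlisted bot as excluded, which matches the explanation above.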
|
pageoneresults

msg:1527843 | 6:11 pm on Dec 28, 2002 (gmt 0) |
Hello bebox, mayor is correct. I did not look at the file correctly last night and should be flogged for replying with my comments! ;) The first thing I'd do is take care of the errors, which you've done. Check your Ink listings and see if the error is still there. If so, then I would drop your Ink provider an email and alert them to the issues. That last Disallow: line is fine; I did not see that you had an allow for Slurp/2.0. I need to make sure I am fully alert when hanging out around here!
|
bebox

msg:1527844 | 1:41 pm on Dec 29, 2002 (gmt 0) |
No probs mate....appreciate the feedback. I have cleaned up the file and it validates 100% now that I have gotten rid of the wrapped lines etc :)) Inktomi (Slurp) now seems to have no problem with the file and my URLs are being indexed again....yeh! Thanks for all the help B
|