Servers returning 403 rather than 404 for robots.txt file......

         

kahuna

9:52 pm on Mar 26, 2004 (gmt 0)

10+ Year Member



Servers returning 403 rather than 404 for the robots.txt file...
Sorry to bring this up here, but G is the king. I am working with some websites whose servers return a 403 instead of a 404 when robots.txt is requested.

So for the king of bots... what is the significance of a 403 error versus a 404 error when G looks for the robots.txt file?
That is, does it make a difference?

Kahuna.

Marcia

9:57 pm on Mar 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is happening only with Googlebot? What about other search engines? 403 or 404 for those?

Powdork

1:37 am on Mar 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If what you are saying is that your servers are returning a 403 Forbidden header when robots.txt is requested, then you do have a problem with G. When G receives a 404 she will spider. When she receives a 200 she will follow the directives in the file according to the protocol. When she receives anything else, she goes away and should not spider any part of the domain.
That is only my understanding. Someone please correct me if I'm wrong.
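
To make the three cases above concrete, here is a minimal sketch of that decision logic. It encodes only the understanding described in this post, not confirmed Googlebot behaviour. The function name `crawl_decision` and the `crawl_on_403` flag are hypothetical; the flag is there only so the 403 case can be switched either way.

```python
def crawl_decision(status, crawl_on_403=False):
    """Map the HTTP status returned for /robots.txt to a crawler action.

    This mirrors the rule of thumb stated in the post, nothing more.
    crawl_on_403 is a hypothetical switch for bots that treat 403 like 404.
    """
    if status == 200:
        return "follow robots.txt rules"
    if status == 404:
        return "crawl everything"
    if status == 403 and crawl_on_403:
        return "crawl everything"
    # Any other response: the bot goes away and does not crawl the domain.
    return "do not crawl"


for code in (200, 404, 403, 500):
    print(code, "->", crawl_decision(code))
```

With the flag left at its default, a 403 falls into the "do not crawl" branch, which is exactly the worry raised in this thread.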

kahuna

12:24 pm on Mar 27, 2004 (gmt 0)

10+ Year Member



Google has yet to spider the page (luckily); MSN and Ink have come by, and it's all 403 errors.
It is a Linux system that is obviously set up incorrectly...
I don't use a robots.txt file, so the error should be a 404 error... but as mentioned, 403s are being returned because of the glitch.

So I am "hoping ++++" that the bots just ignore the difference... Boy, this sure would screw up somebody who didn't know about these things... what rookie uploads a robots.txt file? I just uploaded a blank robots.txt file until the company fixes the problem.

Thanks again group.

ciml

12:57 pm on Mar 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There was (maybe still is) an Apache-derived server returning 403 instead of 404.

While Powdork's description makes sense, this was a big enough problem for Google to switch their behaviour and crawl domains where /robots.txt returned 403.

That was some time ago, but if it's still the case then 403 for /robots.txt should be no problem.

kahuna

1:36 pm on Mar 27, 2004 (gmt 0)

10+ Year Member



The operating system is: Linux 2.2.19-6.2.11
The web server is: Apache/1.3.9 (Unix) (Red Hat/Linux) PHP/4.3.1

So I am hoping that MSN and Ink, which have already visited, will behave the same way.

And once again thank you very much group.

K.

encyclo

1:41 pm on Mar 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"I don't use a robots.txt file, so the error should be a 404 error."

You should deal with the most urgent problem first.

Before doing anything else, upload a blank (empty) file called robots.txt to your document root - that way, the bots won't get any error and your site will be indexed. Only then should you worry about incorrect error codes, probably by emailing your hosting company to complain about their setup.
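
As a rough illustration of why a blank file is a safe stopgap, the sketch below feeds an empty robots.txt to Python's standard `urllib.robotparser` and asks whether a page may be fetched. This parser is only a stand-in and is not how any particular search engine's bot is implemented; the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# An empty robots.txt has no lines, so it contains no rules at all.
parser = RobotFileParser()
parser.parse("".splitlines())

# Placeholder user agent and URL, used for illustration only.
allowed = parser.can_fetch("Googlebot", "http://www.mydomain.com/index.html")
print(allowed)
```

With no Disallow rules present, the parser has nothing to block, so the fetch is permitted.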

kahuna

3:01 pm on Mar 27, 2004 (gmt 0)

10+ Year Member



Thanks Encyclo... I did that a few messages back just in case the bots didn't like the info.

I got this back from the host: "This is because the folder is controlled. If you look for the same file in an uncontrolled folder you will get the 404 error. www.mydomain.com/images/robots.txt"
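
One way to check the host's explanation is to request a missing file in both locations and compare the status codes. The sketch below uses Python's standard `urllib`; the helper names `fetch_status` and `matches_host_explanation` are hypothetical, and www.mydomain.com is the placeholder from the host's reply. The root check only makes sense for a file that does not exist, so it would have to be run before a blank robots.txt is uploaded, or against some other missing filename.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, including error codes like 403/404."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is carried on the exception.
        return err.code


def matches_host_explanation(root_status, subfolder_status):
    """True if a missing file gives 403 in the root but 404 in a subfolder."""
    return root_status == 403 and subfolder_status == 404


# Example usage (performs real network requests, so it is left commented out):
# root = fetch_status("http://www.mydomain.com/robots.txt")
# sub = fetch_status("http://www.mydomain.com/images/robots.txt")
# print(root, sub, matches_host_explanation(root, sub))
```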

Anyway... this is starting to head into more technical territory about Apache servers, so I'll stop posting here...

Thanks again group.