Forum Moderators: bakedjake

Linux server returns 403 instead of 404 error....

         

kahuna

11:38 am on Mar 27, 2004 (gmt 0)

10+ Year Member



Hello group.
A Linux server is returning 403 instead of 404 errors for missing files such as robots.txt.
The operating system is: Linux 2.2.19-6.2.11
The web server is: Apache/1.3.9 (Unix) (Red Hat/Linux)

I am not running the server myself. The hosting company says it is operating correctly, but the error reports I am describing suggest otherwise.
This must be an easy fix. Could somebody give me a general idea of the fix so I can nudge the tech guy to correct the problem?
My biggest concern is that the search engines are being given the wrong "idea" about the robots.txt file.

Thanks group.

martin

2:10 pm on Mar 27, 2004 (gmt 0)

10+ Year Member



403 means forbidden: either the file is not readable by the web server, or the server is configured not to serve it via httpd.conf or .htaccess.
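To rule out the first of those two causes, here is a minimal sketch in Python, assuming someone has shell access to the server and that Apache runs as a user other than the file's owner (so the "other" read bit is what matters). The function name and the path in the usage comment are hypothetical placeholders, not anything taken from this server.

```python
import os
import stat


def is_world_readable(path):
    """Return True if the 'other' read permission bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


# Example usage (hypothetical document root):
#   is_world_readable("/var/www/html/robots.txt")
# False means a web server running as another user cannot read the file,
# which is one way to end up with a 403.
```

This only covers file permissions. A parent directory without execute permission, or a deny rule in httpd.conf or .htaccess, can also produce a 403 and would need to be checked separately.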

kahuna

3:07 pm on Mar 27, 2004 (gmt 0)

10+ Year Member



Thanks Martin. I understand error messages.
While talking to some folks, this was mentioned:
"There was (maybe is) an Apache derived server returning 403 instead of 404.
This was a big enough problem for Google to switch their behaviour and crawl domains where /robots.txt returned 403.
That was some time ago, but if it's still the case then 403 for /robots.txt should be no problem. "

The host told me: "This is because the folder is controlled. If you look for the same file in an uncontrolled folder you will get the 404 error. www.mydomain.com/images/robots.txt"

I uploaded a blank robots.txt file just in case.

Thank you very much group.
K.

VectorJ

3:05 am on Apr 3, 2004 (gmt 0)

10+ Year Member



The only reason I can think of for Apache to return a 403 for the /robots.txt is if the webhost has configured the httpd.conf file so that Apache is not allowed to return files ending in .txt.

I think an SE would ignore any robots.txt that it can't access and just crawl the site normally.
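To see what a crawler actually receives, a rough sketch in Python using only the standard library might help. The function names and the example URL are my own placeholders, and the descriptions only restate what has been said in this thread about 403 and 404; they are not a statement of how any particular search engine behaves.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, including 4xx and 5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error statuses; the code is on the exception.
        return err.code


def describe(status):
    """Give a short plain-language reading of a status code."""
    if status == 200:
        return "OK: the file was served"
    if status == 403:
        return "Forbidden: blocked by file permissions or server configuration"
    if status == 404:
        return "Not Found: no such file at this URL"
    return "Other status: %d" % status


# Example usage (hypothetical domain):
#   print(describe(fetch_status("http://www.example.com/robots.txt")))
```

Running it against both /robots.txt and /images/robots.txt would show whether the host's "controlled folder" explanation matches what the server returns.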