Forum Moderators: goodroi
Today "The Contractor" was looking into our problem of missing pages and spotted that our robots.txt was returning a "403 Forbidden". I removed the file, but when we looked at oursite.com/robots.txt it was still returning the dreaded 403. I contacted the so-called hosting company, who say:
We have found the root of your cause. You are getting 403 Forbidden error for robots.txt because at times the search engines cache are not cleared and also the ISPs cache are not cleared. They get refreshed periodically, so we need to wait till then. Once the cache gets cleared you will receive appropriate 404 File not found error since you have deleted robots.txt file as you had mentioned earlier.
Can anyone help me? This is just beyond belief now.
Also try deleting robots.txt altogether and see whether it returns a 404 when you try to access it.
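If you would rather not rely on a browser (which has its own cache), a small script can ask the server directly and report the status code. This is only a rough sketch: the function name fetch_status and the User-Agent string are made up for illustration, and the no-cache headers are a request to intermediate caches, not a guarantee that they will comply.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code received for url.

    `opener` is a hypothetical hook so the function can be exercised
    without a network connection; by default it is urllib's urlopen.
    """
    request = urllib.request.Request(
        url,
        headers={
            # Ask any cache along the way to revalidate with the origin.
            "Cache-Control": "no-cache",
            "Pragma": "no-cache",
            # Illustrative value, not a real crawler name.
            "User-Agent": "robots-status-check/0.1",
        },
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still the answer.
        return err.code


# Example call (needs network access, so it is left commented out):
# print(fetch_status("http://oursite.com/robots.txt"))
```

If this prints 403 after the file has been deleted, even with the no-cache headers, that suggests the server itself is refusing the request. With the file gone and nothing blocking access, you would expect a 404.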
On Apache servers, this is typically done in one of two files, or possibly in both. The files are httpd.conf - the server configuration file, and .htaccess - a user-level configuration file that can exist in any or all of your directories.
In those files, the Deny from or RewriteRule directives can be used to block access to various files based on requestor IP address, remote hostname, http_referer and other parameters. You should check to see if you have an .htaccess file that is unintentionally blocking these accesses.
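To make that check less tedious, here is a minimal sketch that lists the lines in an .htaccess file worth reading by eye. It only looks for the directives mentioned above (Deny from, RewriteRule, plus the RewriteCond lines that usually accompany a rule); the function name find_access_directives is hypothetical, and a match does not mean the line is the culprit, only that it could affect access.

```python
import re

# Directives that can block requests. Comment lines start with "#",
# so they never match this pattern.
DIRECTIVE = re.compile(
    r"^\s*(deny\s+from|rewritecond|rewriterule)\b", re.IGNORECASE
)


def find_access_directives(config_text):
    """Return (line_number, line) pairs for directives that may block access."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if DIRECTIVE.match(line):
            hits.append((number, line.strip()))
    return hits


# Example use, assuming the file sits in the current directory:
# with open(".htaccess", encoding="utf-8") as handle:
#     for number, line in find_access_directives(handle.read()):
#         print(number, line)
```

Remember that an .htaccess file in a parent directory also applies to everything below it, so it is worth running the check on each one up to the document root.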
On MS servers, similar functionality is available using ISAPI filters and the control panel.
Jim
On Apache servers, this is typically done in one of two files, or possibly in both. The files are httpd.conf - the server configuration file, and .htaccess - a user-level configuration file that can exist in any or all of your directories.
The domain in question is being hosted by a rather large host. I checked other domains that are hosted on that server, and their robots.txt isn't giving a 403. Is it safe to strike the httpd.conf as a possibility?