Forum Moderators: Robert Charlton & goodroi
Type a duff url for our site, and up pops the custom /404.html page every time.
Googlebot knew of the existence of /404.html because, as noted in my original post, it is listed in my robots.txt in order to Disallow robots from crawling the page and filling their index with useless pages.
If I made a new file name for our custom 404 page, I'd still list it in robots.txt, for the same reason we list the current one.
With this robots.txt file:

User-agent: *
Disallow: /folder1
Disallow: /folder2

User-agent: GoogleBot
Disallow: /folder3

Googlebot can still crawl /folder1 and /folder2, because Google reads only the User-agent: GoogleBot directive.

[...] meta robots noindex tag to it. The 404 page would also link out to significant sections and pages of the site.
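To see the group-selection point in action, here is a minimal sketch using Python's standard-library robots.txt parser. It is not Google's own parser, so treat it only as an illustration; the helper name `allowed` and the sample page paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted in the thread, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1
Disallow: /folder2

User-agent: GoogleBot
Disallow: /folder3
"""


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # A bot with its own group uses only that group; other bots fall back to "*".
    for agent in ("Googlebot", "SomeOtherBot"):
        for path in ("/folder1/page.html", "/folder3/page.html"):
            print(agent, path, allowed(agent, path))
```

A crawler that finds a group naming it specifically ignores the `*` group, so any rule meant for everyone has to be repeated inside the GoogleBot group as well.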
I'm loath to fiddle with the 404.html file name.
Instead of listing the page in the robots.txt file, add the meta robots noindex tag to the page itself.
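As a rough way to confirm that an error page really carries the tag, something like the following could be used. It is only a sketch: `RobotsMetaFinder`, `has_noindex` and the sample markup are invented for illustration and are not taken from anyone's actual 404 page.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    """Return True if the page contains a meta robots noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical custom 404 page with the tag in place.
SAMPLE_404 = (
    "<!DOCTYPE html><html><head><title>Page not found</title>"
    '<meta name="robots" content="noindex"></head>'
    "<body><p>Sorry, that page does not exist.</p></body></html>"
)
```

Bear in mind that a crawler can only read the tag if it is allowed to fetch the page, so a Disallow for the same URL in robots.txt would hide the noindex from it.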
It seems Google now expects the true URL of a custom 404 page to return a 404 response.
What lunacy is this?
It is generally OK to have a 302 redirect to the 404 page
662 byte custom 404 page
Some browsers reportedly have problems with custom error pages that are very small, consequently ignoring the custom error and getting the server default instead.
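For a quick check of the size issue, a sketch like this would do. The 512-byte threshold is an assumption: it is the figure commonly cited for older versions of Internet Explorer, not something stated in this thread, and `is_large_enough` is a made-up helper name.

```python
# Assumed threshold: the figure commonly cited for older Internet Explorer
# versions. It does not come from this thread.
MIN_ERROR_PAGE_BYTES = 512


def is_large_enough(body, threshold=MIN_ERROR_PAGE_BYTES):
    """Return True if the error page body (bytes) exceeds the assumed threshold."""
    return len(body) > threshold
```

By that measure the 662-byte page mentioned above would pass, with not much room to spare.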
I happily bow to your greater knowledge, but I have occasionally seen in my shared hosting environment an overall server default error instead of my own custom one.
When you request example.com/folder/, you get to see the content found inside the index.html file without seeing "/index.html" in the URL in the browser address bar.

The trouble comes from using ErrorDocument 404 http://www.example.com/404.html in the server configuration file. If you do that, asking for example.com/not-exist results in a 302 redirect to example.com/404.html, which is then served with a 200 OK status code. This is documented in the Apache manual. The redirect is the main problem here, but following it with a 200 OK status doesn't help matters at all.

Use ErrorDocument 404 /404.html, or whatever filename you choose, instead.

A meta robots noindex directive in the 404 file should also stop that happening.
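One way to check which of these behaviours a server shows is to request a non-existent URL without following redirects and look at the status codes. The sketch below does that with Python's standard library; `fetch_chain` and `diagnose` are hypothetical names, and the labels returned are just my shorthand for the cases discussed in this thread.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECTS = {301, 302, 303, 307, 308}


def fetch_chain(url, max_hops=5):
    """Request `url`, following redirects by hand; return [(url, status), ...]."""
    chain = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body before closing the connection
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        chain.append((url, status))
        if status in REDIRECTS and location:
            url = urljoin(url, location)
        else:
            break
    return chain


def diagnose(chain):
    """Classify the response chain for a URL that should not exist."""
    statuses = [status for _, status in chain]
    if not statuses:
        return "no response"
    first, last = statuses[0], statuses[-1]
    if first == 404:
        return "hard 404"  # what ErrorDocument 404 /404.html should give
    if first in REDIRECTS and last == 200:
        return "soft 404 via redirect"  # the full-URL ErrorDocument case
    if first in REDIRECTS and last == 404:
        return "redirect to 404"
    if first == 200:
        return "soft 404"
    return "other"
```

For example, `diagnose(fetch_chain("http://www.example.com/not-exist"))` should come back as "hard 404" on a correctly configured server. The same call on the real address of the custom page, such as /404.html, shows what status that URL returns when requested directly.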