Virtual server site, all flat-file, static HTML.
For years I've had a simple 662-byte custom 404 page working fine with no problems, no funny business whatsoever. No meta refreshes, just a search box and a few words.
Type a duff URL for our site, and up pops the custom /404.html page every time.
Today, in our Google Webmaster Tools console, I noticed my first ever "Soft 404" (meaning a page that returns a 200 server response instead of a genuine 404 "page not found" server response). Just the one.
The page Google is showing in crawl errors as a "soft 404" is my custom /404.html page, thus:
Crawl errors: Soft-404
www.mysite.tld/404.html 404-like content May 11, 2011
The 404.html page is, of course, NOT linked to anywhere on my site, and it has always carried a meta name="robots" content="noindex, noarchive, nofollow" tag to keep spiders from indexing it.
In my root .htaccess file there's always been the directive:
ErrorDocument 404 /404.html
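For what it's worth, one possible workaround (a sketch only, assuming Apache 2.x with mod_rewrite enabled for the virtual server) is to make a *direct* request for /404.html return a genuine 404 status, while the ErrorDocument mechanism continues to serve it with a 404 status for bad URLs:

```apache
# Root .htaccess. The ErrorDocument directive stays exactly as before:
ErrorDocument 404 /404.html

RewriteEngine On
# REDIRECT_STATUS is empty on a direct client request, but is set when
# Apache internally serves /404.html as an error document, so the
# internal error subrequest is left alone and this cannot loop.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# Direct hits on /404.html now get a real 404 status instead of 200.
RewriteRule ^404\.html$ - [R=404,L]
```

On a recrawl, Googlebot fetching /404.html directly would then see a 404 rather than a 200, which should clear the "soft 404" flag. This is an untested sketch; check a direct request to /404.html returns status 404 before relying on it.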
Additionally, I've always disallowed all bots from /404.html via robots.txt:
User-agent: *
Disallow: /404.html
So Googlebot should never have crawled that page +directly+, but it did; here are the relevant log entries:
66.249.72.74 - - [11/May/2011:23:42:59 -0400] "GET /robots.txt HTTP/1.1" 200 1221 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.72.74 - - [11/May/2011:23:42:59 -0400] "GET /404.html HTTP/1.1" 200 662 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
It seems Google now expects the true URL of a custom 404 page to itself return a 404 response.
What lunacy is this?