| 1:36 am on Mar 9, 2003 (gmt 0)|
Very likely that page has lots of inbound links carrying the search phrase. When you look at the cached version, does it say: These terms are only in links to this page?
| 1:39 am on Mar 9, 2003 (gmt 0)|
Indeed. "These terms only appear in links pointing to this page".
Still, why wouldn't google want to completely strip such pages from their results?
| 1:45 am on Mar 9, 2003 (gmt 0)|
I guess they would want to do so, but hey, nobody is perfect.
In any case, that page occupying a top spot is good news for the competing pages - just imagine having a really good page in that spot to run against.
| 1:51 am on Mar 9, 2003 (gmt 0)|
"It's even cached as such."
I doubt that Google would cache a true 404 page. I have seen pages that look like 404 pages (the page contents say "page not found" and such), but they are not actually 404 pages because the HTTP status returned for the page is 200.
I wonder if the http status for the page in question is 200 (ok) instead of 404 (not found). You can check that with:
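For anyone who would rather check from a script, here is a minimal Python sketch of the same idea. It is not the tool referred to in this thread; the function names `fetch_status` and `diagnose_missing_url` are made up for illustration. It sends a HEAD request without following redirects and labels the result for a URL you already know does not exist.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Return (status, location) for url, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the status line and headers only, no body.
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def diagnose_missing_url(status):
    """Label the status a server returned for a page known not to exist."""
    if status in (404, 410):
        return "hard-404"   # correct: the server says the page is gone
    if status in (301, 302, 303, 307, 308):
        return "redirect"   # error page is reached via a redirect
    if 200 <= status < 300:
        return "soft-404"   # "not found" text served with an OK status
    return "other"
```

Usage would look like `status, location = fetch_status("http://www.example.com/no-such-page")` followed by `diagnose_missing_url(status)`; the path here is a placeholder. Some servers answer HEAD differently from GET, so treat the result as a first hint only.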
| 1:56 am on Mar 9, 2003 (gmt 0)|
A common reason for this is an error in the ErrorDocument directive on an Apache server.
If the webmaster uses the directive in the form:
ErrorDocument 404 http://www.example.com/my_error_page.html
the server will return a 302-Moved Temporarily status instead of a 404.
(See the warning about this in the Apache ErrorDocument documentation.)
The correct format is:
ErrorDocument 404 /my_error_page.html
This is the first thing to check if one of your 404 pages appears in the index.
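If you have a lot of config or .htaccess files to go through, a rough Python sketch like the following could flag the problem form. The function name `find_redirecting_error_documents` is hypothetical, and the pattern only covers the simple one-line directive shown above; it assumes any target starting with a scheme such as `http://` is the redirecting kind.

```python
import re

# Matches lines such as: ErrorDocument 404 /my_error_page.html
_DIRECTIVE = re.compile(r"^\s*ErrorDocument\s+(\d{3})\s+(\S.*?)\s*$", re.IGNORECASE)
# A target that starts with "scheme://" is an absolute URL.
_ABSOLUTE_URL = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def find_redirecting_error_documents(config_text):
    """Return (line_number, status_code, target) for absolute-URL targets."""
    problems = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip commented-out directives
        match = _DIRECTIVE.match(line)
        if match and _ABSOLUTE_URL.match(match.group(2)):
            problems.append((lineno, int(match.group(1)), match.group(2)))
    return problems
```

Feeding it the two example lines above should report only the first one, since the local path form has no scheme in front of it.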
| 1:58 am on Mar 9, 2003 (gmt 0)|
You're right. It's returning a code of 200... which I suppose means that they're handling ALL pages at that site with some pre-processor or something, and if the page doesn't exist, they paint a 404ISH page. Odd.
I suppose it's better than an actual competitor site, but really I'd rather just be one higher, since that site doesn't exist!
BTW, answers within minutes. I love this place!
| 2:24 am on Mar 9, 2003 (gmt 0)|
Yes, it sounds like they have a general 404 processor, but neglected to set the HTTP status properly. Most search engines will check for this condition and ban the site automatically within months. SE's generally don't like poorly implemented general 404 processors because they can so easily generate an infinite number of garbage pages and confuse the poor spiders.
The site will be banned even more quickly if they have no explicit robots.txt file. If that is the case, a request for robots.txt also lands in their general 404 processor, which means to say 404 (not found) but instead says 200 (ok) and serves up a VERY invalid robots.txt.
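To make the fix concrete, here is a minimal Python sketch of what a general 404 processor ought to do. The names `respond`, `pages`, and `NOT_FOUND_BODY` are invented for illustration and say nothing about how the site in question is actually built. The point is only that the friendly "not found" page goes out with a 404 status, and a missing robots.txt gets a plain 404 too.

```python
NOT_FOUND_BODY = "<html><body><h1>Page not found</h1></body></html>"


def respond(path, pages):
    """Return (status, content_type, body) for a requested path.

    `pages` maps the paths that really exist to their HTML content.
    """
    if path in pages:
        return 200, "text/html", pages[path]
    if path == "/robots.txt":
        # No robots.txt on file: answer 404, never an HTML page with 200.
        return 404, "text/plain", "Not found\n"
    # Custom error page for visitors, but with the correct status code.
    return 404, "text/html", NOT_FOUND_BODY
```

A broken processor is the same code with 200 in the last two return statements, which is exactly what lets an endless supply of made-up URLs look like real pages.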
| 3:05 am on Mar 9, 2003 (gmt 0)|
I have one of my error pages listed in Google, despite it showing a true "404" on the SearchEngineWorld tool shown in the posts above.
What else may I be missing to allow these 404 pages into the index?
| 3:09 am on Mar 9, 2003 (gmt 0)|
Upon further examination my 404 webpage doesn't have the usual "404 Not Found" in the page's title... I assume this is the problem?
| 3:29 am on Mar 9, 2003 (gmt 0)|
No, they just look at the server response code. Yours, being correct, makes your situation pretty unique.
You may want to check for a "silent" redirect to your 404 page, for example, a mod_rewrite redirect without an [R] flag. If you're not on Apache, this is not applicable.
| 7:39 am on Mar 9, 2003 (gmt 0)|
Thanks, we do operate on Apache so I will definitely look into this.
| 7:58 am on Mar 9, 2003 (gmt 0)|
>I guess they would want to do so, but hey, nobody is perfect.
404's must cause immense frustration to many people. They think the site is down, when in fact it is Google that is causing the problem.