|Effects of using missing.html|
Does Google see this as a duplicate page?
I posted a similar question a while back on one of the threads but never got a response, so let's try again. I occasionally remove pages or change the URLs of some deep internal pages. When I do this, the log files show 404s for them, so people are still trying to access them even though no internal links point to them any more. I finally found that Google and other search engines had indexed these pages and that they rank pretty well for the relevant key phrases. A few of these pages recently started getting a lot of hits from Googlers (is that the term?). Anyway, I didn't want to waste this traffic, and I know that Google does not like redirects, so I decided to place a missing.html file in my root www folder.
My question is this: missing.html is shown whenever the server is unable to find the requested page, so it acts as a sort of custom 404. When a user requests domain.com/nosuchpage.html, the server displays the missing page, which is a copy of the home page, but the nonexistent URL stays in the address bar. Since several of these pages are indexed and showing up in search results, will Google consider them duplicate pages? If so, will this cause PR problems for the rest of my site?
What about the possibility of creating a dynamic missing page, so that it would be different every time it's accessed?
Any info on the problems or benefits of using missing.html would be helpful.
I would recommend that you still return a 404 status for the missing document, even though you also return some content that you would like the user to see. This tells user agents that they are not seeing the page they requested, that the page has been removed, and that you are providing some other content instead. Google will take this into account.
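To make the idea concrete, here is a minimal sketch of a server that sends friendly content while keeping the 404 status. It uses Python's standard http.server purely for illustration; the page bodies, the REAL_PAGES table and the Handler class are all made up for this example and are not how Apache does it internally.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical friendly body; on the site in question this would be missing.html.
MISSING_PAGE = b"<html><body><h1>Page not found</h1><p><a href='/'>Home</a></p></body></html>"

# Hypothetical table of pages that really exist.
REAL_PAGES = {"/": b"<html><body>Home page</body></html>"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = REAL_PAGES.get(self.path)
        if body is not None:
            status = 200
        else:
            # Unknown URL: send helpful content, but keep the 404 status
            # so crawlers know the requested page does not exist.
            status, body = 404, MISSING_PAGE
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def run(port=8000):
    """Serve on localhost until interrupted (port number is arbitrary)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

The point is only that the status line and the body are independent: the visitor sees a useful page, while the user agent still sees 404.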
How do I know if I am returning this 404 error? My admin told me that I could customize my 404 by uploading a file named missing.html and that this page would then be displayed. I'm wondering whether this is just a custom 404 or something that will get me in trouble. How can I tell?
While I'm not sure how to check that your server is returning missing.html as a correct 404 (something about checking headers?), this situation sounds exactly like one of my domains. My missing.html is served up as a custom 404 page (which I know because I set up the .htaccess myself), but it also exists as a separate page in its own right.
I've found that because the page "missing.html" itself isn't linked to from any other page, Google just doesn't find it. It has been sitting in my account for about five months, has never been fetched by Googlebot and isn't listed anywhere. If you're really worried, you may also be able to block that page using robots.txt. I figure there's never any reason for search engines to find my 404 page anyway, as it's not real content, and this removes any duplicate content problem.
As far as I know, Google (and other search engines) do not index 404 pages, so I think if you set up your missing.html as a custom 404, your nonexistent pages will drop out of the database sooner or later.
You can check your server response for missing.html using the Server Header Checker here on WebmasterWorld. If it shows a 404 response code, then you'll be just fine with Google.
Look under Control Panel (at the top left of your screen), then click on the Server Headers link in the left-side nav bar on that page. Type in the URL to a non-existent page on your site, and check the resulting response code.
Here is a direct link to the Header tool [webmasterworld.com].
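If you would rather check from your own machine, the same test can be sketched in a few lines of Python using only the standard library. The function name fetch_status is made up for this example; the idea is simply to request a URL that should not exist and look at the status code, without following redirects.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url):
    """Return (status_code, reason) for a GET request, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        # Only the status line matters here; the body is ignored.
        return resp.status, resp.reason
    finally:
        conn.close()
```

Calling fetch_status with a made-up address on your own site, such as http://www.yourdomain.com/nosuchpage.html, should give a status of 404 if the custom error page is set up correctly. A 200 would mean the server is presenting the missing page as a real one, and a 301 or 302 would mean it is redirecting.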
If your page only moved and wasn't deleted, you shouldn't 404 it. You should 301 (Moved Permanently) it, so that visitors and search engines are sent to the new URL.
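As a rough illustration of redirecting a moved page instead of returning a 404, here is a sketch using Python's standard http.server. It assumes the move is permanent, for which the usual status is 301 Moved Permanently with a Location header. The MOVED table, its two paths and the RedirectHandler class are invented for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping of old URLs to their new locations.
MOVED = {"/old-page.html": "/new-page.html"}


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = MOVED.get(self.path)
        if target is not None:
            # The page has a new home: point the client at it.
            self.send_response(301)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # Anything else in this sketch is treated as genuinely missing.
        self.send_error(404, "Not Found")

    def log_message(self, format, *args):
        pass  # keep the sketch quiet
```

On Apache itself you would normally do this in the server configuration rather than in code; the sketch only shows what the response should look like on the wire.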
What web server software are you running (IIS, Apache, something else)?
Thanks, everyone, for the info. It is saying it's a 404, which is how I thought it was set up:
HTTP/1.1 404 Not Found
Date: Fri, 18 Oct 2002 04:26:34 GMT
Server: Apache/1.3.26 (Unix) FrontPage/188.8.131.52 mod_fastcgi/2.2.12
Last-Modified: Thu, 10 Oct 2002 06:11:19 GMT
Thanks for the help. This makes me feel more comfortable, since the missing page does duplicate the content of another actual page on my site.
For now, if Googlebot sees
HTTP/1.1 404 Not Found
[or "HTTP/1.0 404 I Am Tired"; the "404" is the part that matters]
in the status line of the server's response, it stops reading.
[At least, that is what I've gathered from watching my logs.
But don't trust this too much.
Maybe tomorrow Googlebot will carry on and read the page content as well ;)]
|My question is... the missing.html shows whenever the server is unable to find the requested page, it acts as a sort of custom 404. |
I'm assuming you're running Apache; you haven't said, so forgive me if not.
You're quite right here. If you don't have an ErrorDocument statement in your .htaccess, then the host has probably set this up in the server config files.
If it makes you feel better, you can have a per-directory missing.html. (Actually, it doesn't have to be called missing.html; it can be anything you like.)
Just set up a separate .htaccess in each directory you want to customize. If the user requested something in that directory, content related to it is likely to be more relevant.
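For example, say yourdomain.com has folders named dir1 and dir2.
In the .htaccess in /dir1:
ErrorDocument 404 /dir1/custommissing1.html
and in the .htaccess in /dir2:
ErrorDocument 404 /dir2/custommissing2.html
Note the leading slash: the custommissing.html paths are URL paths from the site root. You can use a full http:// URL instead if you'd like, but then Apache sends the client a redirect and the original 404 status is lost, so stick with local paths if you want search engines to see the 404.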
The advantage of doing this is that the user ends up 'closer' to the content he/she was looking for, especially if you have a large site. I'd always include a link back to the home page, then one to a site map (if you have one), and then links to the most relevant pages or starting points for the directory requested.