Effect of Missing Pages NOT Returning a 404 Status?

         

austtr

11:09 pm on Nov 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Joomla’s documentation recommends a procedure to handle not found pages. It works OK in the sense that the viewer is served the not found custom page.

However, an HTTP header status check on a deleted page shows two responses: first a 302 Moved Temporarily for the deleted page URL, then a 200 OK for the custom not found URL. The process never generates a 404 response.

If deleted page URLs are not generating 404s, won’t that affect the accuracy of Google's crawling and indexing of the site?

Will Google think the pages are still valid and continue to include them in the index, possibly causing duplicate content and SEO havoc?

not2easy

1:09 am on Nov 15, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I would be more concerned about generating a lot of "soft 404s" than about the accuracy of crawling and indexing or duplicate content. The best practice is if a page is gone, it should return a 404 header, no matter what page is actually served. Google talks about Soft 404s here: [support.google.com...]

If you use an .htaccess file you can declare your error documents there. The visitor will see the same nice page, but the header can be "404".
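
A minimal sketch of that approach, assuming Apache and a custom error page at the hypothetical local path /not-found.html (the path and the example.com URL are illustrations, not anything from the Joomla setup discussed here):

```apache
# Serve a custom page for missing URLs while keeping the 404 status.
# /not-found.html is a hypothetical local path; substitute your own page.
ErrorDocument 404 /not-found.html

# Avoid a full URL as the target. Apache then sends the client a redirect
# to that URL, so the client never receives the original 404 status:
# ErrorDocument 404 http://www.example.com/not-found.html
```

The commented-out form is one common way to end up with a redirect followed by a 200 OK instead of a 404.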

austtr

2:57 am on Nov 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The best practice is if a page is gone, it should return a 404 header, no matter what page is actually served


That is what I understood to be the case. I have always used error statements in the .htaccess in the past and only stumbled across this missing 404 scenario by accident while trying to work out why a whole folder of deleted pages is still showing in the index. The common denominator: none of the pages return a 404 status code, so I'm guessing that is why the search engines are keeping the pages in the index.

lucy24

6:51 am on Nov 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



none of the pages return a 404 status code

This is a pretty general issue with any CMS that works by sending all requests to /index.php and then assembling the page from databases. An ErrorDocument directive is meaningless because, as far as the server is concerned, everything is a 200: the request has been successfully handed off to the file "index.php", which exists.

Now, if the CMS is capable of displaying a "not found" page, it should certainly be capable of sending out a 404 response to go with that page. (Note that the response the visitor receives is not necessarily the same as the response the server records internally. It took me a couple of years to wrap my brain around this fact.) I stress "should be capable", which may not be the same as "does in fact". Check the settings/preferences/options carefully.

a whole folder of deleted pages

Pages that used to exist are a whole nother issue. Whether CMS or hand-rolled HTML, the server doesn't know that the page used to be there. To convey this message you need to return an explicit 410. It's done in exactly the same way no matter how the site is constructed. Assuming Apache:
RewriteRule ^name-of-deleted-directory - [G]
before the part of your htaccess that is supplied by the CMS.
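
For context, a rough sketch of where that rule might sit in a root .htaccess file. The directory name is the placeholder from the rule above, the tightened pattern ending in (/|$) is my own assumption to avoid matching longer names that merely start with the same text, and the CMS block is shown only as a comment because its contents depend on the install:

```apache
RewriteEngine On

# Return 410 Gone for the removed directory and everything under it.
# "name-of-deleted-directory" is a placeholder; (/|$) is an assumed
# tightening so that "name-of-deleted-directory-2" is not caught too.
RewriteRule ^name-of-deleted-directory(/|$) - [G]

# ... the CMS-supplied rewrite rules (the hand-off to index.php) go below,
# so the 410 rule is evaluated before the request reaches the CMS ...
```

The order matters because once the CMS rules hand a request to index.php, the server treats it as a successful request and the 410 rule never gets a chance to apply.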

A further benefit of the explicit 410 is that Google (specifically) will stop crawling a lot faster, because this response can only be returned intentionally: not "sorry, can't find it" but "it used to exist but I've removed it".