|Preventing "soft 404 errors"|
| 7:12 pm on Jun 21, 2012 (gmt 0)|
I have many cases of what GWT calls "soft 404s".
In my htaccess, I 301 redirect many old incoming html links to php:
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]
This is perfect for old-page.html and other-old-page.shtml. But what about nonsense-url.html? On my site, a user will get the 404 page, but the http error code is 301, Moved Permanently.
1. why does a user see a custom 404 page? Is it because I haven't defined a 301 customised page?
2. How can I prevent nonsense URLs from being 301'd to equally nonexistent URLs? Am I asking too much of htaccess to be able to serve up a 301 redirect to URLs that exist, and a 404 for those that don't?
| 7:30 pm on Jun 21, 2012 (gmt 0)|
The user sees the 404 page because the .php file does not exist. The user is first served a 301 status. This is not ideal.
If /$1.php is a real live actual physical file in the server filesystem, then you can set things in htaccess to redirect the .html request only if the .php file exists, and return a 404 if it does not.
Immediately before the redirecting rule add:
RewriteCond $1\.php -f
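Put together, the condition and the original rule might look like the sketch below. It assumes the .htaccess file sits in the document root and that RewriteEngine On is already set earlier in the file. The %{DOCUMENT_ROOT} prefix is an addition that is not in the line above, made on the understanding that -f tests a filesystem path rather than a URL path.

```apache
# Sketch only: assumes this .htaccess lives in the document root
# and that "RewriteEngine On" appears earlier in the file.

# Redirect old .html/.shtml requests only when the matching .php file exists.
# %{DOCUMENT_ROOT} is an assumption added here so that -f checks a full filesystem path.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]
```

With this in place, old-page.html should still be redirected to old-page.php, while nonsense-url.html fails the condition, is not redirected, and falls through to the normal 404 handling.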
| 3:55 am on Jun 22, 2012 (gmt 0)|
|On my site, a user will get the 404 page, but the http error code is 301, Moved Permanently. |
301 is not an error; it's simply a response. Do you have an ErrorDocument line in your htaccess? What does it say?
:: business with crystal ball here ::
|1. why does a user see a custom 404 page? Is it because I haven't defined a 301 customised page? |
Since a 301 is not an error-- though it might be the result of a mistake-- when would anyone ever see a 301 page if it existed? The essence of a Redirect, whether 302 or 301, is that you get Redirected to another actual page. If you get redirected to a nonexistent page, the 301 at the first location will be followed by a 404 at the second location.
|RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L] |
That seems much more generic than it needs to be. It allows users to request any page with any of four extensions: .htm, .html, .shtm, .shtml.
Unless your site is so enormous that you simply have to cheat a little, you should only be redirecting from the form the URL really used to have. And I find it hard to believe that you used all four concurrently. Maybe different extensions in different directories?
| 6:48 am on Jun 22, 2012 (gmt 0)|
Sometimes it is better to have a single compact RegEx ending in \.s?html? than to have separate rules.
The match for .shtm is unintended and caused by the simplicity of the pattern. Since this rule redirects, in practice that doesn't cause any issues.
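If the stray matches are unwanted, one possible tightening is to drop the final ? so that the trailing "l" becomes mandatory. This is only a sketch of the pattern change; the rest of the rule is copied from the original question.

```apache
# Sketch: matches only .html and .shtml, because the "?" after the final "l" is removed.
# Requests ending in .htm or .shtm are no longer redirected by this rule.
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html$ http://www.example.com/$1.php [R=301,L]
```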
| 11:27 am on Jun 22, 2012 (gmt 0)|
Yes, unfortunately (oh how we'd all love a ticket on the hindsight express!), my site was html for a few early months, shtml for many years and now php.
| 1:35 pm on Jun 22, 2012 (gmt 0)|
If you had changed to extensionless URLs when changing to PHP you would never need to add more redirects with each technology change, only amend internal rewrites to map the same old URLs to the new server internals.
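A rough sketch of what that could look like follows. It assumes the .htaccess file is in the document root, that RewriteEngine On is already set, and that each extensionless URL maps to a .php file of the same name. Direct requests for the .php URLs themselves are not handled here.

```apache
# Sketch only: assumes this .htaccess lives in the document root
# and that "RewriteEngine On" appears earlier in the file.

# External redirect: old-page.html or old-page.shtml -> /old-page,
# but only when old-page.php exists on disk.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1 [R=301,L]

# Internal rewrite: /old-page is served by old-page.php.
# No R flag, so the URL the visitor sees does not change.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.php [L]
```

If the site later moves to another technology, only the internal rewrite would need to change; the public URLs and the existing redirects could stay as they are.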