Preventing "soft 404 errors"

     

ChanandlerBong

7:12 pm on Jun 21, 2012 (gmt 0)

5+ Year Member



I have many cases of what GWT calls "soft 404s".

[support.google.com]

In my htaccess, I 301 redirect many old incoming html links to php:


RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]


This is perfect for old-page.html and other-old-page.shtml. But what about nonsense-url.html? On my site, a user will get the 404 page, but the http error code is 301, Moved Permanently.

2 questions:

1. why does a user see a custom 404 page? Is it because I haven't defined a 301 customised page?

2. How can I prevent nonsense URLs being 301'd to equally nonexistent URLs? Am I asking too much of htaccess to serve a 301 redirect for URLs that exist, and a 404 for those that don't?

g1smd

7:30 pm on Jun 21, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The user sees the 404 page because the .php file does not exist. The user is first served a 301 status. This is not ideal.

If /$1.php is a real, physical file in the server filesystem, then you can set things in htaccess to redirect the .html request only if the .php file exists, and return a 404 if it does not.

Immediately before the redirecting rule add:
RewriteCond %{DOCUMENT_ROOT}/$1.php -f

or similar.
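
Put together, the condition and the existing rule might look like the sketch below. It assumes the .htaccess file sits in the document root and that the .php files live at the same relative paths as the old .html URLs; %{DOCUMENT_ROOT} is used so that the -f test checks a full filesystem path.

```apache
RewriteEngine On

# Redirect only when the matching .php file really exists on disk.
# $1 is the path captured by the RewriteRule pattern below.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]

# Requests such as nonsense-url.html fail the condition, are not
# redirected, and fall through to the normal 404 handling.
```

With this in place, old-page.html should still get the 301 to old-page.php, while nonsense-url.html should get a 404 directly instead of a 301 followed by a 404.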

lucy24

3:55 am on Jun 22, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



On my site, a user will get the 404 page, but the http error code is 301, Moved Permanently.

301 is not an error; it's simply a response. Do you have an ErrorDocument line in your htaccess? What does it say?
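
For reference, a typical ErrorDocument line is sketched below. The filename /404.php is only a placeholder, not something taken from the original poster's setup. The detail that matters for soft 404s is whether the target is a local path or a full URL.

```apache
# Local path: Apache serves this page and keeps the 404 status.
# /404.php is a hypothetical filename.
ErrorDocument 404 /404.php

# Full URL (avoid): Apache answers with a redirect to that URL,
# so the client never receives the original 404 status.
# ErrorDocument 404 http://www.example.com/404.php
```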

:: business with crystal ball here ::

1. why does a user see a custom 404 page? Is it because I haven't defined a 301 customised page?

Since a 301 is not an error-- though it might be the result of a mistake-- when would anyone ever see a 301 page if it existed? The essence of a Redirect, whether 302 or 301, is that you get Redirected to another actual page. If you get redirected to a nonexistent page, the 301 at the first location will be followed by a 404 at the second location.

RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]

That seems much more generic than it needs to be. It allows users to request any page with any of four extensions:
shtml
html
shtm
htm

Unless your site is so enormous that you simply have to cheat a little, you should only be redirecting from the form the URL really used to have. And I find it hard to believe that you used all four concurrently. Maybe different extensions in different directories?

g1smd

6:48 am on Jun 22, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Sometimes it is better to have a single compact RegEx ending in \.s?html? than to have separate rules.

The match for .shtm is unintended and is caused by the simplicity of the pattern. Since this rule redirects, in practice that doesn't cause any issues.

ChanandlerBong

11:27 am on Jun 22, 2012 (gmt 0)

5+ Year Member



Yes, unfortunately (oh how we'd all love a ticket on the hindsight express!), my site was html for a few early months, shtml for many years and now php.
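
If only .html and .shtml were ever in use, the pattern could be narrowed to those two forms, as in the sketch below. This is optional, since the looser pattern is harmless for a redirect. The file-exists condition assumes the .htaccess file is in the document root.

```apache
# Match only the two extensions the site actually used: .html and .shtml
# (the trailing "?" after "l" is dropped, so .htm and .shtm no longer match).
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html$ http://www.example.com/$1.php [R=301,L]
```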

g1smd

1:35 pm on Jun 22, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If you had changed to extensionless URLs when changing to PHP, you would never need to add more redirects with each technology change; you would only amend the internal rewrites to map the same old URLs to the new server internals.
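
A rough sketch of what that could look like follows. It assumes the .htaccess file is in the document root, that each page is a .php file at the matching path, and that MultiViews is not enabled. The old extensions are redirected once to the extensionless URL, and the extensionless URL is then mapped internally to the script.

```apache
RewriteEngine On

# External 301: old .html/.shtml URLs go to the extensionless URL,
# but only when a matching .php file exists.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1 [R=301,L]

# Internal rewrite (no R flag): serve /page from /page.php.
# The pattern excludes dots in the last segment, so the rewritten
# .php path does not match again and cannot loop.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.php [L]
```

On a later technology change, only the second rule would need to point somewhere else; the public URLs stay the same.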
 
