
Apache Web Server Forum

    
Preventing "soft 404 errors"
ChanandlerBong




msg:4468099
 7:12 pm on Jun 21, 2012 (gmt 0)

I have many cases of what GWT (Google Webmaster Tools) calls "soft 404s".

[support.google.com ]

In my htaccess, I 301 redirect many old incoming html links to php:


RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]


This is perfect for old-page.html and other-old-page.shtml. But what about nonsense-url.html? On my site, a user will get the 404 page, but the http error code is 301, Moved Permanently.

2 questions:

1. why does a user see a custom 404 page? Is it because I haven't defined a 301 customised page?

2. How can I prevent nonsense URLs being 301'd to equally nonexistent URLs? Am I asking too much of htaccess to be able to serve up a 301 redirect to URLs that exist, and a 404 for those that don't?

 

g1smd




msg:4468103
 7:30 pm on Jun 21, 2012 (gmt 0)

The user sees the 404 page because the .php file does not exist. The user is first served a 301 status. This is not ideal.

If /$1.php is a real live actual physical file in the server filesystem, then you can set things in htaccess to redirect the .html request only if the .php exists and 404 if it does not.

Immediately before the redirecting rule add:
RewriteCond $1\.php -f
or similar.
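
As a rough sketch of how the condition and the rule from the first post might sit together: the -f test checks a filesystem path, so this version prefixes %{DOCUMENT_ROOT}. That prefix, and the assumption that the .htaccess file lives in the document root, are not from the thread.

```apache
RewriteEngine On

# Assumption: this .htaccess is in the site's document root.
# $1 is the back-reference captured by the RewriteRule pattern below;
# the condition only passes when the matching .php file exists on disk.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]
```

With this in place, a request for nonsense-url.html fails the condition, the rule is skipped, and the request should fall through to the normal 404 handling instead of being redirected first.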

lucy24




msg:4468217
 3:55 am on Jun 22, 2012 (gmt 0)

On my site, a user will get the 404 page, but the http error code is 301, Moved Permanently.

301 is not an error; it's simply a response. Do you have an ErrorDocument line in your htaccess? What does it say?

:: business with crystal ball here ::

1. why does a user see a custom 404 page? Is it because I haven't defined a 301 customised page?

Since a 301 is not an error, though it might be the result of a mistake, when would anyone ever see a 301 page if it existed? The essence of a Redirect, whether 302 or 301, is that you get Redirected to another actual page. If you get redirected to a nonexistent page, the 301 at the first location will be followed by a 404 at the second location.

RewriteRule ^(([^/]+/)*[^/.]+)\.s?html?$ http://www.example.com/$1.php [R=301,L]

That seems much more generic than it needs to be. It allows users to request any page with any of four extensions:
shtml
html
shtm
htm

Unless your site is so enormous that you simply have to cheat a little, you should only be redirecting from the form the URL really used to have. And I find it hard to believe that you used all four concurrently. Maybe different extensions in different directories?

g1smd




msg:4468241
 6:48 am on Jun 22, 2012 (gmt 0)

Sometimes it is better to have a single compact RegEx ending in \.s?html? than to have separate rules.

The match for .shtm is unintended and caused by the simplicity of the pattern. Since this rule redirects, in practice that doesn't cause any issues.

ChanandlerBong




msg:4468292
 11:27 am on Jun 22, 2012 (gmt 0)

Yes, unfortunately (oh how we'd all love a ticket on the hindsight express!), my site was html for a few early months, shtml for many years and now php.
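
Given that history, one possible tightening, offered only as a sketch, is to drop the trailing "?" so the pattern matches .html and .shtml but no longer .htm or .shtm. The %{DOCUMENT_ROOT} file check is an assumption carried over from the suggestion earlier in the thread, and it presumes the .htaccess is in the document root.

```apache
RewriteEngine On

# Matches only .html and .shtml, the two extensions the site actually used.
# Assumption: .htaccess is in the document root, so the -f test can use
# %{DOCUMENT_ROOT} to build a filesystem path.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)\.s?html$ http://www.example.com/$1.php [R=301,L]
```

Whether the narrower pattern is worth it is a judgment call; the looser one is harmless as long as the target is checked before redirecting.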

g1smd




msg:4468333
 1:35 pm on Jun 22, 2012 (gmt 0)

If you had changed to extensionless URLs when changing to PHP you would never need to add more redirects with each technology change, only amend internal rewrites to map the same old URLs to the new server internals.
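
A minimal sketch of what that could look like, assuming the .htaccess is in the document root and the scripts are plain .php files on disk (the thread does not show an actual extensionless setup, so treat the details as illustrative):

```apache
RewriteEngine On

# Internal rewrite: the visitor requests /some/page and the server quietly
# serves /some/page.php. No R flag, so the URL in the browser does not change.
# The pattern excludes any final path segment containing a dot, so the
# rewritten .php request does not match again.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.php [L]
```

If the back end later moves away from PHP, only the substitution in this rule would need to change; the public URLs stay the same and no new redirects are required.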

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved