Are 301 redirects to 404 error pages detrimental to rankings?


g1smd - 8:44 am on Dec 12, 2012 (gmt 0)


htaccess rules execute way before anything happens with your PHP scripting.

When a site consists of individual .html files for each page, htaccess can check which URL requests will be successful and the rules can be crafted to not redirect when there will be no file to fetch at the end of the process.
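For the static-file case, a rule along these lines would do it. This is only a sketch: example.com is a placeholder hostname, and the http:// target is an assumption about the site.

```apache
RewriteEngine On

# Redirect non-www to www ONLY when the requested file really exists.
# A request for a missing page then falls through to a direct 404
# instead of a 301 followed by a 404.
# example.com is a placeholder; substitute the real hostname.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The -f test works here because every valid URL maps to a real file on disk, which is exactly what stops being true once the content lives in a database.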

When the site content is actually in a database and a single index.php file is responsible for generating the pages, htaccess cannot possibly know whether any particular URL request will ultimately be successful.

In this case, you have to move the redirecting functions from htaccess rules to PHP scripting. This will slow the site down a little because mod_rewrite is blisteringly efficient and PHP is not so fast.

So, to proceed, you rewrite all requests for pages to the index.php file (this is where it is really helpful that the site uses extensionless URLs for pages and only requests for files have an extension). The PHP script looks at the URL and extracts the page name part. It then looks in the database to see if it can fulfil the request. If it cannot, then irrespective of whether the URL request was www or non-www, or was index.php/ prefixed or not, the PHP script sends a 404 header and "includes" the error404 file so that the user sees an error message. If the URL request can be fulfilled with content from the database, the PHP script looks at the URL request and, if the www is missing or there's an index.php/ part present, it instead sends a 301 header and redirects the browser to make a new request for the correct URL. If the URL request can be fulfilled from the database and the URL is of the correct form, then the PHP script assembles the HTML page and sends it to the browser.
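The htaccess side of that arrangement can be very small. A minimal sketch, assuming index.php sits in the site root and that page URLs never contain a dot:

```apache
RewriteEngine On

# Internally rewrite every extensionless request to index.php.
# [^.]+ matches only paths with no dot anywhere, so requests for
# image.png, style.css and index.php itself are left alone
# (which also prevents a rewrite loop).
RewriteRule ^([^.]+)$ /index.php [L]
```

This is an internal rewrite, not a redirect, so the browser never sees index.php in the URL. The script can still read the originally requested path from $_SERVER['REQUEST_URI'] and the hostname from $_SERVER['HTTP_HOST'] to make the 404 / 301 / 200 decision described above.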

htaccess is a pre-processor that can sort out a whole load of stuff before the PHP kicks in. You can dispense with htaccess and move most of that functionality to PHP but it will be a lot less efficient. That is partly because you're invoking the PHP engine instead of the fast mod_rewrite process, but mostly because you're hitting the database for both valid and non-valid requests. In particular, a request in the wrong form (non-www or index.php/ prefixed) for a page that does exist hits the database twice: once before the redirect, and once more when the tidied URL is requested. The database access is the slowest part of the system.

You'll still need non-www to www redirecting for image requests, but it's not so important for stylesheets and javascript files. You set up the standard non-www to www redirect in htaccess but instead of (.*) for the pattern, which matches "all" requests, you set the non-www to www redirecting rule so that only requests with an extension (that's why using extensionless URLs for pages really helps here!) are redirected. You still have the problem that for images that don't exist, there will be a redirect when non-www is requested and the 404 will be served only when www is requested. You get round this by using a preceding RewriteCond and the -f test. This drops the server efficiency even more because htaccess will have to hit the filesystem to see if a file exists. Luckily this happens only when the request is for non-www. Make sure the -f test is NOT the first test in the non-www/www redirecting rule; test the HTTP_HOST variable first. If you accidentally put the -f test first, the filesystem will be read for www requests too, only for the next test (HTTP_HOST) to then say that this entire rule can be skipped.
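Put together, the file-only redirect might look roughly like this. Treat it as a sketch: example.com is a placeholder, the http:// target is an assumption, and the extension pattern is deliberately generic.

```apache
RewriteEngine On

# 1) Cheap test first: is this a non-www request?
#    For www requests this fails and the -f test below is never run.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# 2) Expensive test second: does the requested file exist on disk?
#    Missing images skip the redirect and get a direct 404.
RewriteCond %{REQUEST_FILENAME} -f
# Pattern matches only requests ending in an extension (logo.png,
# style.css); extensionless page URLs are left for the PHP script.
RewriteRule ^(.+\.[a-z0-9]+)$ http://www.example.com/$1 [R=301,L,NC]
```

Note that a bare request for index.php also ends in an extension and would be redirected by this pattern. If that is not wanted, narrow the pattern to the file types actually served, such as images.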

So, to answer the question: everything you envisage can be done but will be less efficient in terms of server overhead and speed of response. You will have to make the decision as to whether a slower site for all requests (valid or otherwise) or an unwanted redirect before 404 (for non-www and/or index.php/ requests) is the way to go.


Thread source: http://www.webmasterworld.com/apache/4527236.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com