Forum Moderators: phranque


Removal of 301 rewrite rule after search engine index

zorro

1:27 pm on May 20, 2010 (gmt 0)

10+ Year Member



This may sound like a daft question, but to save on server resources, would it be OK to remove the external 301 RewriteCond and RewriteRule once search engines have picked up and indexed the new URLs, leaving just the internal RewriteRules in place?

Example: All /file.php URLs have been rewritten to just /file.

All internal files, links, etc. have had the .php extension removed from their references.

Currently, Google and other search engines have the .php URLs indexed.

An external RewriteCond and RewriteRule are in place to tell search engines that the files have permanently moved (301) to the non-.php URLs.

THIS:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\.php\ HTTP/
RewriteRule ^([^\.]+)\.php$ /$1 [R=301,L]

There is also an internal RewriteRule which silently rewrites the non-.php request to the actual .php file (covering users who just type domainname/file into the browser).

THIS:
RewriteRule ^([^\.]+)$ $1.php [L]
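Taken together, the pair looks like this in .htaccess (a sketch based on the rules in this thread; the %{THE_REQUEST} condition tests the original client request line only, so the external redirect is not re-triggered after the internal rewrite appends ".php"; without that condition the two rules would loop). The -f file check on the internal rule is an added safeguard, not part of the original rules:

```apache
RewriteEngine On

# External: 301-redirect direct client requests for /file.php to /file.
# %{THE_REQUEST} holds the original request line as sent by the client,
# so this condition is not re-tested against the internally rewritten URL.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*\.php\ HTTP/
RewriteRule ^([^\.]+)\.php$ /$1 [R=301,L]

# Internal: silently map extensionless /file back to the real /file.php,
# but only when that .php file actually exists on disk (added -f check).
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([^\.]+)$ $1.php [L]
```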

Once search engines have re-indexed the files under the non-.php URLs, what would happen if the external 301 RewriteCond and RewriteRule were then taken out?
On further crawls, wouldn't the spiders/bots see the links in the pages (which have had the .php removed) and reference them as just /file?
Or would the spider/bot also be internally rewritten to the .php file (/file.php), so that search engines would start referencing the .php URLs again?

Like I say, it may sound like a daft question to some!

jdMorgan

2:31 pm on May 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



These redirects need to be left in place for several years, as search engines have a very long memory, and will continue to request the old URLs for a long, long time. If they find that they no longer redirect, they will re-list them in search results.

This is one of the reasons that you often hear the advice to never, ever change a URL unless a lawsuit says you must.

If you have a bunch of these redirect rule-sets and their number is fixed and will not change, then consider using a 'skip rule' to avoid processing them unnecessarily. For example, if there are 50 of them and all of them redirect requests for .php URLs, then adding the rule
 
RewriteRule !\.php$ - [S=50]

above them would cause them to be skipped over unless the requested URL-path ends in ".php".

Do not use this technique unless you are absolutely sure that the list of redirects is stable and will not change over time. If you add rules and forget to update the skip count, then the latter rules won't be skipped and performance will suffer. If you remove rules and forget to update the skip count, then rules will be skipped that should not in fact be skipped, and this may 'break' your site... Skip rules are handy, but can be a maintenance nightmare because the skip-count is a hard-coded number.
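A miniature illustration of the bookkeeping involved (the rule names here are hypothetical, not from this thread): with exactly two redirects below the skip rule, the count must be exactly 2:

```apache
# Skip the next 2 rules for any request that does not end in ".php"
RewriteRule !\.php$ - [S=2]
RewriteRule ^old-home\.php$ /home [R=301,L]
RewriteRule ^old-about\.php$ /about [R=301,L]
# If a third redirect is added below the skip rule, [S=2] must be
# changed to [S=3], or the new rule will also run for non-.php requests.
```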

Note that you could also significantly speed up processing of the first rule you posted (and allow for periods in directory-paths) by using more-specific patterns:

# Externally redirect to remove ".php" from client-requested URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/\ ]*/)*[^.\ ]+\.php\ HTTP/
RewriteRule ^(([^/]*/)*[^\.]+)\.php$ /$1 [R=301,L]

This prevents potentially many, many "back-off-and-retry" matching attempts each time the pattern is evaluated: matching on each sub-pattern is stopped as soon as possible, without having to match all the way to the end of the requested string and then back off one character at a time, as would be required with a ".*" sub-pattern.

Jim

zorro

2:51 pm on May 20, 2010 (gmt 0)

10+ Year Member



Thanks JD!