I've done my rewrite rules in my .htaccess file. It works as I expect it to work. However I'm looking for a way to make it more efficient.
Here's the scenario:
I have a form that can pass from 1 to 6 populated variables. If the variable is empty I don't display it in the url.
The variables always go in this order: 1/2/3/4/5/6. However, one or more could be empty, and therefore not displayed, e.g. 1/2/4/6
or
1/3/5
It can never be something like 1/6/3/4/5
Anyway, I handle the form processing and format the URL the way I want in PHP. However, the translation in the .htaccess rewrites seems excessive.
RewriteBase /
RewriteCond %{HTTP_HOST} !^localhost$
RewriteRule ^(.*)$ http://localhost/$1 [R=301]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&activitylevel=$4\&startdate=$5\&enddate=$6 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&activitylevel=$4\&startdate=$5 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&startdate=$4\&enddate=$5 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&activitylevel=$3\&startdate=$4\&enddate=$5 [L]
RewriteRule ^/*countries/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?countries=$1\&style=$2\&activitylevel=$3\&startdate=$4\&enddate=$5 [L]
RewriteRule ^/*region/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&style=$2\&activitylevel=$3\&startdate=$4\&enddate=$5 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/activitylevel/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&activitylevel=$4 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/startdate/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&startdate=$4 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?region=$1\&countries=$2\&activitylevel=$3\&startdate=$4 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*region/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?region=$1\&style=$2\&activitylevel=$3\&startdate=$4 [L]
RewriteRule ^/*region/(.*)/style/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&style=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*region/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&activitylevel=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*countries/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?countries=$1\&activitylevel=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*countries/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?countries=$1\&style=$2\&activitylevel=$3\&startdate=$4 [L]
RewriteRule ^/*style/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?style=$1\&activitylevel=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/activitylevel/(.*) /searchResults.php?region=$1\&countries=$2\&activitylevel=$3 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/startdate/(.*) /searchResults.php?region=$1\&countries=$2\&startdate=$3 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&enddate=$3 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3 [L]
RewriteRule ^/*countries/(.*)/style/(.*)/activitylevel/(.*) /searchResults.php?countries=$1\&style=$2\&activitylevel=$3 [L]
RewriteRule ^/*style/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?style=$1\&activitylevel=$2\&startdate=$3 [L]
RewriteRule ^/*activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?activitylevel=$1\&startdate=$2\&enddate=$3 [L]
RewriteRule ^/*region/(.*)/style/(.*)/activitylevel/(.*) /searchResults.php?region=$1\&style=$2\&activitylevel=$3 [L]
RewriteRule ^/*region/(.*)/style/(.*)/startdate/(.*) /searchResults.php?region=$1\&style=$2\&startdate=$3 [L]
RewriteRule ^/*region/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?region=$1\&activitylevel=$2\&startdate=$3 [L]
RewriteRule ^/*region/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&startdate=$2\&enddate=$3 [L]
RewriteRule ^/*countries/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?countries=$1\&activitylevel=$2\&startdate=$3 [L]
RewriteRule ^/*countries/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?countries=$1\&startdate=$2\&enddate=$3 [L]
RewriteRule ^/*style/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?style=$1\&startdate=$2\&enddate=$3 [L]
RewriteRule ^/*region/(.*)/countries/(.*) /searchResults.php?region=$1\&countries=$2 [L]
RewriteRule ^/*countries/(.*)/style/(.*) /searchResults.php?countries=$1\&style=$2 [L]
RewriteRule ^/*activitylevel/(.*)/startdate/(.*) /searchResults.php?activitylevel=$1\&startdate=$2 [L]
RewriteRule ^/*startdate/(.*)/enddate/(.*) /searchResults.php?startdate=$1\&enddate=$2 [L]
RewriteRule ^/*region/(.*)/style/(.*) /searchResults.php?region=$1\&style=$2 [L]
RewriteRule ^/*region/(.*)/activitylevel/(.*) /searchResults.php?region=$1\&activitylevel=$2 [L]
RewriteRule ^/*region/(.*)/startdate/(.*) /searchResults.php?region=$1\&startdate=$2 [L]
RewriteRule ^/*region/(.*)/enddate/(.*) /searchResults.php?region=$1\&enddate=$2 [L]
RewriteRule ^/*countries/(.*)/activitylevel/(.*) /searchResults.php?countries=$1\&activitylevel=$2 [L]
RewriteRule ^/*countries/(.*)/startdate/(.*) /searchResults.php?countries=$1\&startdate=$2 [L]
RewriteRule ^/*style/(.*)/startdate/(.*) /searchResults.php?style=$1\&startdate=$2 [L]
RewriteRule ^/*style/(.*)/activitylevel/(.*) /searchResults.php?style=$1\&activitylevel=$2 [L]
RewriteRule ^/*activitylevel/(.*)/enddate/(.*) /searchResults.php?activitylevel=$1\&enddate=$2 [L]
RewriteRule ^/*region/(.*) /searchResults.php?region=$1 [L]
RewriteRule ^/*countries/(.*) /searchResults.php?countries=$1 [L]
RewriteRule ^/*style/(.*) /searchResults.php?style=$1 [L]
RewriteRule ^/*activitylevel/(.*) /searchResults.php?activitylevel=$1 [L]
RewriteRule ^/*startdate/(.*) /searchResults.php?startdate=$1 [L]
RewriteRule ^/*enddate/(.*) /searchResults.php?enddate=$1 [L]
RewriteRule ^/*tour/testimonials/(.*) /trip_testimonials.php?tourid=$1 [L]
RewriteRule ^/*tour/faqs/(.*) /trip_faqs.php?tourid=$1 [L]
RewriteRule ^/*tour/detailed_itinerary/(.*) /trip_detailed_itinerary.php?tourid=$1 [L]
RewriteRule ^/*tour/dates_prices/(.*) /trip_dates_prices.php?tourid=$1 [L]
RewriteRule ^/*tour/(.*) /trips.php?tourid=$1 [L]
I would absolutely LOVE to clean that up. If anybody has any suggestions it would be greatly appreciated.
Thank you for your time.
Cheers,
Gerry
[edited by: encyclo at 7:43 pm (utc) on July 3, 2009]
[edit reason] fixed typo [/edit]
That is, the first (.*) matches the whole of the URL. The parser then has to back off one character and see whether what is left is followed by a slash. If not, it backs off and tries again, one character at a time. Having eventually found what looks like a match, it then reaches another (.*), which also has to be followed by a slash. At this point it realises that it has only backed up by one slash's worth, and so has to return to the first (.*) pattern and back it up some more, to the slash before the previous one. Once that matches, it proceeds to the second (.*) pattern and backs up and retries until it finds the part that has a slash after it. Next it proceeds to the third (.*) and finds that this too needs a slash after it. Now it has to go back to the very first (.*) and back it up again, then find that the next (.*) is wrong and back that up too, and only then proceed to the third (.*). On some patterns that will also be "wrong" and it will have to start from the beginning yet again... and so on.
Use a negative match pattern that "matches from the left" so that it can be parsed from left to right in one read operation. I suggest /([^/]+)/ which says "after the slash, keep reading and storing anything that is not a slash, and stop at the next slash". This will be very much faster to process.
The (.*) pattern is greedy and ambiguous. Never use more than one in any pattern. Always see if there is an alternative that can be used so that you can avoid (.*) completely.
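As a rough illustration of that advice, here is how two of the rules from the original list might look with each (.*) replaced by ([^/]+). Only the pattern changes; the substitution is left exactly as posted, and the longer rule still comes first, as in the original list. This is an untested sketch that assumes no value ever contains a slash. Note also that ([^/]+) requires at least one character, whereas (.*) would accept an empty value.

```apache
# Before: each (.*) first swallows the rest of the path, then backtracks.
# RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3 [L]

# After: each ([^/]+) reads up to the next slash and stops there.
RewriteRule ^/*region/([^/]+)/countries/([^/]+)/style/([^/]+) /searchResults.php?region=$1\&countries=$2\&style=$3 [L]
RewriteRule ^/*region/([^/]+)/countries/([^/]+) /searchResults.php?region=$1\&countries=$2 [L]
```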
You don't need to escape the ampersands when they are in the target "on the right".
Change [R=301] to be [R=301,L] instead (on the very first redirect).
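Applied to the rules in the first post, those two changes might look like the sketch below. The hostname is the placeholder from the original post, and the second rule is just one of the two-variable rules picked as an example.

```apache
# With L added, rule processing for this pass stops as soon as the
# redirect fires, so the request does not fall through to later rules.
RewriteCond %{HTTP_HOST} !^localhost$
RewriteRule ^(.*)$ http://localhost/$1 [R=301,L]

# Ampersands in the substitution are literal; no backslash is needed.
RewriteRule ^/*activitylevel/([^/]+)/enddate/([^/]+) /searchResults.php?activitylevel=$1&enddate=$2 [L]
```

The single (.*) in the redirect is fine, since the advice above is only to avoid more than one per pattern.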
Getting rid of the multiple ".*" subpatterns as advised by g1smd above will likely result in a noticeable improvement in your site's performance. There are a few more tricks you could use, but they can complicate on-going maintenance, so see how that goes first.
Note that since all or most of your RewriteRule patterns are not end-anchored with a "$", *anything* can be present in the requested URL-path following the path-part that you are matching in your patterns. This represents a potential duplicate-content problem, and can be exploited to negatively affect your search ranking.
Jim
Thank you very much for your responses. Those are exactly the types of efficient changes I was hoping for.
I knew my multi-wildcard approach was 'just wrong' as far as creating any semblance of an efficient rewrite list; however, my previous fixes always led to breaky-breaky scenarios and then reverting back to this mess.
I can't wait to implement the suggested changes... will be waiting until Monday though.
Some super-quick questions:
- changing (.*) to ([^/]+) will change the process to line-by-line comparison, correct? Whereas now it's wildcard-to-wildcard as well as line-by-line.
pseudo-process:
line 1 - doesn't match
line 2 - doesn't match
line 3 - doesn't match
line 4 - matches. stop. done.
vs. the second paragraph process in g1smd's response.
- Jim, I'm sorry, but I don't completely follow your last paragraph. I'm going to read up more right now, so hopefully I'll be responding to my own post with the answer ( for others who may read it in the future, not for my schizophrenic self ).
If I use one of my rules with only two variables in it as an example, do you mean that the end anchor should be after each variable pattern or at the end of the url itself, with the url ending in a slash? I think I don't understand because I'm not grasping the potential duplicate content issue.
RewriteRule ^/*activitylevel/([^/]+)/enddate/([^/]+)/?$ /searchResults.php?activitylevel=$1\&enddate=$2 [L]
ohhh... wait a second. Duplicate content issue would be:
/activitylevel/2/enddate/20090701
/activitylevel/2/enddate/20090701/something/here
correct?
hmmm... I see the potential, but I don't think this is an issue in my case:
- form page
- form submits to form process page
- this takes form data and reformats url
- form process page sends user to /activitylevel/2/enddate/20090701
So as long as my form process page is correct and there's no other way for the user to hit the final result page from my site all is well. Unless somebody wanted to make up a url on their site and link to mine... like with the extra /something/here as above. Basically a valid url, but with extra crap at the end.
hmmmmm... all becoming clear. Figured I'd write out my thought process so others can see where I'm either correct or wrong, and why.
Thanks again you two!
Cheers,
Gerry
Exactly that scenario. Fix it so it cannot happen.
You really don't want to have a URL like this on your site that ranks highly; or at all, actually: /some-folder/some-product-name/this-product-is-junk-and-the-site-owner-is-a-fraudster.
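One way to fix it, sketched here on the assumption that values never contain a slash and that an optional trailing slash should still be accepted, is to end-anchor every pattern with "$". With all of the rules anchored, a request carrying extra path segments matches none of them and, provided no matching file or directory exists on disk, falls through to the server's normal 404 handling.

```apache
# /activitylevel/2/enddate/20090701                 -> rewritten
# /activitylevel/2/enddate/20090701/something/here  -> no rule matches
RewriteRule ^/*activitylevel/([^/]+)/enddate/([^/]+)/?$ /searchResults.php?activitylevel=$1&enddate=$2 [L]

# The same anchoring applied to two of the tour rules.
RewriteRule ^/*tour/faqs/([^/]+)/?$ /trip_faqs.php?tourid=$1 [L]
RewriteRule ^/*tour/([^/]+)/?$ /trips.php?tourid=$1 [L]
```

Allowing the optional trailing slash still leaves two URLs for each page; dropping the "/?" or redirecting one form to the other would be stricter.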