optimizing my .htaccess

the .htaccess does what I want it to, but it's not fast

         

XsTatiC

3:56 am on Jul 3, 2009 (gmt 0)

10+ Year Member



Alright, my messy code is kind of embarrassing to even post, but here goes.

I've written my rewrite rules in my .htaccess file, and they work as I expect. However, I'm looking for a way to make them more efficient.

Here's the scenario:

I have a form that can pass anywhere from 1 to 6 populated variables. If a variable is empty, I don't display it in the URL.

The variables always go in this order: 1/2/3/4/5/6. However, one or more could be empty and therefore not displayed, e.g. 1/2/4/6 or 1/3/5. It can never be out of order, like 1/6/3/4/5.

Anyway, I handle the form processing and format the URL the way I want in PHP. However, the translation in the .htaccess rewrites seems excessive.

RewriteBase /
RewriteCond %{HTTP_HOST} !^localhost$
RewriteRule ^(.*)$ http://localhost/$1 [R=301]

RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&activitylevel=$4\&startdate=$5\&enddate=$6 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&activitylevel=$4\&startdate=$5 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&startdate=$4\&enddate=$5 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&activitylevel=$3\&startdate=$4\&enddate=$5 [L]
RewriteRule ^/*countries/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?countries=$1\&style=$2\&activitylevel=$3\&startdate=$4\&enddate=$5 [L]
RewriteRule ^/*region/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&style=$2\&activitylevel=$3\&startdate=$4\&enddate=$5 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/activitylevel/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&activitylevel=$4 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*)/startdate/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3\&startdate=$4 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?region=$1\&countries=$2\&activitylevel=$3\&startdate=$4 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*region/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?region=$1\&style=$2\&activitylevel=$3\&startdate=$4 [L]
RewriteRule ^/*region/(.*)/style/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&style=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*region/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&activitylevel=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*countries/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?countries=$1\&activitylevel=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*countries/(.*)/style/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?countries=$1\&style=$2\&activitylevel=$3\&startdate=$4 [L]
RewriteRule ^/*style/(.*)/activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?style=$1\&activitylevel=$2\&startdate=$3\&enddate=$4 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/activitylevel/(.*) /searchResults.php?region=$1\&countries=$2\&activitylevel=$3 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/startdate/(.*) /searchResults.php?region=$1\&countries=$2\&startdate=$3 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/enddate/(.*) /searchResults.php?region=$1\&countries=$2\&enddate=$3 [L]
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3 [L]
RewriteRule ^/*countries/(.*)/style/(.*)/activitylevel/(.*) /searchResults.php?countries=$1\&style=$2\&activitylevel=$3 [L]
RewriteRule ^/*style/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?style=$1\&activitylevel=$2\&startdate=$3 [L]
RewriteRule ^/*activitylevel/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?activitylevel=$1\&startdate=$2\&enddate=$3 [L]
RewriteRule ^/*region/(.*)/style/(.*)/activitylevel/(.*) /searchResults.php?region=$1\&style=$2\&activitylevel=$3 [L]
RewriteRule ^/*region/(.*)/style/(.*)/startdate/(.*) /searchResults.php?region=$1\&style=$2\&startdate=$3 [L]
RewriteRule ^/*region/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?region=$1\&activitylevel=$2\&startdate=$3 [L]
RewriteRule ^/*region/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?region=$1\&startdate=$2\&enddate=$3 [L]
RewriteRule ^/*countries/(.*)/activitylevel/(.*)/startdate/(.*) /searchResults.php?countries=$1\&activitylevel=$2\&startdate=$3 [L]
RewriteRule ^/*countries/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?countries=$1\&startdate=$2\&enddate=$3 [L]
RewriteRule ^/*style/(.*)/startdate/(.*)/enddate/(.*) /searchResults.php?style=$1\&startdate=$2\&enddate=$3 [L]
RewriteRule ^/*region/(.*)/countries/(.*) /searchResults.php?region=$1\&countries=$2 [L]
RewriteRule ^/*countries/(.*)/style/(.*) /searchResults.php?countries=$1\&style=$2 [L]
RewriteRule ^/*activitylevel/(.*)/startdate/(.*) /searchResults.php?activitylevel=$1\&startdate=$2 [L]
RewriteRule ^/*startdate/(.*)/enddate/(.*) /searchResults.php?startdate=$1\&enddate=$2 [L]
RewriteRule ^/*region/(.*)/style/(.*) /searchResults.php?region=$1\&style=$2 [L]
RewriteRule ^/*region/(.*)/activitylevel/(.*) /searchResults.php?region=$1\&activitylevel=$2 [L]
RewriteRule ^/*region/(.*)/startdate/(.*) /searchResults.php?region=$1\&startdate=$2 [L]
RewriteRule ^/*region/(.*)/enddate/(.*) /searchResults.php?region=$1\&enddate=$2 [L]
RewriteRule ^/*countries/(.*)/activitylevel/(.*) /searchResults.php?countries=$1\&activitylevel=$2 [L]
RewriteRule ^/*countries/(.*)/startdate/(.*) /searchResults.php?countries=$1\&startdate=$2 [L]
RewriteRule ^/*style/(.*)/startdate/(.*) /searchResults.php?style=$1\&startdate=$2 [L]
RewriteRule ^/*style/(.*)/activitylevel/(.*) /searchResults.php?style=$1\&activitylevel=$2 [L]
RewriteRule ^/*activitylevel/(.*)/enddate/(.*) /searchResults.php?activitylevel=$1\&enddate=$2 [L]
RewriteRule ^/*region/(.*) /searchResults.php?region=$1 [L]
RewriteRule ^/*countries/(.*) /searchResults.php?countries=$1 [L]
RewriteRule ^/*style/(.*) /searchResults.php?style=$1 [L]
RewriteRule ^/*activitylevel/(.*) /searchResults.php?activitylevel=$1 [L]
RewriteRule ^/*startdate/(.*) /searchResults.php?startdate=$1 [L]
RewriteRule ^/*enddate/(.*) /searchResults.php?enddate=$1 [L]

RewriteRule ^/*tour/testimonials/(.*) /trip_testimonials.php?tourid=$1 [L]
RewriteRule ^/*tour/faqs/(.*) /trip_faqs.php?tourid=$1 [L]
RewriteRule ^/*tour/detailed_itinerary/(.*) /trip_detailed_itinerary.php?tourid=$1 [L]
RewriteRule ^/*tour/dates_prices/(.*) /trip_dates_prices.php?tourid=$1 [L]
RewriteRule ^/*tour/(.*) /trips.php?tourid=$1 [L]

I would absolutely LOVE to clean that up. If anybody has any suggestions it would be greatly appreciated.

Thank you for your time.

Cheers,

Gerry

[edited by: encyclo at 7:43 pm (utc) on July 3, 2009]
[edit reason] fixed typo [/edit]

g1smd

7:52 am on Jul 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The first thing to change is the multiple (.*) patterns, as they are very inefficient and will be responsible for thousands of 'back off and retry' operations for each of the rules.

That is, the first (.*) initially matches the whole of the rest of the URL. The parser then has to back off one character and see whether what is left is followed by a slash. If not, it backs off and tries again, one character at a time. Having eventually found what looks like a match, it reaches the next (.*), which also has to be followed by a slash. At this point it realises that the first (.*) has only backed off by one 'slash' worth, so it has to return to the first (.*) and back it off further, to the slash before that one. Once that matches, it proceeds to the second (.*) and again backs off and retries until it finds the part that has a slash after it. Next it proceeds to the third (.*), which also needs a slash after it. Now it has to go back to the very first (.*) and back it off again, then redo the second (.*), find that it is wrong and back it off, and only then proceed to the third (.*). On some patterns that will also be "wrong" and it will have to start from the beginning yet again... and so on.
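
To make the back-off concrete, here is a rough trace for one of the three-variable rules from the original post, taken on its own. The request path and its values are made up for illustration, and a real regex engine may apply optimisations, so treat this as a simplified sketch and not an exact account of what Apache does internally.

```apache
# Hypothetical request path (as seen in .htaccess):
#   region/asia/countries/japan/style/luxury
RewriteRule ^/*region/(.*)/countries/(.*)/style/(.*) /searchResults.php?region=$1\&countries=$2\&style=$3 [L]

# 1. The first (.*) grabs everything: "asia/countries/japan/style/luxury".
# 2. Nothing is left for "/countries/", so it gives back one character
#    at a time until what remains starts with "/countries/": $1 = "asia".
# 3. The second (.*) grabs "japan/style/luxury", then gives back characters
#    until what remains starts with "/style/": $2 = "japan".
# 4. The third (.*) takes the rest: $3 = "luxury".
```

A rule whose opening keyword matches but whose later parts do not can go through many of these give-back steps before it finally fails.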

Use a negated character class that "matches from the left" so that the URL can be parsed from left to right in a single pass. I suggest /([^/]+)/ which says "after the slash, keep reading and storing anything that is not a slash, and stop at the next slash". This will be very much faster to process.

The (.*) pattern is greedy and ambiguous. Never use more than one in any pattern. Always see if there is an alternative that can be used so that you can avoid (.*) completely.
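
As a sketch of that change, here is one of the two-variable rules from the original post with only the capture groups swapped; everything else is left as posted. One assumption to check: ([^/]+) needs at least one character and never matches a slash, so this only works if the variable values themselves cannot be empty or contain slashes.

```apache
# Before: each (.*) first swallows the rest of the path, then backs off.
# RewriteRule ^/*activitylevel/(.*)/startdate/(.*) /searchResults.php?activitylevel=$1\&startdate=$2 [L]

# After: each group stops at the next slash, so the path is read once, left to right.
RewriteRule ^/*activitylevel/([^/]+)/startdate/([^/]+) /searchResults.php?activitylevel=$1\&startdate=$2 [L]
```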

.

You don't need to escape the ampersands when they are in the target "on the right".

Change [R=301] to be [R=301,L] instead (on the very first redirect).
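
Applied to the rules as posted, those two changes might look like the sketch below. Only the flag on the redirect and the ampersand in one sample rule are changed; the patterns are deliberately left alone here.

```apache
RewriteBase /

# [L] ends this pass through the rules, so the rewrites below are not
# applied to a request that is about to be redirected.
RewriteCond %{HTTP_HOST} !^localhost$
RewriteRule ^(.*)$ http://localhost/$1 [R=301,L]

# A plain & is fine in the substitution (the right-hand side).
RewriteRule ^/*region/(.*)/style/(.*) /searchResults.php?region=$1&style=$2 [L]
```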

jdMorgan

4:34 pm on Jul 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can also delete the "/*" subpattern at the beginning of every rule pattern. This subpattern means "match zero or more slashes." Since the leading slash is never present in the URL-path examined by RewriteRules in .htaccess, looking for these slashes is a waste of time.

Getting rid of the multiple ".*" subpatterns as advised by g1smd above will likely result in a noticeable improvement in your site's performance. There are a few more tricks you could use, but they can complicate on-going maintenance, so see how that goes first.

Note that since all or most of your RewriteRule patterns are not end-anchored with a "$", *anything* can follow the part of the requested URL-path that your patterns match, and the rule will still fire. This represents a potential duplicate-content problem, and can be exploited to negatively affect your search ranking.
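
A sketch of one rule with the suggestions so far applied, using the startdate/enddate rule from the original post. The request paths in the comments are invented for illustration, and it assumes the ([^/]+) groups suggested above.

```apache
# No leading /* (the path seen in .htaccess has no leading slash),
# and a closing $ so that nothing may follow the last value.
RewriteRule ^startdate/([^/]+)/enddate/([^/]+)$ /searchResults.php?startdate=$1&enddate=$2 [L]

# Matches:        startdate/20090701/enddate/20090801
# Does not match: startdate/20090701/enddate/20090801/something/here
```

What happens to the second request then depends on the rest of the rules; if nothing else matches it, the server would normally answer with a 404.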

Jim

XsTatiC

5:16 pm on Jul 3, 2009 (gmt 0)

10+ Year Member



g1smd & jdMorgan,

Thank you very much for your responses. Those are exactly the types of efficient changes I was hoping for.

I knew my multi-wildcard approach was 'just wrong' as far as creating any semblance of an efficient rewrite list; however, my previous fixes always led to breaky-breaky scenarios and then reverting back to this mess.

I can't wait to implement the suggested changes... will be waiting until Monday though.

Some super-quick questions:
- changing (.*) to /([^/]+)/ will change the process to a line-by-line comparison, correct? Whereas now it's wildcard-to-wildcard as well as line-by-line.
pseudo-process:
line 1 - doesn't match
line 2 - doesn't match
line 3 - doesn't match
line 4 - matches. stop. done.
vs. the process in the second paragraph of g1smd's response.

- Jim, I'm sorry, but I don't completely follow your last paragraph. I'm going to read up more right now, so hopefully I'll be responding to my own post with the answer ( for others who may read it in the future, not for my schizophrenic self ).

If I use one of my rules with only two variables in it as an example, do you mean that the end anchor should be after each variable pattern or at the end of the url itself, with the url ending in a slash? I think I don't understand because I'm not grasping the potential duplicate content issue.

RewriteRule ^/*activitylevel/([^/]+)/enddate/([^/]+)/?$ /searchResults.php?activitylevel=$1\&enddate=$2 [L]

ohhh... wait a second. Duplicate content issue would be:

/activitylevel/2/enddate/20090701
/activitylevel/2/enddate/20090701/something/here

correct?

hmmm... I see the potential, but I don't think this is an issue in my case:
- form page
- form submits to form process page
- this takes form data and reformats url
- form process page sends user to /activitylevel/2/enddate/20090701

So as long as my form process page is correct and there's no other way for the user to hit the final result page from my site all is well. Unless somebody wanted to make up a url on their site and link to mine... like with the extra /something/here as above. Basically a valid url, but with extra crap at the end.

hmmmmm... all becoming clear. Figured I'd write out my thought process so others can see where I'm either correct or wrong, and why.

Thanks again you two!

Cheers,

Gerry

g1smd

7:32 pm on Jul 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** Unless somebody wanted to make up a url on their site and link to mine... like with the extra /something/here as above. Basically a valid url, but with extra crap at the end. ***

Exactly that scenario. Fix it so it cannot happen.

You really don't want to have a URL like this on your site that ranks highly; or at all, actually: /some-folder/some-product-name/this-product-is-junk-and-the-site-owner-is-a-fraudster.