Forum Moderators: phranque

Message Too Old, No Replies

Need help with URL rewrite

         

LionMedia

9:29 am on Dec 5, 2009 (gmt 0)

10+ Year Member



Hi,
I'm new to URL rewriting and would appreciate your help. I'm setting up a real estate site that displays over 40,000 listings. The software I'm using has several rewrite rules already configured to rewrite the URLs to the individual listing pages. For example:

http://www.example.com/listing-2710-mountain-road-pasadena-maryland-mls-aa7172242-12921.html

I'm trying to create SE friendly URLs for custom links which currently might look like this:

http://www.example.com/index.php?cur_page=1&pclass[]=1&pclass[]=1&County=ANNE+ARUNDEL
&city=ANNAPOLIS&ForSale=Y&ListPrice-min=675000&ListPrice-max=1000000&Type=Detached
&waterfront=Y&action=searchresults&sortby=ListPrice&sorttype=ASC

I'd like to rewrite this to look something like:

http://www.example.com/maryland/anne-arundel-county/annapolis/home-for-sale
-675000-1000000-detached-waterfront-cur_page1.html

So the word "maryland" is added as a directory, the word "county" is moved so that it becomes part of a directory name, the city value becomes a directory, only the variable names are used for "ForSale" and "waterfront", and the remaining parameters write only their values. Then drop the "action=....." part and replace it with "cur_page1.html".

Is this possible?

[edited by: jdMorgan at 6:04 pm (utc) on Dec. 5, 2009]
[edit reason] example.com & side-scroll [/edit]

jdMorgan

6:43 pm on Dec 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The first thing to address is that you've apparently got the process "backwards" due to a common misunderstanding: mod_rewrite cannot be used to "change" URLs.

The two main functions of mod_rewrite are to 'connect' an incoming client HTTP request to a non-default file on your server, and to redirect client requests for one URL to another URL. The code that came with your "software" is very likely implementing the first of these functions.

The main point is that mod_rewrite works as a request is received from a client (e.g. browser or search engine robot), internally rewriting the request or externally redirecting the client before any content is served or any scripts are invoked. It is a "server input process" and not any kind of "output-page content modifier."

So that means that in order to "change the URL," you must edit your static pages (if any) and modify either your database, your script, or both, so that the links that appear on your pages are in the form that you want human users and search engines to "see" -- The appearance of a URL in a link on an HTML page *defines* that URL.

The next step is to "re-connect" those new pretty URLs with the script on your server that will actually produce the requested page.

The entire process is described in this thread [webmasterworld.com] in our Apache Forum Library [webmasterworld.com].
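
As a rough sketch of that two-step pattern (using made-up URLs and parameters, not code to paste in as-is), the two functions look like this:

```apache
# Step 1: Internally rewrite the pretty URL to the real script filepath.
# The client never sees this happen.
RewriteRule ^widgets/([^/]+)\.html$ /index.php?item=$1 [L]

# Step 2: Externally redirect any direct client request for the old
# dynamic URL to the new pretty URL, so only one URL gets indexed.
# Checking THE_REQUEST (the raw request line) prevents a redirect loop,
# because the internal rewrite in Step 1 does not change it.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?item=([^&\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/widgets/%1.html? [R=301,L]
```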

However, before proceeding, you need to be aware that mod_rewrite is really inefficient at doing case-conversion (I should say "devastatingly and practically unusably inefficient" in order to make this point strongly enough). It's also practically impossible to convert the casing of the URL to anything but all-lowercase or all-uppercase.

So before investing too much time in this project, try testing your script by directly typing in a dynamic query with the exact same character case as that which you wish to use in the parameter-values of your friendly/pretty URLs. Failing that, try typing in the query in all-lower- or all-uppercase. If it works, you may be able to do this using mod_rewrite. If not, then you're going to need to buy an off-the-shelf "SEF" plugin for your software, or to code a much more complex scripted solution.

Jim

LionMedia

7:00 pm on Dec 5, 2009 (gmt 0)

10+ Year Member



Thanks Jim! I'll visit those links and educate myself a little more and give it a try.

Tracey

LionMedia

8:32 pm on Dec 5, 2009 (gmt 0)

10+ Year Member



OK, I started with something simple, and my first attempts broke the entire site with a 500 server error. Now I'm getting a 404 just for the test link below, so I'm hoping I'm getting close. Here's what I have so far:

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^pclass%5B%5D/([^/]+)/([^/]+)/([^/]+)/([^/]+) /index.php?pclass%5B%5D=$1&County=$2&city$3&action [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?pclass%5B%5D=([^&]+)&County=([^&]+)&city([^&]+)&action([^&]+)\ HTTP/
RewriteRule ^index\.php$ [marylandhomespro.com...] [R=301,L]

[mysite.com...]

Any suggestions as to where I'm going wrong?

[edited by: LionMedia at 8:52 pm (utc) on Dec. 5, 2009]

LionMedia

8:51 pm on Dec 5, 2009 (gmt 0)

10+ Year Member



This dynamic version of the link works

[mysite.com...]

g1smd

9:44 pm on Dec 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One comment on code style to make it easier for you to read the code many months later.

Add a blank line after each RewriteRule.

Add a # comment before the first RewriteCond of each such rule block, and describe exactly what that block does.

jdMorgan

9:53 pm on Dec 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm sorry, but you need to re-read what I wrote, with close attention to terminology and details. Your rule is completely backwards. The devil is in the details, and with server config code like this, it's a very big devil...

On your pages, you must link to the URL http://www.example.com/maryland/anne-arundel-county
/annapolis/home-for-sale-675000-1000000-detached-waterfront-cur_page1.html

This URL is what a client -- a user or search engine -- will see, and this is the URL that that client will request from your server.

Leaving aside the character-casing problem momentarily, in .htaccess, you will rewrite that URL, when requested by the client, to your script filepath at
/index.php?cur_page=1&pclass[]=1&pclass[]=1&County=ANNE+ARUNDEL&city=ANNAPOLIS
&ForSale=Y&ListPrice-min=675000&ListPrice-max=1000000&Type=Detached
&waterfront=Y&action=searchresults&sortby=ListPrice&sorttype=ASC

After that works, and only after, should you attempt the third step in the thread I cited (i.e. your second rule).

This subject isn't simple, and unfortunately, there's a steep learning curve. Misconceptions and misunderstandings are quite common.

The character-casing 'test' I proposed is to manually request
http://www.example.com/index.php?cur_page=1&pclass[]=1&pclass[]=1&County=anne+arundel
&city=annapolis&ForSale=Y&ListPrice-min=675000&ListPrice-max=1000000&Type=detached
&waterfront=Y&action=searchresults&sortby=ListPrice&sorttype=ASC

Note that the casing of the query values corresponds not to that used in the current dynamic URL, but to the pretty, search-engine-friendly URL you wish to use. We are testing whether your script will accept it -- or just throw an error because it requires an exact match.

If that works, then the only obvious obstacle is getting your RewriteRule to replace "-" with "+" where required. This will be slow and inefficient, but nowhere nearly as bad as doing case-conversion on the entire URL-path.
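
A sketch of that hyphen-to-plus idea (hypothetical "search" path, not your actual rules): in per-directory .htaccess context the rewritten URL is re-submitted to the ruleset, so a rule that converts one hyphen per pass will loop until no hyphen remains:

```apache
# Sketch only: convert one hyphen per pass inside the value segment,
# e.g. /search/anne-arundel -> /search/anne+arundel after one pass.
RewriteRule ^search/([^-/]+)-(.+)$ /search/$1+$2 [L]

# Once no hyphen is left, hand the cleaned value to the script.
RewriteRule ^search/([^-/]+)$ /index.php?County=$1 [L]
```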

If it doesn't work -- i.e. your script is case-sensitive -- then you're going to need a scripted approach with database access (and possibly database modifications) to solve this problem.

Jim

g1smd

10:15 pm on Dec 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



http://www.example.com/maryland/anne-arundel-county/annapolis/home-for-sale
-675000-1000000-detached-waterfront-cur_page1.html

I'd also like to add that a URL like this is waaay too long and has waaay too many hyphens in it. I think you're trying waaay too hard to stuff keywords in here; and so will Google.

I'd likely not want something any more complicated than:

http://www.example.com/maryland/anne-arundel/annapolis/675000-1.html
but I could see that your site would still have a huge taxonomy with most folders being almost empty. If the individual properties are listed in multiple categories, I'd have pages like
http://www.example.com/maryland/
listing counties, and pages like
http://www.example.com/maryland/anne-arundel
listing properties, but with the individual properties having URLs like
http://www.example.com/675000-1.html
, but that's a whole other discussion.

LionMedia

2:44 am on Dec 6, 2009 (gmt 0)

10+ Year Member



I'm not interested in long URLs stuffed with keywords. Just trying to figure out how this works within my site. I was hoping to get some examples of how I might do this using this particular example. Thanks anyway.

g1smd

9:13 am on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not interested in long URLs stuffed with keywords.

You should be, as it is both a usability issue as well as an SEO indexing and ranking issue.

Both of those will have an impact on the amount of traffic the site gets, and that will affect the bottom line.

LionMedia

2:00 pm on Dec 6, 2009 (gmt 0)

10+ Year Member



Yes I realize that. I thought you meant I was trying to spam with too many keywords. I know that isn't good SEO.

So I studied this some more and followed the suggestions. I tested the dynamic link manually as Jim suggested, with the parameters in lower case, and this worked fine, so I guess there is no issue with case here.

I then completed the first step as follows

# Enable mod_rewrite, start rewrite engine
Options +FollowSymLinks
RewriteEngine on
#
# Internally rewrite search engine friendly static URL to dynamic filepath and query
RewriteRule ^maryland-([^/]+)/([^/]+)/([^/]+)-([^/]+).html?$ /index.php?pclass[]=$1&County=$2&city=$3&action=searchresults&pclass[]=$4 [L]

This works when I test a static link like this:
http://www.example.com/maryland-1/howard/columbia-1

But when I tried the next step, I get a 500 server error

#
# Externally redirect client requests for old dynamic URLs to equivalent new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?pclass[]=([^&]+)&County=([^&]+)&city=([^&]+)&action=searchresults&pclass[]=([^&]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/maryland-%1/%2/%3-%4.html? [R=301,L]

I've read over the documentation and some posts but I'm still not sure where I'm going wrong.

jdMorgan

2:58 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It looks like you're on the right track here, and I'm glad to hear the script isn't case-sensitive -- I was afraid that that would be a total show-stopper.

The most likely problem in your second rule which could cause a 500 Error is that brackets "[]" are special characters in regular-expression patterns, and a character class is never empty when used for this regex function. So, in the parts of your pattern reading "pclass[]=" you will need to escape those brackets to make them literals. Use "pclass\[\]=" instead, and see if that helps. And note that this "pclass[]=" parameter appears twice in your query, which I can only say is "odd," if not incorrect and potentially problematic.
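
Applied to the redirect block posted above, the escaped version would read (sketch; the bracket escapes are the only change):

```apache
# Externally redirect old dynamic URLs to the new static form.
# Note "pclass\[\]=" -- the brackets are escaped so they match
# literally instead of opening a character class.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?pclass\[\]=([^&]+)&County=([^&]+)&city=([^&]+)&action=searchresults&pclass\[\]=([^&]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/maryland-%1/%2/%3-%4.html? [R=301,L]
```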

BTW, the first stop when getting a 500-Server Error is your raw server error log file -- It most often contains very good clues as to the cause of the problem.

Jim

LionMedia

4:12 pm on Dec 6, 2009 (gmt 0)

10+ Year Member



That works perfectly! Thank you, thank you! I addressed the duplicate pclass issue and it seems to work fine with one.

So now my next issue is this. These results pages produce multiple pages with Prev and Next links to page through the results. The first static link now looks like this and works great:

http://www.example.com/maryland-1/howard/columbia.html

However, when I click Next, I get a 404 because the link now looks like this:

http://www.example.com/maryland-1/howard/index.php?cur_page=1&pclass[]=1&County=howard&city=columbia&action=searchresults&sortby=ListPrice&sorttype=ASC

I'm assuming I need additional rules to handle cur_page=1, cur_page=2, etc. It looks like it keeps everything up to the county parameter, and the rest is the same type of dynamic string. So can my rewrite take the part starting with index.php? and apply a similar rule that rewrites everything from there, to make it look something like

example.com/maryland-1/howard/columbia-cur_page_2.html

jdMorgan

5:07 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ugh! -- Pagination, much frustration!

Yes, I think you can do what you propose. Hopefully, you won't run into the other bug-a-boo with pagination functions -- where the query string parameter order changes in several ways, and you have to cover all cases. This problem *can* be solved, but it can sometimes be a real pain in the hindquarters.

When making the friendly pagination URLs, keep them short; Don't include those underscores unless you really need them! (Most of us here on WebmasterWorld avoid underscores like the plague, because they are not treated as 'word breaks' by most search engines, and because they 'hide' beneath the standard on-page link-underline and can't be differentiated visually from spaces.) I suspect you could shorten "cur_page_2" all the way down to "pg2" without causing ambiguity problems in writing your rules.

When adding these new pagination-support rules, just remember: Put all external redirects first, in order from most-specific patterns and conditions to least-specific, followed by all internal rewrites, again in order from most- to least-specific.
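
Following that ordering, a pagination-aware ruleset might be laid out like this (a sketch using the shortened "pg2"-style page suffix suggested above; the paths and parameters are illustrative, not your exact ones):

```apache
# External redirects first, most-specific pattern first:
# old paginated dynamic URL -> new short paginated URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?cur_page=([0-9]+)&County=([^&]+)&city=([^&\ ]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/%2/%3-pg%1.html? [R=301,L]

# Internal rewrites last, again most-specific first:
# the paginated URL, then the unpaginated page-1 URL
RewriteRule ^([^/]+)/([^/]+)-pg([0-9]+)\.html$ /index.php?cur_page=$3&County=$1&city=$2 [L]
RewriteRule ^([^/]+)/([^/]+)\.html$ /index.php?cur_page=1&County=$1&city=$2 [L]
```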

Jim

LionMedia

5:33 pm on Dec 6, 2009 (gmt 0)

10+ Year Member



Yes, I've been reading in other posts that pagination is an ongoing issue. I appreciate your guidance and will follow your suggestions and see what I can come up with.

Assuming that I'm successful, do I understand correctly that I will also have to change the HTML in the software script files so that the SE friendly URLs are invoked in the browser before the request hits the server? Otherwise there will be duplicate content issues?

Thanks Jim!

jdMorgan

5:59 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh yes, that's step #1 -- logically, if not actually implementation-wise! Your pages must link to the correct and canonical URLs.

The links on your pages *define* URLs. It matters not whether those URLs actually resolve to existing domains or to existing 'pages' on those domains -- The act of publishing a link is what 'creates' the URL, and it will get spidered by search engines whether it resolves to a 'file' or not, and whether you consider it to be 'optimized' or not.

It's often useful to think of URLs as independent entities, and quite necessary to think of them as something utterly different from "files." While this distinction may only become clear when working with mod_rewrite or similar functions that change the default server URL-to-filename mapping, it is true even if you do not use mod_rewrite or similar. After all, the primary function of an HTTP server is to map requested URLs in the form "http://www.example.com/main-dir/foo.php" to server filepaths in the form required by that server's operating system -- say, "C:\Program Files\apache\var\users\my-site\public\html\main-dir\foo.php"

Even in the absence of any Webmaster-specified URL-rewriting, it's clear that these are two very-different "location-specification" methods; Only the very-last bit of those two strings has anything in common.

And that in fact is why URLs were invented: So that agents on the Web (and Webmasters creating links to other sites) don't have to know the internal file-structure of each and every server that they want to request resources from, and so that server administrators and Webmasters would be free to re-locate directories, re-arrange the entire internal server file structure, change the server software, and even change the entire operating system (e.g. change Windows to Linux or vice-versa) if necessary.

Jim

LionMedia

8:42 pm on Dec 6, 2009 (gmt 0)

10+ Year Member



Hi Jim,
I stumbled on an addon module that was published recently and is designed to do exactly what I'm trying to do here. On the surface the demo looks like it does all of the functions I need. However, after studying the documentation here, I've learned that using this (.*) is very inefficient and can require heavy server resources. I've seen an example of the .htaccess file and here is a sample:

Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults展on=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]
Rewriterule ^properties-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&dept=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]
Rewriterule ^gites-in-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults展on=$1&price-max=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults展on=$1&price-min=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults展on=$1&price-min=$2&price-max=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults展on=$1&price-min=$2&price-max=$3&date1=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults展on=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults展on=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&cur_page=$7 [L]
Rewriterule ^properties-in-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&dept=$1&price-max=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^properties-in-(.*)-from-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&dept=$1&price-min=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-max=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date1=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&cur_page=$7 [L]
Rewriterule ^rentals-upto-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-max=$1&date1=$2&date2=$3&sleeps-min=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&date1=$2&date2=$3&sleeps-min=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-pricemax-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&price-max=$2&date2=$3&sleeps-min=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-pricemax-(.*)-dates-start-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&price-max=$2&date1=$3&sleeps-min=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&price-max=$2&date1=$3&date2=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&price-max=$2&date1=$3&date2=$4&sleeps-min=$5&cur_page=$6 [L]

My question is: would it be reasonable to think I could modify these rules, based on what I've learned here, to make them more efficient and safe? Do you see any other issues that might damage rankings? One thing I noticed is there is no 301 redirect. The documentation says it does this, so I need to find out how this is handled.

This would be a great solution if it's safe for the site.

g1smd

9:13 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I added those rules to my highly tuned server, I think they would slow it to a crawl.

Using

([^\-]+)
instead of
(.*)
or
(([^\-]+\-)+)
instead of
(.*)-
(the latter if a backreference could itself include a hyphenated value) would speed the code up many hundreds, possibly thousands, of times.

LionMedia

9:29 pm on Dec 6, 2009 (gmt 0)

10+ Year Member



Hi g1smd
Yes that's what I thought based on my research on this forum. Other than that does it seem safe assuming there is proper handling of the 301 redirect?

Thanks for your help!
Tracey

jdMorgan

9:34 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just taking one rule out of that pile, you'd want to re-code
 Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&region=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L] 

-as-
Rewriterule ^gites-in-([^\-]+)-from-([^\-]+)-pricemax-([^\-]+)-dates-start-([^\-]+)-dates-end-([^\-]+)-sleeping-([^\-]+)-with-swimming-([^\-]+)-page([0-9]*)\.html$ index.php?action=addon_sefTiger_searchresults&region=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]

but if and only if each of those "parameter fields" does not itself *contain* a hyphen. If any one (or more) of them must be allowed to contain one or more hyphens (e.g. Anne-Arundel and Prince-Georges as opposed to Charles or Montgomery), then the subpattern for that URL-field would become "([^\-]+(-[^\-]+)*)" and the back-reference number in the substitution path for each field following this one would need to be incremented by one.
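
For instance, allowing only the county field to contain hyphens (and also excluding "/" from the character class so the subpattern can't cross directory boundaries), a simplified illustrative rule -- hypothetical paths, not your exact ones -- might look like:

```apache
# Sketch: County may contain hyphens (e.g. anne-arundel); city may not.
# The county subpattern's inner group consumes $2, so city shifts to $3.
RewriteRule ^maryland/([^/\-]+(-[^/\-]+)*)/([^/\-]+)\.html$ /index.php?County=$1&city=$3 [L]
```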

However, there's a caveat: The highest back-reference number allowed in mod_rewrite is $9, so this could become problematic if more than one field in this example URL needed to be allowed to contain one or more hyphens. In that case, the rule would need to be broken into two or more 'steps', each allowing up to nine back-references. The code can get ugly pretty fast in that case, and it can be more complex than it might sound, but it's do-able.

Note also that none of the patterns in those rules appears to be end-anchored. This only exacerbates the inefficiency of their code. I added the end-anchor ("$" after "\.html") in my modified example.

Just taking a guess, I'd say the simple pattern modifications shown would speed up the rule processing by a factor of at least 2000 -- probably more, but I don't feel like counting the characters and subpatterns and working out the required factorials...

I would suspect that the 301 redirects are being done in plug-in script itself. Look for code that outputs a response Status of 301 and a Location header.

One more thing: It appears that this code was copied at some point through an "HTML transcription" that has 'broken' the patterns and URLs for "&region" and turned them into "<registered-trademark-symbol>gion". Just beware of this if you're copying the code to modify it.

Jim

LionMedia

9:56 pm on Dec 6, 2009 (gmt 0)

10+ Year Member



Jim,
Thanks so much for your detailed response. This forum is a fantastic resource! I've learned so much here about mod_rewrite (still quite the newbie, of course!). I think I'm up for the challenge, but I'll read over your post again to make sure I have a handle on what will be involved. I think you're correct about the 301 redirect, but I will certainly confirm that before jumping in.

Tracey

jdMorgan

10:32 pm on Dec 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. Although this thread got off to a somewhat-rocky start, it's obvious from the rules that you modified and posted -- and from the fact that you even asked the question about pattern efficiency -- that you've got an eye for the level of detail that this stuff requires. So things are looking much better... :)

The "multiple-(.*) pattern" problem is probably the leading cause of otherwise-unnecessary server upgrades. I can't begin to imagine how many folks are forced to upgrade to high-end dedicated servers simply because they never realized how inefficient that multiple-(.*) technique can be.

Of course, if one has never written code intended to parse character strings, then the problem isn't obvious. But it requires a geometrically-exploding number of matching attempts as the number of "(.*)" subpatterns grows, and the number of operations required within each matching attempt increases with the length of the input string from the current matching position to the end. Then you replace the "(.*)" patterns with "([^\-]+)" or "([^/]+)" as needed, and the matching engine then only has to make one single left-to-right matching attempt. Often a stunningly-huge difference in performance results -- I know personally of one case where the server took over five seconds to serve a request with "(.*)" patterns (in hundreds of rules). When the subpatterns were 'corrected,' the server response time dropped to 'negligible.'

But WordPress, Joomla, and many plug-in vendors have plodded on, releasing "recommended" code like that. Luckily, it seems that some of them are catching on. After you've successfully tested your modified rules, you should inform the plug-in maker of your improvements (and ask for a heavy discount or even a freebie on the plug-in in exchange). :)

Jim