Compressing Rewrite Rules

Is there a pattern here that can be used?

         

jlander

7:22 pm on Apr 14, 2010 (gmt 0)

10+ Year Member



Hello,

First, let me say that I'm just learning how to write my own rewrite rules and am quite happy with what I've accomplished so far. Reading this forum has been a BIG help. Thank you all...

What I have written is working fine, but I have several of these sets, and they take up many lines of code. I was hoping that there might be a pattern that could be taken advantage of to condense these down to only a few lines.

I am changing from an ASP shopping cart to a PHP shopping cart. In order to not lose all of my search engine listings, I am going to use my old URL structure for a while until my new pages are cached with my old URLs.

Here is what I've written:
#Category Rewrites

#Jewelry
RewriteCond %{QUERY_STRING} ^pg=1$ [NC]
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=0 [NC,L]
RewriteCond %{QUERY_STRING} ^pg=2$ [NC]
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=1 [NC,L]
RewriteCond %{QUERY_STRING} ^pg=3$ [NC]
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=2 [NC,L]
RewriteCond %{QUERY_STRING} ^pg=4$ [NC]
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=3 [NC,L]
RewriteCond %{QUERY_STRING} ^pg=5$ [NC]
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=4 [NC,L]
RewriteCond %{QUERY_STRING} ^pg=6$ [NC]
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=5 [NC,L]
RewriteCond %{QUERY_STRING} ^pg=7$ [NC]
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=6 [NC,L]
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=0 [NC,L]

Here is an explanation, because of the way WebmasterWorld displays whole URLs:
url.asp?pg=1 rewrites to _a=viewCat&catId=1&page=0
url.asp?pg=2 rewrites to _a=viewCat&catId=1&page=1
url.asp?pg=3 rewrites to _a=viewCat&catId=1&page=2
etc., until
url.asp rewrites to _a=viewCat&catId=1&page=0


I have 12 categories that I do this for. Each category has its own .asp page with the category name, for example jewelry.asp, gems.asp, pearls.asp, rings.asp, earrings.asp, etc. The last rewrite rule takes care of the first page in the category, which has no query string. I've included the first rule in the series with the 'pg=1' because it is a valid page even though it is not used.

[edited by: jdMorgan at 10:29 pm (utc) on Apr 14, 2010]
[edit reason] Please use example.com only. [/edit]

g1smd

7:44 pm on Apr 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use the power of regular expressions to reduce this to two rules per category.

Your current scheme calls for a rewrite, but by including a domain name you have actually created a 302 redirect instead. That's the worst possible place for you to be.

Your current scheme has two different URLs (no page number and pg=1) that can return the "page 0" content. That is also generally a bad idea.

I'd suggest:

Rewrite numbered-page URL request to script
RewriteCond %{QUERY_STRING} ^pg=([0-9]+)$
RewriteRule ^jewelry\.asp$ /index.php?_a=viewCat&catId=1&page=%1 [L]


Rewrite URL request without page number to page 0 in script
RewriteCond %{QUERY_STRING} !.
RewriteRule ^jewelry\.asp$ /index.php?_a=viewCat&catId=1&page=0 [L]


Remove the [NC] flag. You do not want mixed-case variants of the URL (such as /Jewelry.ASP) to be rewritten and serve the same content. That would create a Duplicate Content issue.

jdMorgan

10:46 pm on Apr 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is nasty, because original "pg" and new "page" numbers are off by one. Since mod_rewrite is not a programming or scripting language, it has no "maths" capabilities.

You could use a lookup table method, so at least you're down to one rule (but a lot of RewriteConds). One good thing is that no RewriteConds are processed if the RewriteRule pattern does not match (See Apache mod_rewrite docs for details). So this pile of RewriteConds won't be processed for any requests except for those for the "jewelry.asp" page.

#Jewelry
RewriteCond %{QUERY_STRING}>0 ^>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>0 ^pg=1>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>1 ^pg=2>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>2 ^pg=3>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>3 ^pg=4>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>4 ^pg=5>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>5 ^pg=6>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>6 ^pg=7>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>7 ^pg=8>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>8 ^pg=9>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>9 ^pg=10>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>10 ^pg=11>(.+)$ [OR]
...
RewriteCond %{QUERY_STRING}>98 ^pg=99>(.+)$ [OR]
RewriteCond %{QUERY_STRING}>99 ^pg=100>(.+)$
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=%1 [NC,L]

Note that the ">" character is arbitrary. It is used as a "soft anchor" or delimiter between the query string value and the 'substitution number' in each RewriteCond pattern. In other words, it lets mod_rewrite find the boundary between the number that was in the query string and the replacement number hard-coded on the left side of the RewriteCond.
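
To make the delimiter trick concrete, here is a hedged trace of a single condition from the table above, for a request to /jewelry.asp?pg=3. The only changes from the table are the host-free substitution and the missing [NC], both following the earlier advice in this thread.

```apache
# Request: /jewelry.asp?pg=3
# %{QUERY_STRING}>2 expands to the test string "pg=3>2".
# ^pg=3> matches the query-string part plus the delimiter, and (.+)
# captures what follows the ">", here "2", which becomes %1.
RewriteCond %{QUERY_STRING}>2 ^pg=3>(.+)$
RewriteRule ^jewelry\.asp$ /index.php?_a=viewCat&catId=1&page=%1 [L]
```

Every other condition in the table fails for this request because its pattern starts with a different pg value. The delimiter also keeps "pg=10" from being mistaken for "pg=1", since ^pg=1> requires the ">" immediately after the 1.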

Jim

g1smd

11:18 pm on Apr 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I ignored the 'off by one' in the original question, and instead decided to have a one-to-one mapping of page numbers and database entries.

jdMorgan

11:22 pm on Apr 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If that is an option, it will certainly make this rule simpler. (!)

Jim

g1smd

12:14 am on Apr 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Only the OP can know the answer to that. :)

As ever, it's all in the requirements specification.

jlander

12:29 am on Apr 15, 2010 (gmt 0)

10+ Year Member



g1smd & jdMorgan

Thanks for your help. I tried g1smd's rules before you posted yours jdMorgan and his are working. I was really wondering how the numbers being off by one could be handled, if at all. I removed the domain name from ALL of my rewrite rules. Good advice simply because it shrank the size of my .htaccess file considerably. Even more important is the 302 problem; however, I ran most of the pages through a server header checker and they returned 200 OK. Were they 302 redirects anyway? Maybe I used the wrong header checker...
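
On the off-by-one question: mod_rewrite cannot subtract, but a RewriteMap lookup can stand in for the arithmetic. The following is only a sketch of that swapped-in technique, not something proposed in this thread. It assumes access to the server or virtual-host configuration (RewriteMap cannot be declared in .htaccess) and that RewriteEngine is already on. The map name pgmap and the file path are hypothetical.

```apache
# --- httpd.conf or <VirtualHost> (not .htaccess) ---
# Hypothetical text map, one "old-pg new-page" pair per line:
#   1 0
#   2 1
#   3 2
RewriteMap pgmap "txt:/path/to/pgmap.txt"

# --- .htaccess in the document root ---
# Numbered page: look up the new page number, falling back to 0
# when the old number is not listed in the map.
RewriteCond %{QUERY_STRING} ^pg=([0-9]+)$
RewriteRule ^jewelry\.asp$ /index.php?_a=viewCat&catId=1&page=${pgmap:%1|0} [L]

# No query string at all: first page.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^jewelry\.asp$ /index.php?_a=viewCat&catId=1&page=0 [L]
```

The same map file could be shared by all twelve categories, since only the .asp name and the catId differ between rule pairs.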

As far as the /jewelry.asp & /jewelry.asp?pg=1 both being rewritten to the same page, /index.php?_a=viewCat&catId=1&page=0, that is the way the old cart worked. It used to need the ?pg=1 for the first page. Then to try to clean up some of the dynamic URLs, the cart used both for a while. That was bad. Later, they got it fixed so the cart never used the ?pg=1 anywhere for the first page; however, they left it valid. That is one of the many reasons I'm switching carts.

Thanks for your help. I now have ALL of my rewrite and redirect rules written. Now I have to find a way to change the cart links. I'm looking at a script that promises Magic SEO URL. I'm wondering if it really works. It is supposed to be only one line of code inserted in the cart files.

jlander

12:49 am on Apr 15, 2010 (gmt 0)

10+ Year Member



I didn't realize that it was not mapping correctly. I'll have to think about it after I get the internal links switched over; however, I don't think it will be an issue. Since I'm learning as I go, I may be wrong, but I don't think the internal links will be affected by the rewrite rules in .htaccess.

After all, as soon as the new pages are cached with the old URLs, I'm going to switch to a new, more descriptive URL structure anyway.

Since I've got nothing better to do right now, I think I'll give jdMorgan's rules a try.

jdMorgan

1:15 am on Apr 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's be clear, then. If you have a choice, then map pg=1 to page=1 and pg=2 to page=2, and not pg=1 to page=0 and pg=2 to page=1.

This one-to-one correspondence will eliminate the need for that big pile of RewriteConds, and the rule becomes a lot simpler (and faster, and more comprehensive, and easier to maintain...) :

#Jewelry
RewriteCond %{QUERY_STRING}>0 ^>(0)$ [OR]
RewriteCond %{QUERY_STRING} ^pg=([1-9][0-9]*)$
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=%1 [NC,L]

Jim

g1smd

7:25 am on Apr 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think you meant to post...
#Jewelry
RewriteCond %{QUERY_STRING}>0 ^>(0)$ [OR]
RewriteCond %{QUERY_STRING} ^pg=([1-9][0-9]*)$
RewriteRule ^jewelry\.asp$ /index.php?_a=viewCat&catId=1&page=%1 [L]

jdMorgan

12:46 pm on Apr 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, I missed the redirect vs. rewrite issue (several times).
That's why it's great to have multiple contributors here... Thanks! :)

Jim

jlander

3:19 pm on Apr 15, 2010 (gmt 0)

10+ Year Member



Thanks for all your help. FYI, both sets of rewrite rules work fine. BTW, now I know the difference between a redirect and a rewrite...finally.

jdMorgan

3:41 pm on Apr 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> BTW, now I know the difference between a redirect and a rewrite...finally.

That's a good thing. Many Webmasters use the terms interchangeably and do not understand the difference. A redirect "tells the client" to ask again for what it wanted, but using a new URL. Webmasters who don't understand this often get into serious ranking and URL-listing difficulties in search engines, because they use an external client URL-to-URL redirect when a URL-to-internal-server-filepath rewrite would be much more appropriate.
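
The distinction can be sketched with three alternative forms of the same rule (use one, not all three). The comments reflect my reading of the mod_rewrite documentation for RewriteRule substitutions, so treat them as a summary to verify rather than a guarantee. The [R=301] variant is included purely for illustration and is not recommended anywhere in this thread.

```apache
# Form 1: URL-path substitution. Internal rewrite: the client keeps seeing
# /jewelry.asp and receives the output of index.php directly.
RewriteRule ^jewelry\.asp$ /index.php?_a=viewCat&catId=1&page=0 [L]

# Form 2: absolute URL without [R]. mod_rewrite compares the hostname with
# the current host; on a match it strips scheme and host and rewrites
# internally, otherwise it issues an external redirect (302 by default).
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=0 [L]

# Form 3: explicit [R]. Always an external redirect; the client is told to
# ask again for the new URL, which then appears in its address bar.
RewriteRule ^jewelry\.asp$ http://www.example.com/index.php?_a=viewCat&catId=1&page=0 [R=301,L]
```

Form 2 is the fragile one: whether it rewrites or redirects depends on how the server identifies its own hostname, which may be why a header checker can report 200 OK for rules written that way. Form 1 avoids the ambiguity.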

Jim

jlander

2:20 pm on Apr 23, 2010 (gmt 0)

10+ Year Member



jdMorgan & g1smd,

I'm having a new issue, or at least an undiscovered one I didn't account for originally.

I'm using:


    #Jewelry - CatId=1
    RewriteCond %{QUERY_STRING}>0 ^>(.+)$ [OR]
    RewriteCond %{QUERY_STRING}>0 ^pg=1>(.+)$ [OR]
    RewriteCond %{QUERY_STRING}>1 ^pg=2>(.+)$ [OR]
    RewriteCond %{QUERY_STRING}>2 ^pg=3>(.+)$ [OR]
    RewriteCond %{QUERY_STRING}>3 ^pg=4>(.+)$ [OR]
    RewriteCond %{QUERY_STRING}>4 ^pg=5>(.+)$ [OR]
    RewriteCond %{QUERY_STRING}>5 ^pg=6>(.+)$ [OR]
    RewriteCond %{QUERY_STRING}>6 ^pg=7>(.+)$
    RewriteRule ^jewelry\.asp$ /index.php?_a=viewCat&catId=1&page=%1 [NC,L]


Everything works fine when used with the old query strings. What I failed to account for is that the category page has JavaScript that allows the results to be sorted and also allows the visitor to drill down into its subcategories.

Here is an example. When I request www.example.com/jewelry.asp I get to the category page fine. If you try to sort the product, for instance by price, you get 404-Page Not Found with this:

    http://www.example.com/jewelry.asp?_a=viewCat&catId=1&page=0&sort_by=price&sort_order=low&showOnly=1


The URL looks like it should work because the query string is correct. It works fine with the non-rewritten URL here:

    http://www.example.com/jewelry/cat_1.html?_a=viewCat&catId=1&page=1&sort_by=price&sort_order=low&showOnly=1


Same goes for drilling down into a sub category:

    http://www.example.com/jewelry.asp?_a=viewCat&catId=1&page=0&sort_by=price&sort_order=low&showOnly=1


is 404-Page Not Found. The working URL is:

    http://www.example.com/jewelry/cat_1.html?_a=viewCat&catId=1&page=0&sort_by=name&sort_order=low&showOnly=6


Why does it show a correct looking URL, but not show the correct page? Can it be fixed?

jdMorgan

7:07 pm on Apr 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure what the problem might be, but it's clear that none of these "sort" URLs match this rule, nor would they match the rule even if the rule were modified to accept (and handle) additional name/value pairs. This is because the rule looks for only and exactly "pg=nnn" query strings, while the "sort" URLs use "page=nnn" parameters (among others).

So the bottom line is that this problem doesn't have anything to do with this rule.

Either you need another rule (which does not yet exist), or a different existing rule needs some work.
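
As a hedged sketch of what such an additional rule might look like: if the sort and drill-down links really do request jewelry.asp with new-style parameters, a rule along these lines would hand the query string to index.php unchanged. The condition pattern is an assumption based only on the example URLs quoted above, and whether the cart should be emitting those links at all is a separate question.

```apache
# Hypothetical companion rule for requests that already carry the new
# cart's parameters, e.g.
#   /jewelry.asp?_a=viewCat&catId=1&page=0&sort_by=price&sort_order=low&showOnly=1
# With no query string in the substitution, mod_rewrite passes the
# original query string through to index.php as-is.
RewriteCond %{QUERY_STRING} ^_a=viewCat&catId=1&
RewriteRule ^jewelry\.asp$ /index.php [L]
```

Because the condition requires the query string to start with _a= rather than pg=, this rule and the existing pg-based rule can never both match the same request.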

Jim