Forum Moderators: phranque

RewriteCond Backreference issue

         

moroandrea

3:28 pm on Nov 12, 2010 (gmt 0)

10+ Year Member



I don't remember where, but I read that a RewriteCond backreference is valid only for the first rule that follows it.

This could be a problem for a group of rules that all require access to the backreference, and I was wondering if there is a way around this limit.

As I believe the answer is no, which of the following two options performs better?

a) Group the RewriteCond lines as much as possible, in a fashion similar to this:

RewriteCond %{HTTP_HOST} ^(www\.)?test\.eu(\.|\.?:[0-9]+)?$ [NC]
RewriteCond $1 ^abc/?$ [OR]
RewriteCond $1 ^xyz/?$
RewriteRule ...

b) Have a separate RewriteCond for each RewriteRule that requires the backreference?

Many thanks

g1smd

4:48 pm on Nov 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That depends. In many cases there are other ways to code the rest of the rule that avoid the problem altogether.

For example

RewriteCond $1 ^abc/?$ [OR]
RewriteCond $1 ^xyz/?$


can be coded as

RewriteCond $1 ^(abc|xyz)/?$


In many cases, you won't even need the pattern to be in a separate RewriteCond, it might be better off as the main pattern within the RewriteRule line itself.
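As a rough sketch of that last point, the two ORed conditions from the question could collapse into the rule's own pattern. This assumes the rules live in a per-directory .htaccess file (so the path has no leading slash); the target /new-location/ and the R=301 flag are placeholders for illustration, not anything stated in this thread.

```apache
# The hostname check stays in a RewriteCond. Its groups (%1, %2) are
# documented as available to the RewriteRule this condition is attached to.
RewriteCond %{HTTP_HOST} ^(www\.)?test\.eu(\.|\.?:[0-9]+)?$ [NC]
# The path test moves into the rule pattern itself; (abc|xyz) is captured as $1.
# "/new-location/" is a hypothetical target.
RewriteRule ^(abc|xyz)/?$ /new-location/$1 [R=301,L]
```

With the path test in the rule pattern, there is one condition less to evaluate, and the rule is skipped early for any request that does not match.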

moroandrea

5:17 pm on Nov 12, 2010 (gmt 0)

10+ Year Member



Hi g1smd,

Thanks for the answer, as usual.

Well, the patterns I have are something like:

abc/store/sock/size-10
abc/store/sock/size-11
abc/store/sock/size-12

which I would like to redirect to the new socks folder.

I guess I could use either approach, but in terms of readability I think the first example is the best one.

Is there a limit on how many alternatives can be grouped with the OR (pipe) operator?

g1smd

5:27 pm on Nov 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If there are always three slashes in the URL, and you need to individually pull each slash-delimited part out and re-use some of them, you can use ([^/]+)/ to find all characters up to the next slash, repeating that pattern the same number of times that you have slashes in the URL.
The [^X] construct means "any one character that is NOT X", and the + operator means "one or more times".
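
For instance, a URL with three slashes such as abc/store/sock/size-10 could be split into its four parts as shown here. This is only a sketch: it assumes .htaccess context, and the /new/ target prefix is made up for illustration.

```apache
# Three "([^/]+)/" groups, one per slash, then a final group for the last part.
# For abc/store/sock/size-10: $1=abc, $2=store, $3=sock, $4=size-10
# "/new/" is a hypothetical target prefix.
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)$ /new/$3/$4 [R=301,L]
```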

If it's just the size changing, then ^abc/store/sock/size-([0-9]{1,2})$ will find all requests that end with exactly one or two digits.
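
Dropped into a rule, that pattern might look like the following. The /socks/ target is a guess at the "new socks folder" mentioned earlier, and .htaccess context is assumed.

```apache
# Matches abc/store/sock/size- followed by one or two digits, e.g. size-7 or size-12.
# A three-digit size such as size-100 does not match because of the $ anchor.
# The digits are captured as $1; "/socks/size-$1" is a hypothetical target.
RewriteRule ^abc/store/sock/size-([0-9]{1,2})$ /socks/size-$1 [R=301,L]
```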

The patterns and their coding depend on EXACTLY what you want to do, defined down to the individual character level for all URLs that you wish to deal with.

moroandrea

10:16 am on Nov 13, 2010 (gmt 0)

10+ Year Member



Well, mine was just an example...

I don't have the full list of URLs with me, but they are something like this:

/store/categorylist/clothing/dresses/*
/store/department/clothing
/store/clothing

This is the general pattern, but in some cases it could be slightly different, like:

/store/categorylist/boys/dresses/*
/store/categorylist/girls/dresses/*
/store/categorylist/children/dresses/*
/store/department/dresses/children
/store/department/dresses/boys
/store/department/dresses/girls

It is a real nightmare!

g1smd

10:49 am on Nov 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, the first thing to do is make a list of EVERY URL format (every format, not every URL) found on the site, including calls for images, stylesheets and scripts. Also include search engine verification files, robots.txt, and so on.

Armed with that list of formats, they can then be grouped by number of slashes, URLs that end with digits, and so on. The idea is that one rule can deal with thousands of URLs of one "type" instead of having hundreds of similar rules, and the whole site might need only a few dozen (at most) rules to deal with everything.

Getting a list of URL formats is the crucial step, and happens long before any thoughts about coding solutions. Before you can code the solution, you have to know what the actual, detailed, explicit question was, down to individual character level.

Here's part of one for a site a few years ago:

products:
/p<7 digits>-<multiple words with hyphens>
section pages, multiple pages per section:
/s<4 digits>-p<3 digits with padded leading zeroes>-<multiple words with hyphens>
FAQ pages:
/f<6 digits>-<multiple words with hyphens>
reviews, multiple pages per product:
/r<7 digits>-p<3 digits with padded leading zeroes>-<multiple words with hyphens>
"words" always lower case, apostrophes and other punctuation removed (not replaced)

and so on. The site used a lot of URL rewriting, but presented a very simple URL structure to the outside world.
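
To illustrate how a format list like this maps onto rules, here is a sketch covering just the product and section formats. The script names product.php and section.php, their query parameters, and the character class [a-z0-9-] for the words part (allowing digits inside words is an assumption) are all hypothetical; the site's actual rules are not shown in this thread. It also assumes .htaccess context.

```apache
# Products: /p<7 digits>-<words>
# Hypothetical internal rewrite to a script; the 7-digit ID is captured as $1.
RewriteRule ^p([0-9]{7})-[a-z0-9-]+$ /product.php?id=$1 [L]

# Section pages: /s<4 digits>-p<3 digits>-<words>
# $1 is the section ID, $2 the zero-padded page number.
RewriteRule ^s([0-9]{4})-p([0-9]{3})-[a-z0-9-]+$ /section.php?id=$1&page=$2 [L]
```

Each rule handles every URL of its type, which is the point of grouping by format before writing any code.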

moroandrea

4:56 pm on Nov 13, 2010 (gmt 0)

10+ Year Member



I guess that, as there are about 70 destination URLs, I can group all 3k URLs by destination and have, in the worst case, 70 RewriteCond blocks with many OR conditions.

As the first part of the URL is generally the same, I can try to simplify as much as possible.

Something like the following

/store/categorylist/boys/dresses/*
/store/categorylist/girls/dresses/*
/store/categorylist/children/dresses/*
/store/department/dresses/children
/store/department/dresses/boys
/store/department/dresses/girls

Supposing dresses is the category I'm interested in, these may be transformed into:

RewriteCond $1 ^*[dresses]?\/?(boys|girls|children)\/[dresses]?*$

Is this correct?

jdMorgan

11:52 pm on Nov 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Be very aware that [] and () are two totally different things. I believe that you intended to use () and not [] -- but see the regular-expressions tutorial cited in our Forum Charter for more info.
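
A small side-by-side sketch of the difference, using made-up rules (the /example-target path is hypothetical, and .htaccess context is assumed):

```apache
# [dresses] is a character class: it matches exactly ONE character
# out of d, r, e, s. This rule matches "store/d" or "store/s",
# but not "store/dresses".
RewriteRule ^store/[dresses]$ /example-target [R=301,L]

# (dresses) is a group: it matches the literal string "dresses"
# and captures it as $1.
RewriteRule ^store/(dresses)$ /example-target/$1 [R=301,L]
```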

We also need to know what you intend to do with the URLs in the above URL-type list. As g1smd points out, you cannot productively start coding until the problem is fully defined. All I can offer is a simple [OR]ed list to detect all of those types, although the disposition remains in question:

RewriteCond %{HTTP_HOST} ^(www\.)?test\.eu(\.|\.?:[0-9]+)?$ [NC]
RewriteCond $1 ^categorylist/(girls|boys|children)/dresses/ [OR]
RewriteCond $1 ^department/dresses/(girls|boys|children)$
RewriteRule ^store/(.+)$ --unknown-disposition--

Jim