RewriteRule problems

How can I rewriterule for unknown number of forward slashes?

patrickng01

2:54 am on Mar 24, 2005 (gmt 0)

10+ Year Member



Hi,

Hope someone can guide me on this:

I have a URL like this:

[#*$!.xxx.com.hk...]
or
[xxx.xxx.com.hk...]

I would like the final redirected URL to be:

[xxx.xxx.com.hk...]
or
[xxx.xxx.com.hk...]

Notice that /sg/ has been substituted with /hk/. Notice the number of forward slashes after /sg/ can be different.

RewriteCond %{HTTP_HOST} ^xxx\.xxx\.com\.(.+) [NC]
RewriteRule /cms/export/(.+)/(/|,|[a-z]|[A-Z]|?|\.*) /cms/export/%1/$2 [R=301,L]

I managed to substitute the /sg/ with /hk/, but I just can't seem to substitute the entire string (including the unknown number of slashes) after /sg/ to the end of the line.
It's always only the string after the last slash (i.e. index.html) that gets substituted.

Can someone show me the right way?

thanks very much

jdMorgan

3:18 am on Mar 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



patrickng01,

Welcome to WebmasterWorld!

It looks like your pattern was too specific -- and too complicated.

I modified this using negated character classes. Each one matches anything up to the specified character. For example, in the RewriteCond, ([^:]+) matches your TLD up to the colon that precedes an optional port number, e.g. the "uk" in example.com.uk:80

In the RewriteRule, the first variable pattern [^/]+ matches anything up to the next slash that it finds. Because this value (sg) is not used in a back-reference and contains no alternates (A|B), there is no need to enclose it in parentheses.
The next variable pattern ([^.]+) matches everything between "sg/" and the first period that it finds, which should be the start of ".html". This is then back-referenced as $1.


RewriteCond %{HTTP_HOST} ^www\.example\.com\.([^:]+) [NC]
RewriteRule /cms/export/[^/]+/([^.]+)\.html$ /cms/export/%1/$1.html [R=301,L]

If this isn't quite right for your URLs, you can probably adjust it to suit.

Jim

sitz

3:25 am on Mar 24, 2005 (gmt 0)

10+ Year Member



While you correctly use the [NC] flag in your RewriteCond, the case-insensitive flag doesn't store the data in the %1 backreference in lower-case; it stores it how it was sent. You'll need to make use of mod_rewrite's internal 'tolower' RewriteMap to convert that string to lower-case. Also note that if you're /always/ going to replace 'sg' with 'hk', you don't need to worry about the RewriteCond line. You could just do this:

RewriteEngine on
RewriteRule ^/(abc/export)/sg/(.*) /$1/hk/$2 [L,R=301]

If the 'sg/hk' part of the directory structure is going to be based solely on the top-level domain of the Host: header, then something like this should work:

RewriteEngine on
RewriteMap tolower int:tolower
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{HTTP_HOST} \.([a-z]+)$ [NC]
RewriteRule ^/(abc/export)/sg/(.*) /$1/${tolower:%1}/$2 [L,R=301]

Note that the problem here is that you're not guaranteed to get a Host: header from a browser. You *should*, but not all HTTP clients send them. If you wanted to send requests which didn't send a Host header to some default area, you could do this:

RewriteEngine on
RewriteMap tolower int:tolower
RewriteCond %{HTTP_HOST} ^$
RewriteRule ^(.*) /defaultpage.html?$1 [L]

RewriteCond %{HTTP_HOST} \.([a-z]+)$ [NC]
RewriteRule ^/(abc/export)/sg/(.*) /$1/${tolower:%1}/$2 [L,R=301]

You could then have 'defaultpage.html' (or defaultpage.php, or what have you) return a friendly error, possibly using the originally requested path in the error text.

Does all that make sense, more or less?

sitz

12:08 am on Mar 25, 2005 (gmt 0)

10+ Year Member



Damn, jd. I need to start checking the threads after I type up something but *before* I post? Sheesh. =)

jdMorgan

2:30 am on Mar 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good points, all, though...

Two points of related discussion:

1) A check for a blank host is usually not needed with a positive hostname pattern-match. It's required, though, for a negative hostname pattern-match.

That is, if you rewrite/redirect only when (hostname = somevalue), then a blank hostname won't ever accidentally match a non-blank pattern. However, if you redirect when (hostname NOT = somevalue), then a blank hostname from an HTTP/1.0 client will likely put you into an "infinite" redirection loop unless you add the RewriteCond to explicitly check for a non-blank hostname.

2) A very minor one: I like to use "." instead of "!^$" for non-blank-matching. It's two characters shorter, plus then you don't have to do a work-around to avoid this forum removing the space between "}" and "!" when posting. :)
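To make the two cases concrete, here is a rough sketch in server-config context (URL-paths start with a slash, as elsewhere in this thread). The hostnames and the canonical-host redirect itself are illustrative assumptions, not rules taken from the original poster's server:

```apache
# Positive match: a blank Host header can never match this pattern,
# so no separate blank-host check is needed.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

# Negative match: a blank Host header also satisfies "NOT www.example.com",
# so require a non-blank host first ("." matches any single character).
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
```

Without the first RewriteCond in the second block, a client that never sends a Host header would be redirected again on every retry.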

Jim

sitz

2:58 pm on Mar 25, 2005 (gmt 0)

10+ Year Member



I like to use "." instead of "!^$" for non-blank-matching. It's two characters shorter, plus then you don't have to do a work-around to avoid this forum removing the space between "}" and "!" when posting. :)

Yeah, I'd noticed that. I prefer the "!^$" syntax because (to my eyes, anyway) it's a bit more obvious what it's doing. The '.' I'd have to think about for a second the first couple of times I saw it. But that may just be me. As for the latter point, true. Although I'd argue that it was a bug in the forumcode, since the <pre> </pre> tags are supposed to Do The Right Thing(tm) with whitespace. *points* at the forum source code. Go! Fix! =D

patrickng01

8:07 am on Mar 28, 2005 (gmt 0)

10+ Year Member



A very big THANK YOU to both of you for the replies.

I followed your regular expression but keep getting "Redirection limit reached" for it. I don't understand why it keeps going into a loop (despite putting in the L flag).

Below is the portion of the httpd.conf

RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
RewriteRule .* - [F,L]

RewriteCond %{HTTP_HOST} ^www\.#*$!\.com\.([^:]+) [NC]
RewriteRule /cms/export/[^/]+/([^.]+)\.html$ /cms/export/%1/$1.html [R=301,L]
RewriteRule /web-console.*$ - [F,L]
RewriteRule /jmx-console.*$ - [F,L]

thanks a million

sitz

10:42 pm on Mar 28, 2005 (gmt 0)

10+ Year Member



Is this box production? If not (or if it does *VERY* little traffic right now), you should be able to crank up RewriteLogging and see what's going on:

RewriteLog /path/to/rewrite.log
RewriteLogLevel 9

...and bounce apache. If the output doesn't make sense (or it does, but there's some question about a fix), paste the log data into this thread so's we can take a gander. =)
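A note for newer servers: the RewriteLog and RewriteLogLevel directives exist only in Apache 2.2 and earlier. In Apache 2.4, mod_rewrite writes its trace output to the error log, controlled per module through LogLevel. A rough equivalent might look like this; the trace level shown is an assumption about how much detail is wanted:

```apache
# Apache 2.4 and later: rewrite tracing goes to the ErrorLog.
# Levels trace1 through trace8 take the place of RewriteLogLevel; higher is more verbose.
LogLevel alert rewrite:trace3
```

As with RewriteLogLevel, high trace levels slow the server noticeably, so they are best left off a busy production box.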

jdMorgan

6:08 am on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That code can easily loop, because the output URL can match the RewriteRule pattern.

Three possibly-easy fixes:

First, you could change the host


RewriteCond %{HTTP_HOST} ^www\.example\.com\.([^:]+) [NC]
RewriteRule /cms/export/[^/]+/([^.]+)\.html$ http://www.example.com/cms/export/%1/$1.html [R=301,L]

In this way, the host of the rewritten URL no longer matches the RewriteCond, so it will not be redirected again.

Or, you can change the URL-path:


RewriteCond %{HTTP_HOST} ^www\.example\.com\.([^:]+) [NC]
RewriteRule /cms/export/[^/]+/([^.]+)\.html$ http://%{HTTP_HOST}/cms/exports/%1/$1.html [R=301,L]

This is just an arbitrary example; you just need something to distinguish the old URL from the new URL. So, you could change the "cms" part, or change the file type to ".htm", or add another subdirectory level to the new URL -- anything so it can't ever be the same as the original URL.

Or, if the original URL *always* has "sg" in it, then explicitly test for that:


RewriteCond %{HTTP_HOST} ^www\.example\.com\.([^:]+) [NC]
RewriteRule /cms/export/sg/([^.]+)\.html$ http://%{HTTP_HOST}/cms/export/%1/$1.html [R=301,L]

In this way, only requests for the "sg" URLs get redirected.

Jim

patrickng01

6:48 am on Mar 29, 2005 (gmt 0)

10+ Year Member



Hi,

Your suggestion works, but the user still wants me to keep the original URL pattern. I worked around the problem by using an Alias that maps /cms/exports back to the cms/export directory, and pointing the RewriteRule at /cms/exports instead.
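A minimal sketch of that workaround, assuming the export tree is served straight from the filesystem. The path /var/www/cms/export is hypothetical, and the host pattern reuses the www.example.com placeholder from earlier in the thread:

```apache
# Hypothetical filesystem path; substitute the real location of the export tree.
Alias /cms/exports /var/www/cms/export

# Redirect /cms/export/<anything>/... to /cms/exports/<tld>/...
# The target path contains "/cms/exports/", which the pattern "/cms/export/" cannot match,
# so the redirected request is not redirected again.
RewriteCond %{HTTP_HOST} ^www\.example\.com\.([^:]+) [NC]
RewriteRule /cms/export/[^/]+/([^.]+)\.html$ http://%{HTTP_HOST}/cms/exports/%1/$1.html [R=301,L]
```

Both URL-paths then resolve to the same files, but only the old one triggers the redirect.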

thanks a million