rewriterule that passes complete url in querystring to another domain

         

webmaestro

5:50 pm on Oct 15, 2009 (gmt 0)

10+ Year Member



I need a rewrite rule that redirects mobile browsers from an article to the mobile version of that article on a separate domain, from this URL:

[domain.com...]

to this URL:

[m.domain.com...]

(m.domain.com uses the full article URI to serve a mobile version of that article)

Here's what I'm starting with:


RewriteCond %{HTTP_REFERER} !^m\..*$
RewriteCond %{HTTP_ACCEPT} text/vnd.wap.wml [NC,OR]
RewriteCond %{HTTP_USER_AGENT} \bUP[\/.]|\bNokia|\bMOT|^LGE?\b|SonyEricsson|Ericsson|BlackBerry|Opera\ Mini|iPhone|iPod|DoCoMo|Symbian|Windows\ CE|NetFront|Klondike|PalmOS|PalmSource|portalmm|S[CG]H-|\bSAGEM|SEC-|jBrowser-WAP|Mitsu|Panasonic-|SAMSUNG-|Samsung-|Sendo|SHARP-|Vodaphone|BenQ|iPAQ|AvantGo|Go.Web|Sanyo-|AUDIOVOX|PG-|CDM[-\d]|^KDDI-|^SIE-|TSM[-\d]|^KWC-|WAP|^KGT [NC]
RewriteCond %{REQUEST_URI} ^/[a-z]+/[^-]*-([0-9]+)-([^-]*)-[^-]*-[^-]*\.html
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z][a-z0-9\-]+\.[a-z]{2,4}(\.[a-z]{2})?)
RewriteRule ^/(.*[^\.html])$ http://m.%2/?targetUrl=http://%1.%2/%3 [L,QSA,PT]
#RewriteRule ^/(.*[^\.html])$ http://m.domain.com?targetUrl=http://www.domain.com/$1 [L,QSA,PT]

I'd like to pass a dynamically generated domain, instead of a hard-coded domain.

I noticed this helpful post on WebmasterWorld.com:

[webmasterworld.com...]


RewriteCond %{HTTP_HOST} ^(www\.)?([a-z][a-z0-9\-]+\.[a-z]{2,4}(\.[a-z]{2})?)

I take that to mean I can do something like this:


RewriteCond %{HTTP_HOST} ^(www\.)?([a-z][a-z0-9\-]+\.[a-z]{2,4}(\.[a-z]{2})?)
RewriteRule ^/(.*[^\.html])$ http://m.%2/?targetUrl=http://%1.%2/$1 [L,QSA,PT]

(where $1 will be the %{REQUEST_URI} value)

Thank you!

(edited to replace %3 in RewriteRule with $1)

jdMorgan

8:08 pm on Oct 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure what your specific question is...

I can comment that you need to get rid of the RewriteCond testing %{HTTP_REFERER}, though. The HTTP Referer header is not always going to be present in requests, since it is an optional header and is often dropped by mobile gateways, mobile transcoders, proxies, etc. Furthermore, it will never start with "m." as shown -- If present, it will always be a complete URL like "http://m.example.com/" at least. But it will often just be blank, so you should probably not make your code dependent on it.
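To see why that condition never does what was intended, here is a minimal Python sketch. It assumes Python's re module treats this simple pattern the same way Apache's regex engine does, and the sample Referer values are made up for illustration.

```python
import re

# Pattern taken from the RewriteCond in the first post (without the leading "!").
REFERER_PATTERN = re.compile(r"^m\..*$")

# Hypothetical Referer values: full URLs, as browsers send them, and an empty
# string standing in for a request that carries no Referer header at all.
samples = ["http://m.example.com/", "http://www.example.com/page.html", ""]

for referer in samples:
    matched = REFERER_PATTERN.search(referer) is not None
    # The condition is negated ("!"), so it passes whenever the pattern fails.
    print(repr(referer), "| pattern matches:", matched, "| condition passes:", not matched)
```

None of the samples match, so the negated condition passes every time and filters nothing.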

Jim

webmaestro

6:43 pm on Oct 16, 2009 (gmt 0)

10+ Year Member



Thanks for the note about %{HTTP_REFERER}... I'll keep that in mind.

For mobile browsers, I want to re-direct this URL:

[domain.com...]

to this URL:

[m.domain.com...]

Notice that the original URL becomes the value of the 'targetUrl' parameter in the querystring. If the targetUrl parameter is set to the full article URI, the m.domain.com can use it to serve a mobile version of that specific article.

Does this clarify?

jdMorgan

10:37 pm on Oct 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For mobile browsers, I want to re-direct
http://www.example.com/articles/my-123456-article-url.html to
[m.example.com...]

I got rid of all of those extra phones and commented out the HTTP_ACCEPT header check to simplify testing. I also end-anchored the HTTP_HOST pattern (allowing for an FQDN trailing dot and an appended port number), fixed it to accept only valid domain names (including ".museum" domains), re-arranged some parenthesis layers, moved the URL-path pattern to the RewriteRule where it belongs (to improve efficiency), and removed an extra "-[^-]+" from the URL-path pattern in order to match your example quoted above. The result:


# RewriteCond %{HTTP_ACCEPT} text/vnd.wap.wml [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Nokia [NC]
RewriteCond %{HTTP_HOST} ^((www\.)?([a-z][a-z0-9\-]*[a-z0-9]\.[a-z]{2,6}(\.[a-z]{2})?))\.?(:[0-9]+)?$
RewriteRule ^/([a-z]+/[^-]+-([0-9]+)-([^-]+)-[^-]+\.html)$ http://m.%3/?targetUrl=http://%1/$1 [QSA,PT,L]

You can test this with a User-agent-spoofing browser add-on, or using an on-line UA-spoofing site. Keep the rule simple until you get it working, then add the HTTP_ACCEPT header check and a few more phone user-agents. Keep backups of the last working version! None of the RewriteConds that I completely removed were needed.
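Before touching the server, the two patterns can also be sanity-checked offline. The following is only a rough Python sketch: it assumes Python's re module agrees with Apache's engine on these patterns, it writes the first path class as [a-z]+, and mobile_target is a hypothetical helper that mimics nothing but the back-reference substitution (no flags, no User-Agent check, no existing query string).

```python
import re

# Patterns copied from the rule above; the path class is written as [a-z]+.
HOST_RE = re.compile(
    r"^((www\.)?([a-z][a-z0-9\-]*[a-z0-9]\.[a-z]{2,6}(\.[a-z]{2})?))\.?(:[0-9]+)?$"
)
PATH_RE = re.compile(r"^/([a-z]+/[^-]+-([0-9]+)-([^-]+)-[^-]+\.html)$")


def mobile_target(host, path):
    """Return the URL the substitution would build, or None if nothing matches."""
    path_match = PATH_RE.match(path)
    if path_match is None:
        return None  # the rule pattern failed
    host_match = HOST_RE.match(host)
    if host_match is None:
        return None  # the HTTP_HOST condition failed
    # %3 = bare domain, %1 = host without trailing dot or port, $1 = URL-path
    return "http://m.%s/?targetUrl=http://%s/%s" % (
        host_match.group(3), host_match.group(1), path_match.group(1))


print(mobile_target("www.example.com", "/articles/my-123456-article-url.html"))
print(mobile_target("example.com.:8080", "/articles/my-123456-article-url.html"))
print(mobile_target("www.example.com", "/images/logo.png"))
```

The first call builds http://m.example.com/?targetUrl=http://www.example.com/articles/my-123456-article-url.html, and the last returns None because the path is not an article URL.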

In short, simplify, divide, and conquer.

Be aware that to use PERL-compatible regular expressions (such as "\b"), you must be running on Apache 2.x or on a server with the PCRE library installed. If you're not sure, or if you might change servers, don't use PCRE (e.g. use [^0-9a-zA-Z_]+ ahead of the "word" instead).
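A small Python sketch of that substitution, using made-up User-Agent strings. One caveat it illustrates: a character class in front of the word needs a real character to match, so unlike "\b" it cannot match when the word is at the very start of the string unless a start-of-string alternative is added.

```python
import re

# Hypothetical User-Agent strings, made up for illustration.
agents = [
    "Nokia6230/2.0 (04.44) Profile/MIDP-2.0",      # word at the very start
    "Mozilla/5.0 (SymbianOS/9.2; U) Nokia N95",    # word after a space
    "Mozilla/5.0 (X11) NotNokiaBrowser/1.0",       # embedded in a longer word
]

pcre_style = re.compile(r"\bNokia")
# Character-class form: requires at least one non-word character before the word.
class_only = re.compile(r"[^0-9a-zA-Z_]+Nokia")
# Same idea, but also accepting the start of the string.
class_or_start = re.compile(r"(^|[^0-9a-zA-Z_])Nokia")

results = []
for ua in agents:
    row = (bool(pcre_style.search(ua)),
           bool(class_only.search(ua)),
           bool(class_or_start.search(ua)))
    results.append(row)
    print(ua, row)
```

Only the third form agrees with "\b" on all three strings; the plain class form misses the first one.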

Jim

webmaestro

2:50 pm on Oct 17, 2009 (gmt 0)

10+ Year Member



Thank you very much!

Thanks to your advice, I was able to tweak the meat of the code into what I wanted:


RewriteCond %{REQUEST_URI} ^/[a-z]+/[^-]*-([0-9]+)-([^-]*)-[^-]*-[^-]*\.html
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z][a-z0-9\-]+\.[a-z]{2,4}(\.[a-z]{2})?)
RewriteRule ^/(.*[^\.html])$ http://m.%2/?targetUrl=http://%{HTTP_HOST}/$1 [L,QSA]

That note about PCRE is a real GEM! You deserve a lollipop (or a :beer: if you're into that sort of thing!).

Cheers!

Clay

jdMorgan

4:24 pm on Oct 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want the best performance, put the URL-path pattern in the RewriteRule pattern, and don't add a redundant RewriteCond checking REQUEST_URI... RewriteConds are only evaluated if the RewriteRule pattern matches, so it is both inefficient and unnecessary (in most cases) to re-check the URL-path in a RewriteCond.
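The evaluation order described here can be pictured with a toy model in Python. This is only a sketch of the ordering, not mod_rewrite's real implementation; rule_fires and the example.com condition are hypothetical.

```python
import re


def rule_fires(path, rule_pattern, conditions, log):
    """Toy model: test the rule pattern first, then the conditions.

    `conditions` is a list of (test_string, regex) pairs that must all match.
    `log` records what was evaluated, in order.
    """
    log.append("rule pattern")
    if re.search(rule_pattern, path) is None:
        return False  # the conditions are never looked at
    for test_string, regex in conditions:
        log.append("cond " + regex)
        if re.search(regex, test_string) is None:
            return False
    return True


conds = [("www.example.com", r"^(www\.)?example\.com$")]
article = r"^/[a-z]+/[^-]+-[0-9]+-[^-]+-[^-]+\.html$"

log_hit, log_miss = [], []
print(rule_fires("/articles/my-123456-article-url.html", article, conds, log_hit), log_hit)
print(rule_fires("/images/logo.png", article, conds, log_miss), log_miss)
```

For the image request only the rule pattern is evaluated, which is why a specific rule pattern saves work and a second URL-path check in a condition adds nothing.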

About the only time you need to do this is when you need an exclusion (using a negative-match pattern in the RewriteCond) and you also need to back-reference all or part of the URL-path. Since you cannot back-reference a negative-match pattern, you can't use one in the RewriteRule for exclusion, and it is therefore necessary to use a RewriteCond. But that does not seem to be the case here, so get rid of that RewriteCond by moving the full pattern to the RewriteRule.

The HTTP_HOST RewriteCond will be more efficient in some cases, and less likely to fail by producing incorrect matches, if you include the optional FQDN trailing dot and port number match as I showed above. To be clear, "www.example.com.:80" is a perfectly acceptable hostname, and likely "will work" to access your site. By including those optional subpatterns in the hostname pattern, you make the match on the domain name itself more efficient by "nailing down" the actual end of the domain in a precise way; if an FQDN trailing dot or port number is appended to the request, this optional subpattern at the end ensures that it will be "thrown away immediately" and won't end up in your %2 back-reference.
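As a rough illustration of what "incorrect matches" can look like, this Python sketch compares the unanchored pattern from the earlier posts with an end-anchored variant that allows the trailing dot and port. The hostnames are made up, second_group is a hypothetical helper, and Python's re module is assumed to behave like Apache's engine for these patterns.

```python
import re

# Unanchored pattern from the earlier posts, and an end-anchored variant.
LOOSE = re.compile(r"^(www\.)?([a-z][a-z0-9\-]+\.[a-z]{2,4}(\.[a-z]{2})?)")
STRICT = re.compile(
    r"^(www\.)?([a-z][a-z0-9\-]+\.[a-z]{2,6}(\.[a-z]{2})?)\.?(:[0-9]+)?$"
)


def second_group(pattern, host):
    """Return what %2 would hold for this host, or None if there is no match."""
    m = pattern.match(host)
    return m.group(2) if m else None


for host in ["www.example.com.:80", "www.example.museum", "www.example.com:8080"]:
    print(host, "| loose:", second_group(LOOSE, host),
          "| strict:", second_group(STRICT, host))
```

Without the end anchor, the ".museum" host still matches but %2 is silently truncated to "example.muse"; the anchored variant captures the whole domain and discards the dot and port.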

Also, don't use the "*" quantifier in subpatterns unless you really want to accept "blank" as a match:


RewriteCond %{HTTP_HOST} ^(www\.)?([a-z][a-z0-9\-]+\.[a-z]{2,6}(\.[a-z]{2})?)\.?(:[0-9]+)?$
RewriteRule ^/([a-z]+/[^-]*-[0-9]+-[^-]+-[^-]+-[^.]+\.html)$ http://m.%2/?targetUrl=http://%{HTTP_HOST}/$1 [QSA,L]
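To make the point about "*" concrete, here is a short Python sketch comparing the earlier URL-path pattern with the tightened one. The two sample paths are hypothetical, and Python's re module is assumed to match these patterns the way Apache would.

```python
import re

# URL-path pattern from the earlier posts ("*" quantifiers) and the tightened
# rule pattern ("+" quantifiers, end-anchored).
STAR = re.compile(r"^/[a-z]+/[^-]*-([0-9]+)-([^-]*)-[^-]*-[^-]*\.html")
PLUS = re.compile(r"^/([a-z]+/[^-]*-[0-9]+-[^-]+-[^-]+-[^.]+\.html)$")

# Hypothetical paths: a well-formed article URL and one with empty segments.
good = "/articles/my-123456-article-url-title.html"
empty = "/articles/-123456---.html"

for path in (good, empty):
    print(path,
          "| star:", STAR.match(path) is not None,
          "| plus:", PLUS.match(path) is not None)
```

Both patterns accept the well-formed path, but only the "*" version also accepts the path whose segments are all blank.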

In addition, the subpattern "[^\.html]" means "any single character that is *not* a backslash, a period, an "h", a "t", an "m", or an "l"" -- I doubt that's what you wanted in your rule pattern... Be aware that "[]" defines an alternate character group, and that the meaning of "^" changes within a group, as do the character-escaping rules. See the regular-expressions tutorial cited in our Forum Charter for details.
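A quick way to see the effect is to run the old rule pattern against a few made-up paths. This is a Python sketch; note that Python (like PCRE) reads "\." inside the brackets as an escaped period, so the backslash itself is only excluded on engines that treat it literally there.

```python
import re

# Rule pattern from the earlier posts. In Python the class excludes
# ".", "h", "t", "m" and "l" as the final character of the path.
OLD_RULE = re.compile(r"^/(.*[^\.html])$")

# Hypothetical request paths.
for path in ["/articles/my-123456-article-url.html", "/index.php", "/about"]:
    m = OLD_RULE.match(path)
    print(path, "->", m.group(1) if m else "no match")
```

Because the last character must not be one of the listed letters, the pattern rejects every path ending in ".html" (and even "/about"), while happily accepting "/index.php".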

Unless you have a functional problem with this new code, I would recommend using it 'exactly as-is' because it contains what I consider to be several performance and specificity improvements over the code you posted. What I posted is exactly what I'd use on my own servers if I had the same requirements as you do.

Jim

webmaestro

3:30 pm on Oct 26, 2009 (gmt 0)

10+ Year Member



Whoops! The RewriteRule flags at the end are backwards, causing the rule not to trigger correctly. The final rule ended up being:


RewriteCond %{REQUEST_URI} ^/[a-z]+/[^-]*-([0-9]+)-[^-]*-[^-]*\.html
RewriteCond %{HTTP_HOST} ^www\.?(.*)?
RewriteRule ^/(.*\.html)$ http://m.%1/site.htm?targetUrl=http://%{HTTP_HOST}/$1 [QSA,L]

(removed question... should be in new post.)