| 6:16 pm on Apr 2, 2011 (gmt 0)|
You need a RewriteRule to do the rewrite, noting that the RewriteRule RegEx pattern can see only the path part of the URL request.
You also need a preceding RewriteCond looking at the QUERY_STRING, and there are several thousand such examples in the WebmasterWorld Apache forum.
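As a rough sketch of that pairing, assuming the rules live in an .htaccess file in the site root and using a=b purely as a placeholder query string, it might look something like this:

```apache
RewriteEngine On
# The condition tests the query string (everything after the "?", not including it).
RewriteCond %{QUERY_STRING} ^a=b$
# The rule pattern tests only the path; in .htaccess it has no leading slash.
# With no "?" in the target, the original query string is re-appended unchanged.
RewriteRule ^index\.php$ /beijing/index.php [R=302,L]
```

The condition applies only to the single RewriteRule that immediately follows it.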
| 6:27 pm on Apr 2, 2011 (gmt 0)|
Need to check there is a query string - I get that.
Not sure how to match the index.php? with the ?
Does ? have a special significance in rewrite rules ?
| 6:31 pm on Apr 2, 2011 (gmt 0)|
Yes, ? does have a special use.
Can I use \? ?
When you say 'can only see the path part of the URL', you mean up to but not including the query string?
| 7:02 pm on Apr 2, 2011 (gmt 0)|
Yes, RewriteRule sees only the path, not the hostname or query string.
A RewriteCond must be used to detect protocol, domain name, port number, or query string data (one RewriteCond for each).
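To illustrate, here is a sketch with one condition per item. The hostname example.com and the a=b parameter are placeholders, not taken from your site, and you would normally use only the conditions you actually need:

```apache
# Protocol: %{HTTPS} is "on" for SSL/TLS requests, otherwise "off".
RewriteCond %{HTTPS} off
# Hostname as sent by the client (hypothetical domain).
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Port the request arrived on.
RewriteCond %{SERVER_PORT} ^80$
# Query string, without the leading "?".
RewriteCond %{QUERY_STRING} (^|&)a=b(&|$)
# All four conditions must match (implicit AND) before the rule is applied.
RewriteRule ^index\.php$ https://www.example.com/index.php [R=301,L]
```

Since the "?" never reaches the RewriteRule pattern, there is no need for \? there; a "?" in the target simply starts a new query string.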
| 7:25 pm on Apr 2, 2011 (gmt 0)|
Almost there :)
Now have :
RewriteRule ^index.php/(.*)index.php?(.*) /beijing/index.php?$2 [R]
This detects the strange URLs fine but redirects to
The last part : ?$2 is not showing up :(
| 7:35 pm on Apr 2, 2011 (gmt 0)|
Got it :)
RewriteRule ^index.php/(.*)index.php /beijing/index.php [R]
| 7:41 pm on Apr 2, 2011 (gmt 0)|
Swap (.*) for ([^/]+/)+ to quickly recurse folder levels. Escape literal periods in patterns using \. instead of . here.
The [R] produces a 302 redirect. Change to [R=301,L] and add the protocol and domain name to the rule target.
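Putting those changes together, the rule might end up roughly like this. The hostname www.example.com is only a stand-in for your real protocol and domain:

```apache
# Escaped periods, one left-to-right pass over the folder levels,
# and an explicit 301 with [L] to a full URL (hypothetical hostname).
RewriteRule ^index\.php/([^/]+/)+index\.php http://www.example.com/beijing/index.php [R=301,L]
```

Because the target contains no "?", the original query string is passed through to the new URL automatically.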
| 12:47 am on Apr 3, 2011 (gmt 0)|
Good ideas g1smd.
But in my case the pattern is always index.php/[various number of directories]/index.php?a=b, so what I have works fine.
I did try escaping the .s but it didn't work like that. Don't know why, what a . means or why what I have does work, but it does :)
I'll check on the 302/301 and what L means.
| 12:57 am on Apr 3, 2011 (gmt 0)|
"Note: if you add an "L" flag to the mix; meaning "Last Rule", e.g. [R=302,NC,L]; Apache will stop processing rules for this request at that point, which may or may not be what you want. Either way, it's useful to know."
I already made it the last rule - it is last in the rewrite rules list. I am hoping I have fixed the reason why my Joomla is causing the bot to find/create these URLs so it will activate only as a last resort.
I'll let the 302 run for a while first until I'm sure all is well (by checking WMT).
| 1:22 am on Apr 3, 2011 (gmt 0)|
The . matches ANY character and \. matches only a literal period.
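A small sketch of the difference, using a simplified pattern rather than your full rule (the loose version is commented out so that only one rule is active):

```apache
# Loose: "." is a wildcard, so this would match index.php but also
# indexaphp or index-php.
# RewriteRule ^index.php$ /beijing/index.php [R=302,L]

# Strict: "\." matches only a literal period, so only index.php matches.
RewriteRule ^index\.php$ /beijing/index.php [R=302,L]
```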
The .* forces thousands of "back off and retry" trial matches. Use ([^/]+/)+ to parse the URL once from left to right, which is very much faster.
Add the [L] flag to EVERY rule, otherwise you can trigger a nasty Apache bug.
| 1:27 am on Apr 3, 2011 (gmt 0)|
It's curious about the .
I'll leave it as is because it's been quite a headache to reach this point and there will not be any URLs with indexaphp in (or such).
I'll change the other two straight away.
| 1:34 am on Apr 3, 2011 (gmt 0)|
Changed to use ([^/]+/)+ and [R,L]
and tried some urls straight from WMT and it's fine :)
Thanks g1smd !
| 7:27 am on Apr 3, 2011 (gmt 0)|
[R,L] gives a 302 redirect.
The . vs. \. is important. Ignore it at your peril. Don't leave loopholes that can be exploited.