Replace String In .htaccess URL

         

bgordon

4:54 pm on Jan 2, 2010 (gmt 0)

10+ Year Member



I have created some rewrite rules to handle some simplified brand searches inbound. The problem is that my script expects urlencoded strings for brands...

so...

www.mysite.com/brand/coke.html -> search.php?brand=coke

I have the first example working fine... it was simple...
RewriteRule ^/?(brand)/([-a-zA-Z0-9-]+)\.html$ search.php?brand=$2 [L]

My second requirement is to deal with spaces in the brand names so I have a convention where the inbound url will have spaces replaced with underscores...

www.mysite.com/brand/pepsi_cola.html -> search.php?brand=pepsi+cola

but how do I do a string replacement regex to make the second option work?

jdMorgan

7:45 pm on Jan 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your best bet here is to modify your script to accept the underscores, as character-conversion in .htaccess is inefficient -- especially in cases where multiple substitutions must be made. A simple str_replace or preg_replace call just before the database lookup(s) is usually sufficient.

Otherwise, you'll either need to make the rewriterule recursive (highly inefficient, and possibly a server-killer on a busy site) or you'll need enough versions of the rule to handle the maximum possible number of underscore-replaced spaces.

Note that you can't just use multiple 'stacked' rules, because there's a bug in mod_rewrite that often makes this approach malfunction.

So, something like:


RewriteRule ^([^_]+)_([^_]+)_([^_]+)_(.+)$ /$1\%20$2\%20$3\%20$4 [NE,L]
RewriteRule ^([^_]+)_([^_]+)_(.+)$ /$1\%20$2\%20$3 [NE,L]
RewriteRule ^([^_]+)_(.+)$ /$1\%20$2 [NE,L]

The above will handle three, two, or one underscore, but will fail if four underscores are present. Add more rules at the top (following the pattern shown) if you need to handle more. The regex patterns shown here are the most efficient possible, but as you can see, all three rules will be processed for every request to your server, and adding more rules for more possible underscores makes it worse. So this approach is inefficient even at best.
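If you would rather go straight from the brand URLs in the question to the script, the same fixed-count idea can be folded into the brand rule itself. This is only a sketch under assumptions: it assumes search.php sits in the document root, that the rules live in the top-level .htaccess, and that brand names contain at most one underscore. The "+" in the query string is passed through as-is and decoded as a space by the script.

```apache
RewriteEngine On

# One underscore: /brand/pepsi_cola.html -> /search.php?brand=pepsi+cola
RewriteRule ^/?brand/([-a-zA-Z0-9]+)_([-a-zA-Z0-9]+)\.html$ /search.php?brand=$1+$2 [L]

# No underscore: /brand/coke.html -> /search.php?brand=coke
RewriteRule ^/?brand/([-a-zA-Z0-9]+)\.html$ /search.php?brand=$1 [L]
```

As with the rules above, each additional underscore you want to support needs another rule placed ahead of these, most underscores first.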

See the [NE] (no escape) flag in the mod_rewrite RewriteRule documentation for more info on my use of this flag and the "\" shown before the percent signs in the substitution.

Do not omit the leading slash on the substitution, as doing so would open up your server to a known security exploit by allowing the initial path-part to be controlled by any malicious user-agent.

There is a limit in mod_rewrite to the number of back-references you can use: only $1 through $9 are available, so a single rule can replace at most eight underscores before you have to resort to a much more complicated (and even more inefficient) solution.

You might also consider using a 'skip rule' ahead of these rules if you add additional rules to what's shown. Assuming that you add one more rule (for four underscores), you'd code that as


RewriteRule !_ - [S=4]

to skip over the next four rules if no underscores are present in the requested URL-path. The exact number of rules needed to make the skip rule 'worthwhile' varies with the length of your URLs and the ratio of requests to your server that do and do not contain underscores, so I'll just guess at four. If you've got a busy site, you should test this aspect of the implementation and use the method that minimizes server load.
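Put together, the skip rule and a four-rule set might look something like the following. Treat it as a sketch rather than a drop-in: the four-underscore rule is simply the same pattern extended by one more group, and the S=4 count must be kept equal to the number of underscore rules that follow it.

```apache
RewriteEngine On

# No underscore in the requested URL-path: skip the four rules below.
RewriteRule !_ - [S=4]

# Four, three, two, then one underscore (most underscores first).
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)$ /$1\%20$2\%20$3\%20$4\%20$5 [NE,L]
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_(.+)$ /$1\%20$2\%20$3\%20$4 [NE,L]
RewriteRule ^([^_]+)_([^_]+)_(.+)$ /$1\%20$2\%20$3 [NE,L]
RewriteRule ^([^_]+)_(.+)$ /$1\%20$2 [NE,L]
```

If you add or remove a rule from the set, remember to adjust the number in the [S=n] flag to match, otherwise the skip will land in the wrong place.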

As stated above, I strongly suggest doing this character-substitution function in your script instead, so that the script's "input modification" when paths containing underscores are requested mirrors the "output modification" you're currently doing when the database URL contains spaces.

Jim