mod_rewrite a-b.html to a b question

mikeseo

1:34 am on Feb 15, 2005 (gmt 0)

10+ Year Member



How can I make mod_rewrite work so that a request for www.site.com/a-b.html is rewritten to www.site.com/cgi-bin/search/search.cgi?keywords=a b (or a%20b)?

Right now it goes to www.site.com/cgi-bin/search/search.cgi?keywords=a-b

My .htaccess file:
RewriteRule ^(.*)\.html$ /cgi-bin/search/search.cgi/?keywords=$1

jdMorgan

2:23 am on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You'll need to use the NoEscape flag [NE]:

RewriteRule ^([^-]+)-(.+)\.html$ /cgi-bin/search/search.cgi/?keywords=$1\%20$2 [NE,L]
RewriteRule ^(.+)\.html$ /cgi-bin/search/search.cgi/?keywords=$1 [L]

A better solution might be to do this conversion inside the search.cgi script.
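
For illustration only, here is a minimal sketch of what that conversion could look like inside a script. It is written in Python because the thread does not show search.cgi or say what language it uses, and the function name keywords_from_slug is hypothetical.

```python
def keywords_from_slug(slug: str) -> str:
    """Turn a hyphenated name such as 'a-b' into the search phrase 'a b'.

    Hypothetical helper: the real search.cgi is not shown in this thread.
    """
    # Split on hyphens and drop empty pieces, so 'a--b' or a trailing
    # hyphen does not produce doubled or dangling spaces.
    return " ".join(part for part in slug.split("-") if part)
```

With something like this in the script, the single catch-all rule for ".html" requests would be enough, and any number of hyphens is handled without adding more rules.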

Jim

mikeseo

3:46 am on Feb 15, 2005 (gmt 0)

10+ Year Member



Awesome, thanks! Would there be any disadvantage to using mod_rewrite rather than modifying search.cgi? For this code to handle up to 5 dashes, does this look right? Any problems? It seems to work fine.

RewriteRule ^([^-]+)-(.+)-(.+)-(.+)-(.+)\.html$ /cgi-bin/search/search.cgi/?keywords=$1\%20$2\%20$3\%20$4\%20$5 [NE,L]
RewriteRule ^([^-]+)-(.+)-(.+)-(.+)\.html$ /cgi-bin/search/search.cgi/?keywords=$1\%20$2\%20$3\%20$4 [NE,L]
RewriteRule ^([^-]+)-(.+)-(.+)\.html$ /cgi-bin/search/search.cgi/?keywords=$1\%20$2\%20$3 [NE,L]
RewriteRule ^([^-]+)-(.+)\.html$ /cgi-bin/search/search.cgi/?keywords=$1\%20$2 [NE,L]
RewriteRule ^(.+)\.html$ /cgi-bin/search/search.cgi/?keywords=$1 [L]

Also, what do [L] and [NE] do?

jdMorgan

5:19 am on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It will be much faster if you use a negated character class, "[^-]+", as I did in my example. This basically says, "match one or more characters not equal to a hyphen," and so does not require mod_rewrite to iterate trying to find the best match for the whole string. Rather, it just has to consume characters until it finds a hyphen, and then go on to the next "piece" of the pattern.

Best advice is, dump the ".+" and ".*" patterns and don't use them unless there is no other alternative. They are "easy" patterns to use, but they are the two most ambiguous patterns, and can cause horrible inefficiencies.

For example, when encountering a pattern like "^(.*)-(.*)$" as applied to one of your local URL-paths, mod_rewrite's regex processor will match the entire URL-path with the first "(.*)", i.e. $1. However, it will then discover that it needs at least one following hyphen. But the last character in the URL-path is not a hyphen, so it has to "back up" from the end-anchor into the last part of the URL-path until it finds a hyphen. It then puts that last part of the URL-path into $2, and the part preceding that hyphen into $1, and the pattern match is now satisfied.

But imagine that there are two hyphens in the URL-path, and the pattern is "^(.*)-(.*)-(.*)$". Now, work through what the regex processor would have to do to match that. It will give you a headache. And it will give your server's CPU a heat stroke.
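
To get a rough feel for the difference, here is a toy backtracking matcher in Python. It is not mod_rewrite's regex engine, and real engines apply optimizations this sketch ignores. The token names "any" and "nothyphen" and the function count_steps are made up for this illustration. It only counts how many times the matcher has to try the next piece of the pattern.

```python
def count_steps(tokens, text):
    """Match text against an anchored token list and count match attempts.

    Tokens (all hypothetical, for illustration):
      "any"       stands for (.+)     one or more of any character, greedy
      "nothyphen" stands for ([^-]+)  one or more non-hyphen characters, greedy
      "-"         stands for a literal hyphen
    Returns (matched, steps).
    """
    steps = 0

    def match(ti, pos):
        nonlocal steps
        steps += 1
        if ti == len(tokens):
            # End of pattern: succeed only at end of text (the "$" anchor).
            return pos == len(text)
        token = tokens[ti]
        if token == "-":
            return pos < len(text) and text[pos] == "-" and match(ti + 1, pos + 1)
        # Greedy run: consume as many allowed characters as possible...
        end = pos
        while end < len(text) and (token == "any" or text[end] != "-"):
            end += 1
        # ...then give characters back one at a time until the rest matches.
        for stop in range(end, pos, -1):
            if match(ti + 1, stop):
                return True
        return False

    return match(0, 0), steps


path = "red-green-blue"  # made-up URL-path
greedy = count_steps(["any", "-", "any", "-", "any"], path)
negated = count_steps(["nothyphen", "-", "nothyphen", "-", "any"], path)
print(greedy, negated)
```

On a path this short the difference is small in absolute terms. The point is only that, in this toy model, the all-".+" pattern makes more attempts than the version that starts with "[^-]+".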

The negated character class neatly prevents all this iterative work, and allows the pattern to be matched simply from left to right. So the pattern would be "^([^-]+)-([^-]+)-(.+)$", or, "Start with one or more characters not equal to a hyphen (stop on first hyphen and save everything up to there as $1), followed by a hyphen, followed by one or more characters not equal to a hyphen (save as $2), followed by a hyphen, followed by one or more characters to end (save as $3)."
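
A quick way to check how each pattern splits a path is to try it in any regex engine. The sketch below uses Python's re module as a stand-in (an assumption: mod_rewrite uses its own regex library, though simple patterns like these should capture the same way), with a made-up path a-b-c-d.

```python
import re

path = "a-b-c-d"  # made-up URL-path with the ".html" already removed

# All-greedy version: the first group grabs as much as it can, so the
# split happens at the LAST hyphens.
greedy = re.match(r"^(.+)-(.+)-(.+)$", path).groups()

# Negated-class version: each group stops at the first hyphen it meets,
# so the split happens at the FIRST hyphens.
negated = re.match(r"^([^-]+)-([^-]+)-(.+)$", path).groups()

print(greedy)   # ('a-b', 'c', 'd')
print(negated)  # ('a', 'b', 'c-d')
```

So the two patterns do not differ only in speed: when the path has more hyphens than the pattern does, the leftover hyphen ends up in a different group.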

[L] = Last. Quit processing RewriteRules if this one matches. You should almost always use [L] -- unless you have a good reason not to.

[NE] = No Escape. Tells mod_rewrite not to apply its usual hex-code escaping to special characters in the result of the substitution, so that an escaped \%20 is passed through as a literal %20 rather than having its % encoded again. See the mod_rewrite documentation of RewriteRule for details. Links to this and other references are in our forum charter.
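
To illustrate why a second round of escaping would be a problem here, this Python sketch uses urllib.parse.quote as a stand-in for a generic URL-escaping step. It is an analogy only, not mod_rewrite's actual escaping routine, and the helper name escape_again is made up.

```python
from urllib.parse import quote, unquote


def escape_again(value: str) -> str:
    """Percent-encode a string, including any '%' it already contains."""
    return quote(value, safe="")


already_encoded = "a%20b"                 # what the substitution produces
double_encoded = escape_again(already_encoded)
print(double_encoded)                     # a%2520b
print(unquote(double_encoded))            # a%20b
```

A script that decodes the double-encoded value once would see the keywords as the literal text a%20b rather than "a b".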

Jim