::Code Snippet::
#Redirect old page requests
Options +FollowSymLinks
RewriteEngine on
RewriteRule posting-id-(.*)-keywords-(.*)\.nai$ posting.nai?id=$1&keywords=$2
::Result::
Bad Request
Your browser sent a request that this server could not understand.
I do have mod_rewrite and mod_alias on, hmmmm. I'll continue to read up on some manuals to troubleshoot, but anyone's input is highly appreciated. It might be some prior code or something; I can't even rewrite a URL request that comes in.
::example::
RewriteRule ^(iqc)(.*) [domain.com...] [NC]
Doesn't work....
Be careful of the distinction between an internal rewrite and an external client redirect. The former is a URL-to-filepath translation occurring inside the server, while the latter is a URL-to-URL translation which requires the client to cooperate in handling a redirection response from your server. The mod_rewrite syntax for these two functions is quite different.
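To make that distinction concrete, here is a minimal sketch, assuming the rules live in a .htaccess file in the root directory of the site. The file names and the www.example.com host are placeholders for illustration, not anything taken from your site.

```apache
RewriteEngine on

# Internal rewrite: the server quietly serves /new-page.html when
# /old-page.html is requested. The address shown in the browser
# does not change, and the client makes only one request.
RewriteRule ^old-page\.html$ /new-page.html [L]

# External redirect: the server answers with a 301 response and a
# Location header, and the client then makes a second request for
# the new URL.
RewriteRule ^moved-page\.html$ http://www.example.com/new-page.html [R=301,L]
```

The [R] flag is what turns the second rule into a redirect; without it, mod_rewrite treats a same-site substitution as an internal rewrite.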
Test with a very simple, correctly-formed rule, such as
RewriteRule ^foo\.html$ http://www.google.com/ [R=301,L]
The next step is to redirect or rewrite to a page on your own site, but make it a static HTML page to eliminate potential scripting problems. Finally, test again using your script as the target.
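As a rough sketch of those later stages, again assuming a root-level .htaccess file: test-target.html is a hypothetical static page created just for the test, and the id and keywords values are made up.

```apache
# Stage 2: external redirect to a static page on your own site.
# A 302 is used while testing so browsers do not cache the result
# the way they may cache a permanent 301.
RewriteRule ^foo\.html$ /test-target.html [R=302,L]

# Stage 3: once stage 2 works, swap it for an internal rewrite to
# the script itself (uncomment this and remove the rule above).
# RewriteRule ^foo\.html$ /posting.nai?id=1&keywords=test [L]
```

Changing one thing per stage makes it easier to tell whether a failure comes from the rule, from the server setup, or from the script.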
If you find that your script is producing invalid redirect responses, the "Live HTTP Headers" add-on for Firefox/Mozilla browsers may prove useful in examining the HTTP transactions between your test client and the server.
Once those test rules are working, I recommend that you address the terribly-inefficient regular-expressions patterns in your original rule. Your server may run noticeably faster if you use the [L] flag and much-more specific subpatterns such as
RewriteRule ^posting-id-([^-]+)-keywords-([^.]+)\.nai$ /posting.nai?id=$1&keywords=$2 [L]
The several errors and inefficiencies in your code snippets indicate that you may benefit from studying the resources cited in our Forum Charter. As illustrated by my comments above, very tiny changes in mod_rewrite code can have huge effects on the performance and reliability of your server. The smallest of errors can take down your server immediately (if you're lucky) or sit quietly for years, slowly destroying your search engine rankings. So, 'guessing and experimentation' is not advisable. It is best to proceed from a solid base of knowledge.
Jim
[edited by: jdMorgan at 8:52 pm (utc) on Oct. 23, 2009]
I did get some guidance from this thread that you guys created (thank you by the way).
[webmasterworld.com...]
I think it may be a module not on?
I tested this example:
Options +FollowSymLinks
RewriteEngine on
RewriteRule ^foo\.html$ [google.com...] [R=301,L]
If in a config file, is it inside a <Directory> section or not? If so, what directory-path is declared?
If in a .htaccess file, what directory is it in, relative to the root ('home page' directory) of your web site?
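The location matters because the path that a RewriteRule pattern is tested against differs by context. A sketch of the two cases, with /var/www/example as a purely hypothetical document root (the two rules are alternatives for different files, not meant to be used together):

```apache
# In httpd.conf at server or <VirtualHost> level, outside any
# <Directory> section: the pattern sees the full URL-path,
# including the leading slash.
RewriteRule ^/foo\.html$ http://www.google.com/ [R=301,L]

# In a .htaccess file in the document root, or inside
# <Directory "/var/www/example">: the directory prefix is removed
# first, so the pattern sees "foo.html" with no leading slash.
RewriteRule ^foo\.html$ http://www.google.com/ [R=301,L]
```

For the .htaccess case, the server configuration also has to permit it: AllowOverride must include FileInfo (or All) for the directory, or the rewrite directives will not be honored.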
Jim
In the example you gave Jim,
RewriteRule ^posting-id-([^-]+)-keywords-([^.]+)\.nai$ /posting.nai?id=$1&keywords=$2 [L]
I want to try to understand each part, so I can cut down on my questions here and not ruin my server like Jim said.
the ^ ::is the start of the string::
the () ::don't know, I think Jim said it was whitespace acceptance::
the [] ::range::
THIS THOUGH ^- ::what range is this defining?::
Thank you anyone, there's nothing wrong with the code of course, just trying to understand it.
Square brackets define an alternate-character group, any member of which should be accepted as a match.
Escaping rules and token meanings change within groups.
Groups may contain lists of characters, or alphabetic or numeric character ranges.
Any character within the group is considered an acceptable match.
Everything in a group is considered to be a character, not a string. For example, [abc] matches "a" or "b" or "c", not "abc". Another example: "[bcdfghj-np-tv-z]" means, "Match any lowercase consonant." (You could also write it as "[b-df-hj-np-tv-z]" which is somewhat less efficient.)
Grouped alternates may be quantified, that is, a regex quantifier such as "?" or "*" or "+" or "{1,3}" may follow the group, and will define how many members of that group are required to be present at this position in the string being matched.
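A small sketch of quantified groups in a rule, assuming a root-level .htaccess file. The item-...html URL format and the /item.php script are invented for illustration.

```apache
# [0-9]+     one or more digits              -> captured as $1
# [a-z]{2}   exactly two lowercase letters   -> captured as $2
#
# A request for item-1234-en.html is rewritten internally to
# /item.php?id=1234&lang=en
RewriteRule ^item-([0-9]+)-([a-z]{2})\.html$ /item.php?id=$1&lang=$2 [L]
```

A request such as item-abc-en.html would not match, because [0-9]+ accepts digits only.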
Take a look at the regular expressions tutorial cited in our Forum Charter -- You should have picked up on most of this stuff from any decent regex guide.
---
"[^-]+" means "Match one or more characters not a hyphen," or equivalently, "Match until you find the next hyphen." As the first character within a group "^" means "NOT". The parentheses around that subpattern mean that we want to store the matched part of the input string for later use as a back-reference, e.g. as "$1" in your substitution path.
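Here is how the whole pattern lines up against a sample request, written out as comments. The sample path is made up for illustration.

```apache
# Request path:  posting-id-42-keywords-red-shoes.nai
#
#   ^posting-id-   literal text at the start of the path
#   ([^-]+)        "42"         (stops at the next hyphen)   -> $1
#   -keywords-     literal text
#   ([^.]+)        "red-shoes"  (stops at the next period)   -> $2
#   \.nai$         literal ".nai" at the end of the path
#
# Result: /posting.nai?id=42&keywords=red-shoes
RewriteRule ^posting-id-([^-]+)-keywords-([^.]+)\.nai$ /posting.nai?id=$1&keywords=$2 [L]
```

Note that the keywords part may contain hyphens, since its subpattern only excludes periods, while the id part may not.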
This negative-match pattern is much more efficient than using ".*" which means, "Match zero or more of any character(s), and match as many characters as possible." If that pattern is used, it will 'consume' the entire input string -- all the way to the end, and if additional subpatterns follow that ".*" then they will initially be 'starved' and the match will fail. The matching engine will then have to 'back off' from the end of the input string one character at a time, trying again and again to find a match for the whole pattern.
If you use multiple ".*" subpatterns you get into recursion, and potentially force dozens, hundreds, thousands, or even tens of thousands of 'back off and re-try' matching attempts, depending on how many 'promiscuous and greedy' ".*" subpatterns are present in the pattern and on the length of the input string to be matched.
By contrast, using negative-match patterns allows the string to be matched in a single left-to-right pass... :)
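Side by side, the two styles look like this. The comments describe what the matcher does for the made-up path posting-id-42-keywords-red-shoes.nai; the greedy rule is commented out because it is shown only for comparison.

```apache
# Greedy version: the first (.*) initially swallows
# "42-keywords-red-shoes.nai", then gives characters back one at a
# time until "-keywords-" can match. The second (.*) then swallows
# "red-shoes.nai" and backs off the same way until "\.nai$" fits.
#RewriteRule ^posting-id-(.*)-keywords-(.*)\.nai$ /posting.nai?id=$1&keywords=$2 [L]

# Negated-class version: each subpattern stops at its own delimiter,
# so on a well-formed path like the one above the matcher never has
# to give characters back.
RewriteRule ^posting-id-([^-]+)-keywords-([^.]+)\.nai$ /posting.nai?id=$1&keywords=$2 [L]
```

The two rules are not strictly equivalent: the second rejects ids that contain a hyphen and keywords that contain a period, which is usually the intent but worth checking against your real URLs.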
You will note that in the Apache mod_rewrite documentation and in the accompanying Apache URL Rewriting Guide, there is an almost utter lack of use of the ".*" pattern... and this is a very strong hint that it should be very rarely used. However, because most people don't/won't "read the book," and because the match-anything ".*" is an easy-to-remember one-size-fits-all subpattern, you'll find truly awful (grossly inefficient) code posted all over the Web.
Until recently, most of the 'stock mod_rewrite code' that came with WP and Joomla and many control panels met this description -- we're working on that problem by example here, and I've seen signs that some of them may be reading... :)
How important is this? Sometimes not very, but possibly critical. I've seen sites which were about to be forced into major server upgrades 'saved' by simple corrections to their (previously-awful) regular-expressions patterns. The sites went from being agonizingly-slow and unusable to being speedy and responsive simply by making the change discussed here (albeit in several rules, not just one). In short, writing the most efficient regex patterns possible is just a very good habit to develop and adhere to. This applies whether you're using regex in mod_rewrite, other Apache modules, CGI-SSI, Perl, PHP, 'C', JavaScript, or any other configuration, programming, or scripting 'languages' that make use of regular-expressions pattern-matching.
Jim
Definitely will avoid .* in my patterns. Well, instead of playing Halo tonight, looks like I'm going to go through my scripts ([lol]+)
I do feel these are more efficient in their places (PLEASE ANYONE CORRECT ME IF I'M WRONG).
[0-9]+ ::one or more characters that are digits::
[A-Z]+ ::one or more characters that are capital letters::
[a-z]+ ::one or more characters that are lowercase letters::
[^-]+ ::one or more characters that are NOT (^) the character -, so it stops matching when it finds one (is that right?)::
Then to make it match exactly one character, remove the + sign.
I'll be putting up my new regex rewrite rule for positive criticism. I can say Jim, you made a new wrinkle in my brain =)
To store the match for later use, surround it in parentheses ()
I think the thing I like best about the negative match pattern is that it is one-way and will not read back over itself. So as you said, I'm guessing that results in increased performance, and given this is my server config, if it is poorly written I can expect poor page loads every time (because the config is read with every request).
The above works, passing the header (301 code) and the variables after .php,
but I'm trying to match only exact cases of
RewriteRule ^buy\.php\?listingid=$ /posting_$1 [R=301]
to result in /posting_######
Isn't the \ supposed to escape special characters like ?
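A possible explanation, offered as a sketch rather than a tested fix: the backslash does escape the ?, but a RewriteRule pattern is only compared against the URL-path, and the query string (everything after the ?) is not part of that path. The query string has to be examined with a RewriteCond on %{QUERY_STRING} instead. The version below assumes listing ids are purely numeric and that the rules sit in a root-level .htaccess file; both are assumptions.

```apache
# Match the query string "listingid=<digits>" and capture the
# digits. Captures from a RewriteCond are referenced as %1, not $1.
RewriteCond %{QUERY_STRING} ^listingid=([0-9]+)$

# Match the path only. The trailing "?" on the substitution drops
# the old query string so it is not appended to the redirect target.
RewriteRule ^buy\.php$ /posting_%1? [R=301,L]
```

With this in place, a request for buy.php?listingid=12345 should be redirected to /posting_12345, while buy.php with any other query string is left alone.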