This is my second query in a couple of days, but I'll try and answer a few posts here on WW once I've got these things covered!
The situation is this: we have a glossary, in subdirectory /glossary. It _used_ to have a single page for every glossary entry, but this was abandoned for a database glossary delivered dynamically. That was about 8 years ago and it has never been an issue, but strangely Google has begun to turn up incoming links from sites using the old link style, leading to a lot of 404s.
These are the files in the glossary directory:
/glossary/index.php
/glossary/ByCat.php
/glossary/WordFind.php
What I figured was, if there were any page requests in the folder that were for files that weren't one of these three, then I would redirect to the WordFind.php page with the file request as a query parameter. So, I devised this cunning .htaccess file and placed it in the /glossary directory:
#RewriteCond %{THE_REQUEST} ^[A-Z]+\ /glossary/(index\.php(.*)?){0}\ HTTP/
#RewriteCond %{THE_REQUEST} ^[A-Z]+\ /glossary/(ByCat\.php(.*)?){0}\ HTTP/
#RewriteCond %{THE_REQUEST} ^[A-Z]+\ /glossary/(WordFind\.php(.*)?){0}\ HTTP/
#RewriteRule (.*) http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]
But this NEVER seems to hit - not even throwing an error. I'm looking, but I don't get it! Again, I'd be very grateful for any help offered, thanks.
Neil.
The logic problem is that you are requiring the request to be all of those different URLs at the same time -- clearly an impossibility.
The whole thing can be boiled down to
RewriteCond %{REQUEST_URI} ^/glossary/
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule !^glossary/(index|ByCat|WordFind)\.php$ http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]
We need good examples of *all* URL- and pattern-related details (i.e., input and output URLs+querystrings) to be productive here... please. :)
Replace any broken pipe "¦" characters you see in patterns with solid pipe "|" characters before use; posting on this forum modifies the pipe characters.
Jim
[edited by: jdMorgan at 2:17 pm (utc) on April 17, 2009]
> Your logic is inverted
LOL! Yes, erm, probably something like that... ;) The "LOGIC" (ha!) that I was trying to express is this:
If this address is NOT asked for AND this address is NOT asked for AND this address is NOT asked for THEN...
At the heart was this bit you pointed out:
>> /glossary/(index\.php(.*)?){0}
>I wasn't able to figure out what you meant by "{0}", either.
Well, I was... shall we say _hoping_ it would translate as the preceding brace occurred 0 times - so, didn't occur. Ok, I was pretty sure I was wrong... :P
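For what it's worth, "{0}" does mean "exactly zero repetitions", but the effect is that the group simply drops out of the pattern rather than being forbidden. A minimal sketch in Python, assuming its re module treats this construct the same way as the regex engine behind mod_rewrite (the request lines are made up for illustration):

```python
import re

# The first condition from the original post, with "\ " written as a plain space.
pattern = re.compile(r"^[A-Z]+ /glossary/(index\.php(.*)?){0} HTTP/")

# With {0} the group must occur zero times, so the pattern behaves like
# "^[A-Z]+ /glossary/ HTTP/" and only a bare directory request can match.
bare_directory = pattern.search("GET /glossary/ HTTP/1.1")
old_style_link = pattern.search("GET /glossary/FOO HTTP/1.1")
index_page = pattern.search("GET /glossary/index.php HTTP/1.1")

print(bool(bare_directory), bool(old_style_link), bool(index_page))
```

Only the bare /glossary/ request matches here. Note also that, as posted, each of the four lines begins with "#", which Apache reads as a comment; if the live file looked like that, none of it would run at all.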
> We need good examples of *all* URL- and pattern-related details (i.e input and output URLs+querystrings) to be productive here
Sorry! Ok, with my early attempts, all URLs worked as if the .htaccess wasn't there. Now, unfortunately, I've tried both the solutions presented... and both are resulting in an infinite loop. :(
In the case of the following URL:
http://www.example.com/glossary/WordFind.php?wordInput=FOO
Nothing should happen. However, it's returning:
http://www.example.com/glossary/WordFind.php?wordInput=wordInput=wordInput=wordInput=wordInput= (etc)
until a redirect error is thrown.
In the case of the following URL:
http://www.example.com/glossary/FOO
It should redirect to:
http://www.example.com/glossary/WordFind.php?wordInput=FOO
However, it has the same effect, and returns the same infinite loop. I'm still looking at it...!
Using this code:
RewriteCond %{REQUEST_URI} ^/glossary/
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule !^glossary/(index|ByCat|WordFind)\.php$ http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]
This hits, but I get an infinite loop.
Using this code:
RewriteCond $1 !^(index|ByCat|WordFind)\.php$
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/(.+)$ http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]
This one simply never hits - the conditions are not met whether the page is valid OR invalid. Obviously I'm stumped, any thoughts? Thanks again.
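One possible explanation for both symptoms, assuming the rules were still sitting in /glossary/.htaccess at this point: in a per-directory .htaccess file, mod_rewrite matches the RewriteRule pattern against the URL-path with that directory's prefix removed, so the pattern never sees "glossary/". The Python sketch below only imitates that stripping; path_seen_by_rule and rule_results are hypothetical helpers, not anything Apache provides.

```python
import re

def path_seen_by_rule(url_path, htaccess_dir):
    """Hypothetical helper: imitate per-directory prefix stripping."""
    path = url_path.lstrip("/")
    prefix = htaccess_dir.strip("/")
    if prefix and path.startswith(prefix + "/"):
        return path[len(prefix) + 1:]
    return path

negated_rule = re.compile(r"^glossary/(index|ByCat|WordFind)\.php$")  # used as !pattern
capture_rule = re.compile(r"^glossary/(.+)$")
excluded = re.compile(r"^(index|ByCat|WordFind)\.php$")               # RewriteCond $1 !...

def rule_results(url_path, htaccess_dir):
    seen = path_seen_by_rule(url_path, htaccess_dir)
    # First ruleset: the "!" means the rule fires when the pattern does NOT match.
    first_fires = negated_rule.search(seen) is None
    # Second ruleset: the pattern must match and $1 must not be a real PHP page.
    m = capture_rule.search(seen)
    second_fires = m is not None and excluded.search(m.group(1)) is None
    return seen, first_fires, second_fires

in_glossary = rule_results("/glossary/WordFind.php", "/glossary")
in_root = rule_results("/glossary/WordFind.php", "/")
print(in_glossary)
print(in_root)
```

In the /glossary case the first ruleset fires even for WordFind.php, so every redirect target gets redirected again, while the second ruleset can never match. Both results agree with the behaviour reported above.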
Jim: I discovered that an .htaccess file in the /glossary folder was preventing the root htaccess file from working properly. I've now shifted all the code to the root htaccess file.
So, this is the code I'm using, and it's working....
RewriteCond $1 !^(index|ByCat|WordFind)\.php
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/(.+)(\.html)?$ http://www.example.com/glossary/WordFind.php?wordInput=$1 [R=301,L]
As I said, this is doing the job -- but it's not perfect! I hope you can see that I'm trying to filter out .html at the end of the URI in this bit ---
^glossary/(.+)(\.html)?$
Here I *thought* that if I bracketed (\.html) and used the ? operator that only matches before (\.html) would be included in $1.
So, this:
www.example.com/glossary/term.html
would redirect to:
www.example.com/glossary/WordFind.php?wordInput=term
But it doesn't drop the .html, so the query ends with ?wordInput=term.html.
I've discovered that removing the brackets from the \.html? like so:
RewriteRule ^glossary/(.+)\.html?$ http://www.example.com/glossary/WordFind.php?wordInput=$1 [R=301,L]
WILL remove the .html, great! Except... anything that does NOT end with .html then results in a 404!
Sure I could easily code this out in PHP, but I feel sure that it could be done more neatly here. Is this possible? If so, what am I doing wrong?! Thanks!
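What happens with these two patterns can be reproduced outside Apache. A minimal sketch using Python's re module, assuming it backtracks the same way as the regex engine mod_rewrite uses for these patterns: the greedy "(.+)" swallows the whole remainder, and because "(\.html)?" is allowed to match nothing, the engine never has to give ".html" back.

```python
import re

greedy_optional = re.compile(r"^glossary/(.+)(\.html)?$")
required_suffix = re.compile(r"^glossary/(.+)\.html?$")

m = greedy_optional.search("glossary/term.html")
captured_with_optional = m.group(1)  # the greedy group takes everything
optional_group = m.group(2)          # the optional group matched nothing

# Without the brackets the suffix is mandatory, so it is stripped when present...
captured_with_required = required_suffix.search("glossary/term.html").group(1)
# ...but a path without the suffix no longer matches the rule at all.
no_suffix_match = required_suffix.search("glossary/term")

print(captured_with_optional, optional_group, captured_with_required, no_suffix_match)
```

In the unbracketed form the "?" applies only to the final "l", so that pattern accepts ".htm" as well as ".html" but still requires one of them. That is why requests without the extension fall through to a 404.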
RewriteCond $1 !^(index|ByCat|WordFind)\.php
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/(.*?)(\.html)?$ http://www.example.com/glossary/WordFind.php?wordInput=$1 [R=301,L]
Turns out that using a non-greedy match of .* (but not .+ though I don't understand why not?) does the trick. Cheers for the help - and hopefully this helps someone else out too! :)
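The non-greedy behaviour can be checked the same way. This is a sketch in Python's re module, not a statement about the Apache build used here; first_group is a hypothetical helper introduced only for the comparison.

```python
import re

lazy_star = re.compile(r"^glossary/(.*?)(\.html)?$")
lazy_plus = re.compile(r"^glossary/(.+?)(\.html)?$")

def first_group(pattern, path):
    """Hypothetical helper: return capture group 1, or None if no match."""
    m = pattern.search(path)
    return m.group(1) if m else None

paths = ["glossary/term.html", "glossary/term", "glossary/"]
star_results = [first_group(lazy_star, p) for p in paths]
plus_results = [first_group(lazy_plus, p) for p in paths]

print(star_results)
print(plus_results)
```

The lazy group grows one character at a time and stops as soon as the optional suffix and the end anchor can both be satisfied, so ".html" ends up in the second group. In Python at least, ".+?" gives the same captures for non-empty terms and differs only on the bare "glossary/" request, so the difference reported above may come from something else in that setup.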
I warn people all the time to avoid ".*" for several reasons. First, its greedy behaviour leads to unexpected problems like this. Second, it initially matches all the way to the end of the requested URL-path, and then has to "back off" one character at a time to find a match. If multiple ".*" subpatterns are used, this can result in thousands of retries, because the number of "trial matches" varies as the square of the number of ".*" subpatterns times the length of the URL-path "tail" being matched, and that can cause a server performance nightmare on very busy sites.
A better approach is to use the concept of "Match until you find a character you *don't* want." As in:
RewriteCond $1 !^(index|ByCat|WordFind)\.php
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/([^.]+)(\.html)?$ http://www.example.com/glossary/WordFind.php?wordInput=$1 [R=301,L]
You can change the quantifier in "[^.]+" to "*" if you really want to accept/redirect blank wordInput values. Because the pattern is much more specific, this does not cause the problems that arise when any number of any characters is matched using ".*".
Jim
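To see what the "match until you find a character you don't want" pattern captures, here is a small sketch in Python's re module, assuming it agrees with mod_rewrite's engine on this pattern. The sample paths are invented for illustration and first_group is a hypothetical helper.

```python
import re

char_class = re.compile(r"^glossary/([^.]+)(\.html)?$")

def first_group(path):
    """Hypothetical helper: return capture group 1, or None if no match."""
    m = char_class.search(path)
    return m.group(1) if m else None

results = {
    path: first_group(path)
    for path in [
        "glossary/term.html",     # extension stripped
        "glossary/term",          # no extension, still matches
        "glossary/WordFind.php",  # ".php" cannot be matched, so no redirect
        "glossary/v1.2.html",     # a term containing a dot does not match either
    ]
}

for path, captured in results.items():
    print(path, "->", captured)
```

Two consequences follow from this sketch. The three real PHP pages never match the rule pattern in the first place, so the RewriteCond on $1 is no longer what protects them. Any old glossary link whose term contains a dot would not be redirected; whether such links exist is something only the site's logs can answer.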