Using a Redirect if subdirectory files are NOT matched

Only redirect if none of the pages in the directory are requested

         

nowpc

1:19 pm on Apr 17, 2009 (gmt 0)

10+ Year Member



Hi all,

This is my second query in a couple of days, but I'll try and answer a few posts here on WW once I've got these things covered!

The situation is this: we have a glossary, in subdirectory /glossary. It _used_ to have a single page for every glossary entry, but this was abandoned for a database glossary delivered dynamically. That was about 8 years ago, and it has never been an issue, but strangely Google has begun to turn up incoming links from sites using the old link style, leading to a lot of 404s.

These are the files in the glossary directory:

/glossary/index.php
/glossary/ByCat.php
/glossary/WordFind.php

What I figured was, if there were any page requests in the folder that were for files that weren't one of these three, then I would redirect to the WordFind.php page with the file request as a query parameter. So, I devised this cunning .htaccess file and placed it in the /glossary directory:

#RewriteCond %{THE_REQUEST} ^[A-Z]+\ /glossary/(index\.php(.*)?){0}\ HTTP/
#RewriteCond %{THE_REQUEST} ^[A-Z]+\ /glossary/(ByCat\.php(.*)?){0}\ HTTP/
#RewriteCond %{THE_REQUEST} ^[A-Z]+\ /glossary/(WordFind\.php(.*)?){0}\ HTTP/
#RewriteRule (.*) http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]

But this NEVER seems to hit - not even throwing an error. I'm looking, but I don't get it! Again, I'd be very grateful for any help offered, thanks.

Neil.

jdMorgan

1:44 pm on Apr 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your logic is inverted, and you don't need to use THE_REQUEST here. You can use either REQUEST_URI, or even just $1 instead.

The logic problem is that you are requiring the request to be all of those different URLs at the same time -- clearly an impossibility.

The whole thing can be boiled down to


RewriteCond %{REQUEST_URI} ^/glossary/
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule !^glossary/(index¦ByCat¦WordFind)\.php$ http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]

It was not possible for me to tell what your old URL query strings looked like, so this likely needs some work. I wasn't able to figure out what you meant by "{0}", either.

We need good examples of *all* URL- and pattern-related details (i.e., input and output URLs+querystrings) to be productive here... please. :)

Replace the broken pipe "¦" characters you see in patterns with solid pipe characters before use; posting on this forum modifies the pipe characters.

Jim

[edited by: jdMorgan at 2:17 pm (utc) on April 17, 2009]

jdMorgan

1:49 pm on Apr 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Alternate form (maybe more easily-readable)

RewriteCond $1 !^(index¦ByCat¦WordFind)\.php$
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/(.+)$ http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]

Jim

nowpc

2:57 pm on Apr 17, 2009 (gmt 0)

10+ Year Member



Hi Jim.

> Your logic is inverted
LOL! Yes, erm, probably something like that... ;) The "LOGIC" (ha!) that I was trying to express is this:

If this address is NOT asked for AND this address is NOT asked for AND this address is NOT asked for THEN...

At the heart was this bit you pointed out:

>> /glossary/(index\.php(.*)?){0}
>I wasn't able to figure out what you meant by "{0}", either.

Well, I was... shall we say _hoping_ it would translate as the preceding brace occurred 0 times - so, didn't occur. Ok, I was pretty sure I was wrong... :P
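
For what it's worth, "{0}" is a valid quantifier, but it means "exactly zero repetitions", so the group simply matches nothing at that point; it is not a negation. A minimal sketch in Python (an assumption for illustration: Python's `re` behaves like Apache's regex engine for these simple patterns, and the request lines are made-up examples) shows what the first condition really accepts, next to the negated alternation Jim suggested:

```python
import re

# nowpc's first condition, with the escaped spaces written as plain spaces.
zero_times = re.compile(r"^[A-Z]+ /glossary/(index\.php(.*)?){0} HTTP/")

# The intended test: the path is NOT one of the three real pages.
real_page = re.compile(r"^/glossary/(index|ByCat|WordFind)\.php$")

# Hypothetical request lines, as they would appear in THE_REQUEST.
request_lines = [
    "GET /glossary/ HTTP/1.1",
    "GET /glossary/index.php HTTP/1.1",
    "GET /glossary/FOO HTTP/1.1",
]

for line in request_lines:
    path = line.split(" ")[1]
    hits_zero = bool(zero_times.search(line))        # only the bare directory matches
    should_redirect = not real_page.search(path)     # what "!" expresses in mod_rewrite
    print(f"{line!r}: zero-times condition={hits_zero}, negated test={should_redirect}")
```

With "{0}" the condition can only match a request for the bare /glossary/ directory, which is why it never fired for the old-style links.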

> We need good examples of *all* URL- and pattern-related details (i.e input and output URLs+querystrings) to be productive here

Sorry! Ok, with my early attempts, all URLs worked as if the .htaccess wasn't there. Now, unfortunately, I've tried both the solutions presented... and both are resulting in an infinite loop. :(

In the case of the following URL:

http://www.example.com/glossary/WordFind.php?wordInput=FOO

Nothing should happen. However, it's returning:

http://www.example.com/glossary/WordFind.php?wordInput=wordInput=wordInput=wordInput=wordInput= (etc)

until a redirect error is thrown.

In the case of the following URL:

http://www.example.com/glossary/FOO

It should redirect to:

http://www.example.com/glossary/WordFind.php?wordInput=FOO

However, it has the same effect, and returns the same infinite loop. I'm still looking at it...!

jdMorgan

4:37 pm on Apr 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



... and what is the code you're using at this moment?

Jim

jdMorgan

2:24 am on Apr 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only thing I can see that could cause a problem with the code I previously posted is if you didn't heed the note above:

Replace the broken pipe "¦" characters you see in patterns with solid pipe characters before use; posting on this forum modifies the pipe characters.

Jim

nowpc

7:21 pm on Apr 23, 2009 (gmt 0)

10+ Year Member



Hi again. Sorry to say that I've still not managed to crack this. :( (And yeah, I remembered to change the pipes!)

Using this code:

RewriteCond %{REQUEST_URI} ^/glossary/
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule !^glossary/(index¦ByCat¦WordFind)\.php$ http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]

This hits, but I get an infinite loop.

Using this code:

RewriteCond $1 !^(index¦ByCat¦WordFind)\.php$
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/(.+)$ http://www.example.com/glossary/WordFind.php?wordInput=%1 [R=301,L]

This one simply never hits - the conditions are not met whether the page is valid OR invalid. Obviously I'm stumped, any thoughts? Thanks again.

nowpc

12:51 pm on Apr 24, 2009 (gmt 0)

10+ Year Member



Hi again.

Jim: I discovered that an .htaccess file in the /glossary folder was preventing the root htaccess file from working properly. I've now shifted all the code to the root htaccess file.
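
One plausible explanation for the earlier behaviour (not confirmed in this thread): in per-directory (.htaccess) context, mod_rewrite strips the directory prefix before matching a RewriteRule pattern, so a rule in /glossary/.htaccess sees "WordFind.php" rather than "glossary/WordFind.php". The rough Python model below is only a sketch; the function names are made up for illustration, and it ignores the two RewriteCond lines because they always pass for these requests.

```python
import re

REAL_PAGE = re.compile(r"^glossary/(index|ByCat|WordFind)\.php$")
ALT_RULE = re.compile(r"^glossary/(.+)$")


def path_seen_by_rule(url_path, htaccess_dir):
    """Return the path a RewriteRule pattern is matched against.

    In per-directory context the directory prefix (including the leading
    slash) is stripped before the pattern is applied.
    """
    return url_path[len(htaccess_dir):]


def first_form(url_path, query, htaccess_dir):
    """Model the first (negated-pattern) rule; return the redirect or None."""
    seen = path_seen_by_rule(url_path, htaccess_dir)
    if REAL_PAGE.search(seen):
        return None  # negated pattern fails, so no redirect
    # %1 is the whole incoming query string, so it is nested on every hop
    return "/glossary/WordFind.php", "wordInput=" + query


def follow(url_path, query, htaccess_dir, limit=3):
    """Follow redirects for at most `limit` hops and list the visited URLs."""
    visited = [f"{url_path}?{query}"]
    for _ in range(limit):
        target = first_form(url_path, query, htaccess_dir)
        if target is None:
            break
        url_path, query = target
        visited.append(f"{url_path}?{query}")
    return visited


# Rule in /glossary/.htaccess: the pattern never sees "glossary/", so it loops
print(follow("/glossary/WordFind.php", "wordInput=FOO", "/glossary/"))
# Rule in the root .htaccess: the real page is recognised, no redirect
print(follow("/glossary/WordFind.php", "wordInput=FOO", "/"))
# The alternate form cannot match at all once the prefix is stripped
print(bool(ALT_RULE.search(path_seen_by_rule("/glossary/FOO", "/glossary/"))))
```

Under these assumptions, the first form loops with a growing "wordInput=wordInput=..." query when placed in /glossary/.htaccess, and the alternate form never hits there, which matches both symptoms reported earlier.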

So, this is the code I'm using, and it's working....

RewriteCond $1 !^(index¦ByCat¦WordFind)\.php
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/(.+)(\.html)?$ http://www.example.com/glossary/WordFind.php?wordInput=$1 [R=301,L]

As I said, this is doing the job -- but it's not perfect! I hope you can see that I'm trying to filter out .html at the end of the URI in this bit ---

^glossary/(.+)(\.html)?$

Here I *thought* that if I bracketed (\.html) and used the ? operator, only the part matched before (\.html) would be included in $1.

So, this:

www.example.com/glossary/term.html

would redirect to:

www.example.com/glossary/WordFind.php?wordInput=term

But it doesn't drop the .html, so the query ends with ?wordInput=term.html.

I've discovered that removing the brackets from the \.html? like so:

RewriteRule ^glossary/(.+)\.html?$ http://www.example.com/glossary/WordFind.php?wordInput=$1 [R=301,L]

WILL remove the .html, great! Except... anything that does NOT end with .html then results in a 404!
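
A likely reason for the 404s: without the brackets, the "?" applies only to the final "l", so the pattern demands a ".htm" or ".html" ending and does not match anything else, which means no redirect happens for those requests. A small Python check (assuming Python's `re` agrees with Apache's engine on this pattern; the example paths are made up):

```python
import re

# Same pattern as the unbracketed RewriteRule: "?" binds to the "l" only.
pattern = re.compile(r"^glossary/(.+)\.html?$")


def captured(path):
    """Return what $1 would hold, or None when the rule does not match."""
    m = pattern.search(path)
    return m.group(1) if m else None


for path in ["glossary/term.html", "glossary/term.htm", "glossary/term"]:
    print(path, "->", captured(path))
```

The first two paths capture "term"; the extensionless one does not match at all.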

Sure I could easily code this out in PHP, but I feel sure that it could be done more neatly here. Is this possible? If so, what am I doing wrong?! Thanks!

nowpc

1:00 pm on Apr 24, 2009 (gmt 0)

10+ Year Member



Solved it!

RewriteCond $1 !^(index¦ByCat¦WordFind)\.php
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/(.*?)(\.html)?$ http://www.example.com/glossary/WordFind.php?wordInput=$1 [R=301,L]

Turns out that using a non-greedy match of .* (but not .+ though I don't understand why not?) does the trick. Cheers for the help - and hopefully this helps someone else out too! :)

jdMorgan

3:02 pm on Apr 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The "(.+)" subpattern (like "(.*)"), being greedy and promiscuous, "consumes" the trailing ".html" if you've made "(\.html)" optional by following it with "?". As a result, ".html" will end up in $1 and $2 will always be empty.
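
The difference is easy to see outside Apache. This is a minimal sketch in Python, assuming its `re` module treats greedy and non-greedy quantifiers the same way as Apache's engine for these patterns; the path is the thread's example:

```python
import re

path = "glossary/term.html"

# Greedy: (.+) swallows ".html", leaving the optional group unmatched.
greedy = re.search(r"^glossary/(.+)(\.html)?$", path)

# Non-greedy: (.*?) stops as soon as the rest of the pattern can match.
lazy = re.search(r"^glossary/(.*?)(\.html)?$", path)

print(greedy.groups())  # ('term.html', None)
print(lazy.groups())    # ('term', '.html')
```

In Python an unmatched optional group is reported as None; in mod_rewrite the corresponding $2 would simply be empty.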

I warn people all the time to avoid ".*" for several reasons: first, its greedy behaviour leads to unexpected problems like this, and second, it initially matches all the way to the end of the requested URL-path and then has to "back off" one character at a time to find a match. If multiple ".*" subpatterns are used, this can result in thousands of retries, because the number of "trial matches" grows rapidly with both the number of ".*" subpatterns and the length of the URL-path "tail" being matched, and that can cause a server performance nightmare on very busy sites.

A better approach is to use the concept of "Match until you find a character you *don't* want." As in:


RewriteCond $1 !^(index¦ByCat¦WordFind)\.php
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^glossary/([^.]+)(\.html)?$ http://www.example.com/glossary/WordFind.php?wordInput=$1 [R=301,L]

Here, we simply match characters into $1 as long as we don't find a period. If there is no period, then we match all the way to the end of the requested URL-path.
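
As a quick check of that pattern, here is a sketch in Python (assuming equivalent regex behaviour; `word_input` is a made-up helper name and the paths are illustrative):

```python
import re

# Same pattern as the RewriteRule above: capture up to the first period.
rule = re.compile(r"^glossary/([^.]+)(\.html)?$")


def word_input(path):
    """Return the value $1 would carry, or None if the rule does not match."""
    m = rule.search(path)
    return m.group(1) if m else None


for path in ["glossary/term.html", "glossary/term",
             "glossary/WordFind.php", "glossary/a.b.html"]:
    print(path, "->", word_input(path))
```

Both "term.html" and "term" yield "term", and "WordFind.php" is not matched, so the real pages are left alone. One consequence worth knowing: an old entry whose name itself contains a period (the "a.b.html" case) is not matched either, so it would not be redirected by this rule.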

You can change the quantifier in "[^.]+" to "*" if you really want to accept/redirect blank wordInput values. Because the pattern is much more specific, this is not problematic in the way that matching any number of any characters with ".*" is.

Jim