Odd 404 Error when redirecting and rewriting

They work alone, but not together

         

StaceyJ

9:21 pm on Jan 2, 2011 (gmt 0)

10+ Year Member



I have the below in the root htaccess file. Alone the redirect works fine, and alone the rewrite works fine.

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/store/[^\./]+\.cgi(.*)\ HTTP/
RewriteRule ^cgi-bin/store/([^\./]+)\.cgi(.*)$ http://www.example.com/$1$2 [R,L]

RewriteRule ^([^\./]+)(/[^\./]+)?(/[^/]+)?$ /cgi-bin/store/$1.cgi$2$3 [L]

However when you put the two together something odd is taking place. It generates a 404 and the server error log shows:

script not found or unable to stat: /usr/www/users/######/#####/cgi-bin/store/cgi-bin.cgi

It's like the first backreference in the rewrite is always picking up "cgi-bin" from the original request instead of what the redirect is redirecting to and showing in the address bar. I've stared at this for days and tried all sorts of tweaks, none of which help EXCEPT adding

RewriteCond %{REQUEST_URI} !cgi-bin

to the last rule, then it works.

But my question is why is it behaving like that? Can anyone explain? I've read things over and over here and everything seems to be right, but something isn't. Thank you!

g1smd

2:30 am on Jan 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How many actions are shown in the Live HTTP Headers (an extension for Firefox) report?

Are there multiple redirects in a chain, or just one redirect?

You likely also want [R=301,L] rather than the 302 redirect you have now.

StaceyJ

3:00 pm on Jan 3, 2011 (gmt 0)

10+ Year Member



Thank you.

> How many actions are shown in the Live HTTP Headers (an extension for Firefox) report?

> Are there multiple redirects in a chain, or just one redirect?

There is only one 302 redirect (to the correct new URL per Live Http Headers) and then the 404 error when it tries to get that page.

> You likely also want [R=301,L] rather than the 302 redirect you have now.

Thanks, I understand that and will add that when I finally get it all working. I had read (which doesn't necessarily make it true) that a 301 can sometimes cause issues when testing even when you clear the browser cache, so I left it off for now for that reason.

jdMorgan

7:04 pm on Jan 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What test URL causes the problem?
Is the value of the obscured /###/###/###/ filepath correct for your server?

Jim

StaceyJ

7:32 pm on Jan 5, 2011 (gmt 0)

10+ Year Member



I was going to update this with the URLs in question, thanks for asking. There are 3 parts shown below, the first part 404s, the second two resolve. I'm not even quite sure how that is possible.

www.example.com/dir1

www.example.com/dir1/dir2

www.example.com/dir1/dir2/prodid.123456

As for the filepath, yes, it is correct for the server, the two sections I X'ed out are userid and the folder for this domain under the main account.

Added for clarification:

where /dir1 is the name of the cgi script minus the .cgi file extension.

jdMorgan

11:35 pm on Jan 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, there were some regex problems and a possible infinite-loop in there...

Try:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/store/[^/.]+\.cgi/[^\ ]*\ HTTP/
RewriteRule ^cgi-bin/store/([^/.]+)\.cgi(/.*)?$ http://www.example.com/$1$2 [R=301,L]
#
RewriteCond $1 !^cgi-bin$
RewriteCond $2 !^/store/[^/.]+\.cgi
RewriteRule ^([^/.]+)((/[^/.]+)*)$ /cgi-bin/store/$1.cgi$2 [L]

I also removed unnecessary parentheses, removed unneeded escaping within the alternate-character-groups, and ordered the negative-match alternate-character groups with the most-likely-to-occur-characters first (for speed).

The exclusions on the second rule are necessary to prevent it looping on itself. It could be written more efficiently as

RewriteCond $1$2 !^cgi-bin/store/[^/.]+\.cgi
RewriteRule ^([^/.]+)((/[^/.]+)*)$ /cgi-bin/store/$1.cgi$2 [L]

or as

RewriteCond $1 !^cgi-bin/store/[^/.]+\.cgi
RewriteRule ^(([^/.]+)((/[^/.]+)*))$ /cgi-bin/store/$2.cgi$3 [L]

once you understand what the simpler-syntax version above is doing. They're all equivalent, with the second two versions only a bit more efficient. Use whichever you like.

Jim
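A rough way to check the looping claim is to replay the first variant above in Python's re module, which accepts these particular patterns unchanged. This is only a sketch: guarded_rewrite is a hypothetical helper, not anything Apache provides, and it models a single pass of the second rule with its two RewriteCond lines ANDed together.

```python
import re

# Pattern and conditions copied from the first suggested variant.
RULE = re.compile(r"^([^/.]+)((/[^/.]+)*)$")
COND_1 = re.compile(r"^cgi-bin$")              # tested against $1, negated
COND_2 = re.compile(r"^/store/[^/.]+\.cgi")    # tested against $2, negated


def guarded_rewrite(path):
    """Model one pass of the guarded rule; None means the rule did not fire."""
    m = RULE.match(path)
    if m is None:
        return None
    first, rest = m.group(1), m.group(2)
    # Both conditions are negated and ANDed, so a hit on either blocks the rule.
    if COND_1.search(first) or COND_2.search(rest):
        return None
    # Leading slash dropped, since a per-directory rule sees the path without it.
    return "cgi-bin/store/" + first + ".cgi" + rest


for path in ("dir1", "dir1/dir2", "cgi-bin/store", "cgi-bin/store/dir1.cgi"):
    print(path, "->", guarded_rewrite(path))
```

In this model "dir1" and "dir1/dir2" are rewritten once, "cgi-bin/store" is stopped by the first condition, and "cgi-bin/store/dir1.cgi" is not matched at all because every segment of the pattern excludes dots. For the same reason, a request whose later segment contains a dot would not be rewritten by this pattern as written.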

StaceyJ

5:49 pm on Jan 6, 2011 (gmt 0)

10+ Year Member



> Well, there were some regex problems and a possible infinite-loop in there...

Thanks for the fixes, however I still don't understand where the infinite-loop is coming from, but I'm happy to see I was at least on the right track to fixing it with my !cgi-bin condition (I realize I was missing the anchors), which was really just done out of frustration and it actually made me laugh when it worked. If you have the time to explain I'd greatly appreciate it, but I know you're busy.

> Try:

> RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/store/[^/.]+\.cgi/[^\ ]*\ HTTP/
> RewriteRule ^cgi-bin/store/([^/.]+)\.cgi(/.*)?$ http://www.example.com/$1$2 [R=301,L]
> #
> RewriteCond $1 !^cgi-bin$
> RewriteCond $2 !^/store/[^/.]+\.cgi
> RewriteRule ^([^/.]+)((/[^/.]+)*)$ /cgi-bin/store/$1.cgi$2 [L]

The second and third conditions had me scratching my head over the backreferences, but then I remembered reading in the docs that the rule pattern is evaluated before the conditions, so then it made sense. I'm guessing that is more efficient than using REQUEST_URI?

I also had to remove the second "." from the second rule's pattern: it 404'd, and I remembered that there will always be a "." in the old URLs at that point (unfortunately).

I have a question regarding the first condition. Near the end you have .cgi/[^\ ]* and, even though it worked, I'm not sure how, because sometimes there will not be a / after .cgi, if you look at the 3 sample URLs I posted above. It seems it should be .cgi(/[^\ ]+)* making the / along with anything following it optional?

> I also removed unnecessary parentheses, removed unneeded escaping within the alternate-character-groups, and ordered the negative-match alternate-character groups with the most-likely-to-occur-characters first (for speed).

Thank you! I know my rewrite rule was a little longer than it needed to be and you gave me a suggestion in another post how to consolidate it also.

> The exclusions on the second rule are necessary to prevent it looping on itself. It could be written more efficiently as

> RewriteCond $1$2 !^cgi-bin/store/[^/.]+\.cgi
> RewriteRule ^([^/.]+)((/[^/.]+)*)$ /cgi-bin/store/$1.cgi$2 [L]

> or as

> RewriteCond $1 !^cgi-bin/store/[^/.]+\.cgi
> RewriteRule ^(([^/.]+)((/[^/.]+)*))$ /cgi-bin/store/$2.cgi$3 [L]

> once you understand what the simpler-syntax version above is doing. They're all equivalent, with the second two versions only a bit more efficient. Use whichever you like.

Thanks, I appreciate all 3 versions, I understand what they do and how the longer one was shortened. I also appreciate the longer one because sometimes it's easier for me to see the big picture that way before consolidating them.

I have one other question for now. When I was testing with my own rules, I would sometimes get broken images, even though my image URLs all start with /. I'm not sure why they were broken, so I added
RewriteCond %{REQUEST_URI} !^\.(php|html?|css|jpe?g|gif|js|txt|ico)$
to the rewrite rule and I don't get any more broken images. I understand what the condition does; my question is, is this needed, or is it overkill? And should it be $1 instead of %{REQUEST_URI} ?

Thanks so much for all the help and suggestions! I've learned tons over the last few weeks.
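One observation on the file-extension condition as posted, offered as a sketch rather than a diagnosis: because the pattern starts with ^ and goes straight to the literal dot, it can only match a string that consists of nothing but an extension, so a real %{REQUEST_URI} (which begins with /) would seemingly never match it. The sample URIs below are made up for illustration, and the unanchored variant is an assumption about what was intended, not something confirmed in the thread.

```python
import re

# The condition pattern exactly as posted (without the leading "!").
AS_POSTED = re.compile(r"^\.(php|html?|css|jpe?g|gif|js|txt|ico)$")
# Assumed intent: the same extension list without the leading "^".
UNANCHORED = re.compile(r"\.(php|html?|css|jpe?g|gif|js|txt|ico)$")

# Hypothetical REQUEST_URI values for illustration only.
for uri in ("/images/logo.gif", "/css/site.css", "/dir1/dir2"):
    print(uri,
          "| as posted:", AS_POSTED.search(uri) is not None,
          "| unanchored:", UNANCHORED.search(uri) is not None)
```

If this reading is right, the negated condition as posted is always true and so excludes nothing, which would suggest the broken images went away for some other reason.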

jdMorgan

11:28 pm on Jan 6, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> question regarding the first condition, near the end you have .cgi/[^\ ]*

The "[^\ ]*" subpattern says, "Match zero or more characters not equal to a space." Therefore, including the "/" ahead of that subpattern is neither necessary nor desirable. We're basically allowing *anything* to follow ".cgi" before we hit the "HTTP/1.0" or "HTTP/1.1" at the end of %{THE_REQUEST} -- which is the exact same string that you see in quotes in your raw server access log, typically starting with "GET".

Jim

StaceyJ

12:07 am on Jan 7, 2011 (gmt 0)

10+ Year Member



> question regarding the first condition, near the end you have .cgi/[^\ ]*

> The "[^\ ]*" subpattern says, "Match zero or more characters not equal to a space." Therefore, including the "/" ahead of that subpattern is neither necessary nor desirable.

Thank you, I understand what the pattern means though. I think you may have missed my question, YOU included the / before [^\ ]* - I did not. I was questioning why you included it as it wasn't made optional in your code "/[^\ ]*". Therefore wouldn't it have to match ".cgi/" and then "zero or more characters not equal to a space"? So if the URL ended with ".cgi" with NO trailing slash, wouldn't it NOT match the entire condition and not run the redirect? I know what I'm asking, sorry if I'm not making it clear. Thank you, again!

So shouldn't it just be ".cgi[^\ ]*" in that case, with no / after .cgi?

jdMorgan

3:45 pm on Jan 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you think that somehow I never make errors, thanks. But I assure you that that is not the case... :)
The inconsistency between the RewriteCond and RewriteRule patterns was a dead give-away that there was an error there. But I make plenty of such minor logic or typographical errors every day. That's why I/we must test. :)

When writing code to "correct" URL errors, it is best to make the patterns and conditions as forgiving as possible, so as to correct the widest possible range of errors -- to include, in this case, direct client requests for the old .cgi URLs (which are now only used as server-internal filepaths), and any typographical-error variations of those old .cgi URLs. So in this case, getting rid of that trailing slash requirement in both patterns is likely the correct approach.

Jim
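The effect of that mandatory trailing slash can be checked outside Apache, since Python's re module accepts both condition patterns unchanged. The request lines below are hypothetical examples of what %{THE_REQUEST} might contain, and matches is just a small helper for this sketch.

```python
import re

# Condition pattern as first suggested: a "/" is required right after ".cgi".
WITH_SLASH = re.compile(r"^[A-Z]+\ /cgi-bin/store/[^/.]+\.cgi/[^\ ]*\ HTTP/")
# Same pattern with that "/" removed, so anything (or nothing) may follow ".cgi".
WITHOUT_SLASH = re.compile(r"^[A-Z]+\ /cgi-bin/store/[^/.]+\.cgi[^\ ]*\ HTTP/")

# Hypothetical request lines in the form %{THE_REQUEST} would hold them.
bare = "GET /cgi-bin/store/dir1.cgi HTTP/1.1"
with_path = "GET /cgi-bin/store/dir1.cgi/dir2 HTTP/1.1"


def matches(pattern, request_line):
    """Return True if the condition pattern matches the request line."""
    return pattern.search(request_line) is not None


for line in (bare, with_path):
    print(line,
          "| with slash:", matches(WITH_SLASH, line),
          "| without slash:", matches(WITHOUT_SLASH, line))
```

Only the version without the slash matches a direct request for the bare .cgi URL, so only that version would let the redirect fire for it.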

StaceyJ

5:59 pm on Jan 7, 2011 (gmt 0)

10+ Year Member



> If you think that somehow I never make errors, thanks. But I assure you that that is not the case... :)

Lol! I think I saw you make a typo...once. :)

> That's why I/we must test. :)

What's funny is I questioned it before I tested it, but I decided to test it as is anyway, and it still worked, which is really odd. But then again, there seem to be a lot of odd things going on here.

> So in this case, getting rid of that trailing slash requirement in both patterns is likely the correct approach.

So it would be...

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/store/[^/.]+\.cgi[^\ ]*\ HTTP/
RewriteRule ^cgi-bin/store/([^/.]+)\.cgi(.*)$ http://www.example.com/$1$2 [R=301,L]

The reason I only questioned the condition and not the rule is you had the / optional in the rule ".cgi(/.*)?" which would be correct given the URLs, and knowing how you are about efficiency I thought maybe including the slash like that might have something to do with it, thinking to myself if it didn't find a slash after .cgi it wouldn't bother trying to match everything else, thus maybe saving a little time. And please understand, I wasn't questioning you, I was questioning whether I wasn't understanding something. It can get confusing for sure. But you have taught me so much through this and by reading tons of your posts and examples, thank you so much for your time and patience!

jdMorgan

1:49 pm on Jan 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I wasn't questioning you

Sometimes, you should... :)

When doing this kind of "detect direct client request" function --most often used for detecting requests for old URLs which are now used only as internal filepaths as part of a two-rule set (this old-URL-to-new-URL external redirect and a new-URL-to-filepath internal rewrite)-- the patterns for the URL-path in the RewriteCond %{THE_REQUEST} and in the RewriteRule should always be consistent. Otherwise, you would likely get some *really* weird and inconsistent behavior, and an error that might be almost impossible to find even if you were to detect that it was happening -- it would likely occur as a rare and apparently-intermittent error.

Jim

StaceyJ

5:34 pm on Jan 9, 2011 (gmt 0)

10+ Year Member



> the pattern for the URL-path in the RewriteCond %{THE_REQUEST} and the RewriteRule should always be consistent.

Thanks. So which would be better or more efficient in this case, [^\ ]* or (.*) ? (The ? is an actual question mark, not part of the regex :) To me it seems like [^\ ]* would be best in the condition, since we want to match all characters (except a space) until we come to the space before HTTP. But in the rule it seems like (.*) might be better, because we're just matching the remainder of the URL? It seems the greediness shouldn't be a bad thing here; we actually want it to be greedy, don't we? Is my thinking correct? And what actually comes at the end of a URL? A newline? It's certainly not a space. But then again, I never really thought about it before. But then there goes the consistency thing you brought up. Every time I think I've almost got it (I mean just this particular condition and rule), I think up something else I don't know. Are you ready to go ARRRRGGGGHHHH at me yet? :)

jdMorgan

6:14 pm on Jan 9, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The code should appear exactly as you posted... Note that I said the patterns should be consistent, not identical.

They must not be identical, because they're looking at different strings. If RewriteRule sees "xyz.abc" (as the URL-path) then THE_REQUEST may very likely contain "GET /xyz.abc HTTP/1.1", or if a query string and URL-fragment/anchor are present, "GET /xyz.abc?foo=bar#anchor HTTP/1.1". So the RewriteCond and RewriteRule patterns can't be the same, but the subpattern(s) matching the URL-path-parts should be.

What comes at the end of a URL-path being matched by mod_rewrite? Only the last character of that URL-path.

Jim
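The "consistent, not identical" point can be sketched by building both patterns around one shared URL-path fragment. PATH_PART is a name introduced here for illustration, and the request line is a hypothetical example. Python's re module is standing in for Apache's regex engine, which is fine for these particular patterns.

```python
import re

# Shared URL-path fragment, used by both the rule and the condition.
PATH_PART = r"cgi-bin/store/([^/.]+)\.cgi"

# RewriteRule sees only the URL-path (no leading slash in .htaccess context).
RULE = re.compile("^" + PATH_PART + r"(.*)$")
# RewriteCond %{THE_REQUEST} sees the whole request line, so the same fragment
# is wrapped with the method in front and the protocol behind.
COND = re.compile(r"^[A-Z]+\ /" + PATH_PART + r"[^\ ]*\ HTTP/")

# Hypothetical inputs for one and the same request.
url_path = "cgi-bin/store/dir1.cgi/dir2"
request_line = "GET /cgi-bin/store/dir1.cgi/dir2?foo=bar HTTP/1.1"

rule_match = RULE.match(url_path)
cond_match = COND.match(request_line)
print(rule_match.group(1), rule_match.group(2))
print(cond_match.group(1))
```

Both patterns pull out the same script name from their different input strings, which is the kind of agreement between them being described.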

StaceyJ

8:51 pm on Jan 9, 2011 (gmt 0)

10+ Year Member



Thank you, I get the consistent but not identical thing, I was thinking too hard. :) And thanks for the example. Examples are worth a thousand words sometimes.

And your reply re the end of a URL path makes perfect sense, I don't know what I was thinking, or not thinking, as the case may have been. :)

I think we can put this one to rest (I hear a collective sigh of relief :)). I think I've got my brain wrapped around it all, except I still don't get where the cgi-bin.cgi was coming from, that still has me scratching my head. But I'll keep thinking on it and maybe it will finally hit me. Thank you, again!
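On the open question of where cgi-bin.cgi came from, one plausible explanation (not confirmed anywhere in the thread) is that rules in .htaccess are run again on the rewritten path, and the original rewrite pattern also matches its own output for single-segment URLs. The sketch below replays only the original internal rewrite in Python. It assumes the redirect rule stays out of the way on later passes because %{THE_REQUEST} still holds the client's original request. rewrite_once and simulate are hypothetical helpers for this illustration.

```python
import re

# The original internal-rewrite pattern from the first post, unchanged.
ORIGINAL_RULE = re.compile(r"^([^\./]+)(/[^\./]+)?(/[^/]+)?$")


def rewrite_once(path):
    """Apply the original rule once; return the new path, or None if no match."""
    m = ORIGINAL_RULE.match(path)
    if m is None:
        return None
    # Unmatched optional groups expand to an empty string, as in mod_rewrite.
    first, second, third = (g or "" for g in m.groups())
    # Leading slash dropped, since a per-directory rule sees the path without it.
    return "cgi-bin/store/" + first + ".cgi" + second + third


def simulate(path, max_passes=5):
    """Re-run the rule on its own output until it stops matching."""
    history = [path]
    for _ in range(max_passes):
        rewritten = rewrite_once(history[-1])
        if rewritten is None:
            break
        history.append(rewritten)
    return history


for url in ("dir1", "dir1/dir2", "dir1/dir2/prodid.123456"):
    print(" -> ".join(simulate(url)))
```

In this model "dir1" becomes cgi-bin/store/dir1.cgi, which has exactly three segments and so matches the pattern again, with "cgi-bin" captured as $1. The second pass yields cgi-bin/store/cgi-bin.cgi/store/dir1.cgi, which lines up with the path in the error log. The two longer URLs produce four or more segments after the first pass, so the pattern no longer matches and they resolve, consistent with only the first of the three test URLs returning a 404.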