Removing .cgi from URL and Internal Rewrite

         

StaceyJ

10:02 pm on Dec 14, 2010 (gmt 0)

10+ Year Member



I don't post here often but read a LOT. I think I have all the information I need, but wanted to ask a question to make sure. The site has static URLs pointing to a Perl script in the cgi-bin, such as:

http://www.example.com/cgi-bin/shop/red-widgets.cgi

which then generates the remainder of the URL:

http://www.example.com/cgi-bin/shop/red-widgets.cgi/red-widget-1-thumbnail-page-of-options/red-widget-1-product-page

I want to accomplish two things.

1. Remove .cgi from the URL

2. Remove /cgi-bin/shop from the URL

Is my best practice to just change all the static links from:

http://www.example.com/cgi-bin/shop/red-widgets.cgi

To:

http://www.example.com/red-widgets

And then use the following in .htaccess (thanks to jdMorgan):

Options +FollowSymLinks -Indexes -MultiViews
RewriteEngine on

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.]+\.cgi\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.cgi$ http://www.example.com/$1 [R=301,L]

RewriteRule ^(([^/]+/)*[^./]+)$ /$1.cgi [L]

Or am I totally out to lunch and brain dead on this after reading and reading and reading and trying to make sure I know what I want to accomplish?

Thank you!

jdMorgan

3:41 pm on Dec 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmmm... depends -- sort of.

In the simplest case, you could/should include the cgi-bin/shop/ URL-prefix in both patterns in both lines of your first rule to make that rule more-specific and more efficient.

I believe that in your second rule, adding cgi-bin/shop/ to the substitution path will be required in order to deliver the request to the proper script filepath. (Basically, the first rule "takes out" cgi-bin/shop/ from requested URLs, but the second rule needs to "put it back in" to get to the right filepath.)

Jim
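The "take it out, then put it back in" idea above can be sketched outside Apache. This is only an illustration in Python: the two patterns are assumptions about how the advice might be applied to the example URLs, not rules posted in the thread, and Python's `re` stands in for mod_rewrite's regex matching. The paths are written without a leading slash, as a root .htaccess would see them.

```python
import re

def redirect_target(local_path):
    """External redirect: strip cgi-bin/shop/ and .cgi from an old URL path."""
    m = re.match(r"^cgi-bin/shop/([^./]+)\.cgi$", local_path)
    if m is None:
        return None
    return "http://www.example.com/" + m.group(1)

def internal_target(local_path):
    """Internal rewrite: put cgi-bin/shop/ and .cgi back to reach the script."""
    m = re.match(r"^([^./]+)$", local_path)
    if m is None:
        return None
    return "/cgi-bin/shop/" + m.group(1) + ".cgi"

old_url = redirect_target("cgi-bin/shop/red-widgets.cgi")
script_path = internal_target("red-widgets")
print(old_url)
print(script_path)
```

The second function does not match the old path (it contains both `/` and `.`), so in this sketch the two steps cannot feed each other.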

StaceyJ

6:24 pm on Dec 16, 2010 (gmt 0)

10+ Year Member



Thanks Jim for the suggestions! I knew it couldn't be that easy no matter how much I looked at it. Back to the drawing board.

And I typo'd, too. It looks like if I leave .cgi out of the URLs I need to change things even more. I may need to color my hair soon, if I have any left.

StaceyJ

7:32 pm on Dec 20, 2010 (gmt 0)

10+ Year Member



So getting back to this, I think I have it down. If I change on-page links from:

http://www.example.com/cgi-bin/shop/red-widgets.cgi

To:

http://www.example.com/red-widgets

I would internally rewrite these like this:

RewriteRule ^(([^/]+/)*[^\./]+)$ $1/cgi-bin/shop.cgi [L]
#my notes to help me understand what it says: (match any character that is not a / 1 or more times, followed by a /), repeat all of this 0 or more times (doesn't have to repeat but can), followed by any character that is not a . or a / 1 or more times, and store it all in a variable.

Question: since there are two sets of parentheses (nested), will that create two variables, or is it just for grouping?
If it does create two, which one is $1 and which is $2? Sorry if that's a stupid question. I never really understood that.

I'd also need to externally redirect the old URLs that might be out there like this:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*cgi-bin/shop\.cgi\ HTTP/
RewriteRule ^([^/]+/)*cgi-bin/shop\.cgi$ http://www.example.com/$1 [R=301,L]

And the Redirect rule goes above the Rewrite rule.

Am I on the right track? Thanks !

Edit - I had left out the / in the first rewrite rule before cgi-bin

[edited by: StaceyJ at 7:50 pm (utc) on Dec 20, 2010]

g1smd

7:40 pm on Dec 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To find the backreference number for $x, count the left parentheses. You only need $1 here.

Add a leading slash before the $1 in the rewrite, to avoid an obscure security issue.

Yes, redirects go before rewrites.
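The "count the left parentheses" rule can be checked with a short sketch. It uses Python's `re` module as a stand-in for mod_rewrite's regex matching (group numbering works the same way for this pattern); the sample path is made up for illustration.

```python
import re

# The pattern from the rewrite rule under discussion: two nested groups.
pattern = re.compile(r"^(([^/]+/)*[^./]+)$")

m = pattern.match("widgets/red/red-widgets")
outer = m.group(1)  # opened by the 1st left parenthesis: the whole path
inner = m.group(2)  # opened by the 2nd left parenthesis: last "dir/" repetition
print(outer)
print(inner)
```

Both sets of parentheses create a backreference. The outer one is $1 because its left parenthesis comes first, which is why only $1 is needed in the substitution.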

StaceyJ

8:12 pm on Dec 20, 2010 (gmt 0)

10+ Year Member



I made a mistake and left out part of the path in all of this. It shouldn't be cgi-bin/shop.cgi, it should be cgi-bin/shop/red-widgets.cgi. And then there are also blue-widgets.cgi and green-widgets.cgi. Can I just group the [^\./]+ part like this ([^\./]+) and use that as variable $3 so the whole thing would be

RewriteRule ^(([^/]+/)*([^\./]+))$ /$1/cgi-bin/shop/$3 [L]

Thanks about the variable backreference!

As for the / before the $1 for security, ok, but can you please elaborate?

StaceyJ

11:35 pm on Dec 20, 2010 (gmt 0)

10+ Year Member



Actually I think I need to remove the outer () altogether now and change it to this:

RewriteRule ^([^/]+/)*([^\./]+)$ /$1cgi-bin/shop/$2 [L]

Or shall I throw in the towel and raise the white flag?

jdMorgan

12:45 am on Dec 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should test, observe, and learn... :)

RewriteRule ^(([^/]+/)*)([^\./]+)$ /$1cgi-bin/shop/$3 [L]

Without the "outer" parens around the entire first sub-pattern, only the last (lowest) level of any multi-level requested directory-path would be 'remembered' -- and that would likely cause a big mess...

Jim
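The effect of the "outer" parentheses can be seen in a small sketch, again using Python's `re` as a stand-in for mod_rewrite's regex matching. The multi-level path `a/b/red-widgets` is a made-up example; the two patterns are the ones posted above.

```python
import re

path = "a/b/red-widgets"

# Without outer parentheses: $1 is the repeated group, which keeps only
# its last iteration.
without_outer = re.match(r"^([^/]+/)*([^./]+)$", path)
# With outer parentheses: $1 wraps the whole repetition.
with_outer = re.match(r"^(([^/]+/)*)([^./]+)$", path)

# Build the substitution the way /$1cgi-bin/shop/$2 (or $3) would.
lost = "/" + without_outer.group(1) + "cgi-bin/shop/" + without_outer.group(2)
kept = "/" + with_outer.group(1) + "cgi-bin/shop/" + with_outer.group(3)
print(lost)
print(kept)
```

In the first result the leading `a/` has disappeared, because a repeated group remembers only the final repetition.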

StaceyJ

3:27 pm on Dec 21, 2010 (gmt 0)

10+ Year Member



I am observing, and learning, thank you. :) I'm just not testing until after the holiday, but I want to be ready to go. :)

So I think we have this part down, although we forgot to add back .cgi to the filepath, so it should be like this

RewriteRule ^(([^/]+/)*)([^\./]+)$ /$1cgi-bin/shop/$3.cgi [L]

Now I have another question after looking at this again. Since cgi-bin always comes right after www.example.com (at least on this site), is it necessary to test for 0 or more occurrences like this (([^/]+/)*) and couldn't it just be ([^/]+/) , or am I confusing myself?

StaceyJ

4:27 pm on Dec 22, 2010 (gmt 0)

10+ Year Member



Ok, so I've totally confused myself now, and maybe others, after I thought we had it. Since the new URLs are going to look like this

http://www.example.com/red-widgets

the GET request is going to look like this

/red-widgets

is it not? So testing for one or more characters that aren't a / followed by a / isn't going to accomplish anything, is it? So how do I test for that? If I do this ([^\./]+) and the request is /help.php, won't that match "help" also in help.php? And one other thing, there may not always be - so we can't test for two character strings separated by a - . So can I do this?

RewriteRule ^/([^\./]+)^.$ /cgi-bin/shop/$1.cgi [L]

I really appreciate the help so far and I've learned a ton, but I still have a ways to go before I stop asking stupid questions, so please bear with me. And fyi, I'm not copying and pasting from other threads with no clue what it means (no one suggested that, I just wanted to point it out), I'm reading and then writing all this on my own (which is probably why it's so messed up).

StaceyJ

7:01 pm on Dec 22, 2010 (gmt 0)

10+ Year Member



So I bit the bullet and tested it. This works in the root directory

RewriteRule ^([^\./]+)$ /cgi-bin/store/$1.cgi [L]

It takes /red-widgets and rewrites it to /cgi-bin/store/red-widgets.cgi and displays the page just fine. And when I comment it out I get a 404, which is what should happen so far.

And it didn't match "help" in help.php.
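A rough way to see why "help" in help.php is not matched, sketched in Python with `re` standing in for mod_rewrite's regex matching. The `rewrite` helper is hypothetical; it only mimics the fact that in a .htaccess file the pattern is matched against the path with the leading slash already removed.

```python
import re

# Pattern from the tested rule.
rule = re.compile(r"^([^./]+)$")

def rewrite(url_path):
    """Mimic a root .htaccess: strip the leading slash, then match."""
    local = url_path[1:] if url_path.startswith("/") else url_path
    m = rule.match(local)
    if m is None:
        return None  # rule does not apply; request is left alone
    return "/cgi-bin/store/" + m.group(1) + ".cgi"

print(rewrite("/red-widgets"))
print(rewrite("/help.php"))

# A pattern that starts with ^/ never matches the stripped path.
leading_slash_match = re.match(r"^/([^./]+)$", "red-widgets")
```

The `^` and `$` anchors force the whole path to consist of characters other than `.` and `/`, so the dot in help.php makes the match fail rather than matching just "help".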

Now the problem is this. This doesn't work in the root directory

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/store/[^\./]+\.cgi(.*)\ HTTP/
RewriteRule ^cgi-bin/store/([^\./]+)\.cgi(.*)$ http://www.example.com/$1$2 [R=301,L]

It has to be changed to this and put in the /store/ directory inside the cgi-bin

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/store/[^\./]+\.cgi(.*)\ HTTP/
RewriteRule ^([^\./]+)\.cgi(.*)$ http://www.example.com/$1$2 [R=301,L]

It works there, BUT! Where do I put the rewrite now? The rewrite is supposed to go after the redirect, but it won't work in the /store/ directory, it needs to go in the root. Very close to throwing in the towel. :(

StaceyJ

10:24 pm on Dec 22, 2010 (gmt 0)

10+ Year Member



Here's a test I did.

This redirect works in a .htaccess in the /cgi-bin/store/ directory

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /cgi-bin/store/[^\./]+\.cgi(.*)\ HTTP/
RewriteRule ^([^\./]+)\.cgi(.*)$ http://www.example.com/$1$2 [R=301,L]

This rewrite works in a .htaccess in the root for the above request and gives a 301 and then a 200, but the server error log shows that a file doesn't exist for /red-widgets (I know the file doesn't exist, but doesn't the rewrite take care of that? *confused*)

RewriteRule ^([^\./]+)$ /cgi-bin/store/$1.cgi [L]

But after the first redirect and rewrite, the next level of redirects display in the address bar, but then we get 404 errors.

e.g.

http://www.example.com/cgi-bin/store/red-widgets.cgi

To

http://www.example.com/red-widgets

Works, both the redirect and rewrite and we get a nice page.

But either

http://www.example.com/cgi-bin/store/red-widgets.cgi/fuzzy-red-widgets

OR

http://www.example.com/red-widgets/fuzzy-red-widgets

Throws a 404

There is no physical filepath after red-widgets.cgi to rewrite to, the script generates the rest.

At this point I am totally lost, my brain is mush and I need a drink or three.

Happy holidays all, if someone feels the need to shoot me in the head I wouldn't blame you right now.
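The 404 on /red-widgets/fuzzy-red-widgets is at least consistent with the rewrite pattern itself: `[^./]+` excludes slashes, so a two-level path never matches and is never rewritten. The sketch below shows that in Python (`re` standing in for mod_rewrite's regex matching). The second pattern is a hypothetical variant that is not from the thread and has not been tested on the site; whether the script would then receive the trailing part depends on server configuration not shown here.

```python
import re

# Pattern from the tested rule.
tested = re.compile(r"^([^./]+)$")
# Hypothetical variant (an assumption): optional 2nd group for a trailing path.
with_path_info = re.compile(r"^([^./]+)(/.*)?$")

def rewrite(pattern, local_path):
    """Return the internal path the rule would produce, or None if no match."""
    m = pattern.match(local_path)
    if m is None:
        return None
    extra = ""
    if pattern.groups > 1 and m.group(2):
        extra = m.group(2)  # trailing path generated by the script's links
    return "/cgi-bin/store/" + m.group(1) + ".cgi" + extra

print(rewrite(tested, "red-widgets/fuzzy-red-widgets"))
print(rewrite(with_path_info, "red-widgets/fuzzy-red-widgets"))
```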

jdMorgan

1:51 am on Dec 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If it is possible to invoke a redirect after an internal rewrite, then the result will always be exposure of your internal filepaths as URLs to the browser (and to search engines -- bad news!).

However, since your 'remove /cgi-bin/store/' redirect checks for THE_REQUEST, it does not matter that it is in a lower-level .htaccess file, since it will never be invoked following the top-level .htaccess 'widgets-to/cgi-bin/store' rewrite. Test it and see... :)

Actually, to help you finish projects in a timely manner despite our sporadic response postings here, that's good advice generally -- Instead of asking "what if" questions here, test it and find out -- That will likely give you a much faster answer.

It may well also give you more confidence. After all, what differentiates the "beginners" from the "experts" here is -for the most part- that the experts have 'blown up' their servers many more times and in many more ways than the beginners, and simply remember how *not* to do so quite so often...

Jim
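The point about THE_REQUEST can be illustrated with a sketch. THE_REQUEST holds the request line exactly as the client sent it and is not changed by an internal rewrite, so the condition only matches when the browser itself asked for the .cgi URL. The example below uses Python's `re` as a stand-in for mod_rewrite's regex matching; the request lines are made-up samples, and the `\ ` escapes from the .htaccess version are written as plain spaces.

```python
import re

# Condition pattern from the redirect rule in the thread.
cond = re.compile(r"^[A-Z]+ /cgi-bin/store/[^./]+\.cgi(.*) HTTP/")

# What the client sends when it asks for the old URL directly.
direct_request = "GET /cgi-bin/store/red-widgets.cgi HTTP/1.1"
# What the client sends for the new URL; this line stays the same even
# after the server internally rewrites the path to the .cgi script.
pretty_request = "GET /red-widgets HTTP/1.1"

redirect_on_direct = cond.match(direct_request) is not None
redirect_on_pretty = cond.match(pretty_request) is not None
print(redirect_on_direct, redirect_on_pretty)
```

Because the second request line never contains /cgi-bin/store/, the redirect cannot fire as a side effect of the rewrite, whichever .htaccess file it lives in.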

StaceyJ

4:51 pm on Dec 26, 2010 (gmt 0)

10+ Year Member



"Actually, to help you finish projects in a timely manner despite our sporadic response postings here, that's good advice generally -- Instead of asking "what if" questions here, test it and find out -- That will likely give you a much faster answer."

I did test it, several different ways, which is how I came to post the previous two posts showing what happened. However, since I don't have the luxury of a test server or anything like that, I have to test on a live site, and I'm hesitant to just throw stuff out there, which is why I ask questions here, which I thought was the purpose of this forum. I understand loads of questions have been asked a million times and people are tired of answering them, so if someone can point me to a post that covers my problem please point me to it. I turn up nothing no matter what I search on.

"If it is possible to invoke a redirect after an internal rewrite, then the result will always be exposure of your internal filepaths as URLs to the browser (and to search engines -- bad news!)."

I understand this from all the reading I have done here, but this didn't happen, no matter how crazy you may think I am. It displayed the URL.

"However, since your 'remove /cgi-bin/store/' redirect checks for THE_REQUEST, it does not matter that it is in a lower-level .htaccess file, since it will never be invoked following the top-level .htaccess 'widgets-to/cgi-bin/store' rewrite. Test it and see... :)"

I understand this also, but the redirect did in fact happen, and it happened with the correct response codes as posted above and these were observed using Live HTTP Headers for FF.

I'm thinking maybe the reason none of this makes logical sense is what you pointed out to me previously in this topic [webmasterworld.com], which is why my redirects that include cgi-bin don't work when I put them in the root. They must be in the cgi-bin or below to work, and they do work fine that way, as I've had several there before I started trying to tackle this, but I was not trying to do a rewrite to a different path with those, they are just a plain redirect.

So back to my question, which I cannot for the life of me figure out, how do I get the redirects to work along with the rewrites when the redirects have to be in the cgi-bin or below and the rewrites have to be in the root? I'm up against a brick wall, no matter what I try to think of I come up blank. And searching here and out on the web turns up nothing. Anyone care to try to point me in the right direction?

Thank you.

g1smd

5:19 pm on Dec 26, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"However, since I don't have the luxury of a test server or anything like that"

Set up a test or dev sub-domain on your server, or install Apache (and MySQL, PHP) on a local PC.

You'll never look back. It took me about 30 iterations to get some code working right a few days ago. No danger to the site, it was all done on the test sub-domain.

jdMorgan

4:48 pm on Jan 5, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The "make a test server on an old PC" or "create testing subdomains" options are the best, but another option is to create a few 'fake' subdirectories for testing purposes on the live server, modify the paths in the code to use those subdirectory names while getting the bulk of the debugging done, and then change them back once everything that can be tested in the test subdirectories has been debugged.

Again, because the redirect rule examines THE_REQUEST, the rule order should be a non-issue.

Jim

StaceyJ

5:15 pm on Jan 5, 2011 (gmt 0)

10+ Year Member



Thanks to both of you for your suggestions. I went the "fake subdirectory" route as it seemed simpler and didn't cost extra to add a subdomain. Since it's not my site and I'm helping a friend I'm cautious about spending their money.

Anyway, I figured out a lot of it by trial and error and actually got some help from tech support even though they have a policy of not helping with rewrites. There's still some odd stuff going on that I'm not sure can be solved without moving to a VPS, but that's a different story.

Regarding it all though, I'm now stuck with a strange 404 as described in this topic [webmasterworld.com]. Hope it's ok to link to that, I don't want to post it all over again in two different places.

I greatly appreciate your help and suggestions!