

Redirecting to index.html

         

Tonearm

3:23 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is the index page of my store:

www.mystore.com/cgi-bin/catalog/index.html

The following URLs also display the index page:

www.mystore.com/cgi-bin/catalog
www.mystore.com/cgi-bin/catalog/

I'm trying to redirect the bottom two to the top one with a 301 for the sake of link popularity. Does anyone know how I can do that in either Perl or in the .htaccess file (.htaccess is preferred) without causing an infinite loop?

oilman

3:30 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld Tonearm.

The reality is that all 3 of those URLs are the same page: web servers are set up to look for certain files by default in a folder if you only type in the folder name (in your case cgi-bin/catalog).

When you type in yourdomain.com, the server automatically displays index.html, default.html, index.asp, index.cfm, or whichever default file it is set up to display. The same rules apply to folders.
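The default-document behaviour described above can be modelled in a few lines. This is a toy Python model, not Apache's code: the index-file list and the sets of existing files and directories are hypothetical, and on a real Apache server the list comes from the DirectoryIndex directive. It also includes the trailing-slash redirect that Apache's mod_dir issues by default when a directory is requested without its slash.

```python
# Toy model of how a web server answers requests for a directory URL.
# Hypothetical data: the index-file list and the existing paths below are made
# up for illustration; on Apache the list comes from the DirectoryIndex directive.
INDEX_NAMES = ["index.html", "default.html", "index.asp", "index.cfm"]
EXISTING_FILES = {"/cgi-bin/catalog/index.html"}
EXISTING_DIRS = {"/cgi-bin", "/cgi-bin/catalog"}


def resolve(url_path):
    """Return ("serve", file), ("redirect", url) or ("not found", url_path)."""
    if url_path in EXISTING_FILES:
        return ("serve", url_path)
    # A directory requested without its trailing slash is first redirected to
    # the slash-terminated form (Apache's mod_dir does this by default).
    if url_path in EXISTING_DIRS:
        return ("redirect", url_path + "/")
    # A slash-terminated directory is served by the first index file that exists.
    if url_path.endswith("/") and url_path[:-1] in EXISTING_DIRS:
        for name in INDEX_NAMES:
            if url_path + name in EXISTING_FILES:
                return ("serve", url_path + name)
    return ("not found", url_path)


for path in ["/cgi-bin/catalog", "/cgi-bin/catalog/", "/cgi-bin/catalog/index.html"]:
    print(path, "->", resolve(path))
```

All three URLs end up on the same file, which is why a 301 to the single canonical URL has to be added explicitly.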

jdMorgan

3:40 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



tonearm,

Have you tried this?


RewriteRule ^/?$ /index.html [R=301,L]

I've done it the other way, preferring to use [domain.com...] as the "standard" URL for the index page, but I haven't tried to do what you specified.

Jim

Tonearm

6:15 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks guys! jdMorgan- I want to make sure I understand the way the code you supplied works before I implement it. Can you explain? Does it take the "cgi-bin/catalog" portion into account?

jdMorgan

7:24 am on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tonearm,

Sorry - I wasn't clear. The rewrite rule I posted goes in www.yourdomain.com/cgi-bin/catalog/.htaccess.

I assumed that's where you'd want to put it since it appears that's where Perl usually is, too; a mental cross-influence from someone else's problem, I'm afraid.

If you want to put the rewrite in .htaccess at the site root, then you'd include all that other path stuff:


RewriteRule ^cgi-bin/catalog/?$ /cgi-bin/catalog/index.html [R=301,L]
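How much path the pattern must spell out depends on which .htaccess file holds the rule. The Python sketch below models per-directory matching in simplified form for three possible locations. Assumptions: the directory's own URL prefix is removed before the pattern is tested, Python's re stands in for Apache's regex engine, and the target is written as the full URL-path /cgi-bin/catalog/index.html in every case.

```python
import re

# Simplified model of per-directory matching (an assumption, not Apache's code):
# the .htaccess directory's own URL prefix is removed before the pattern is
# tested, so each pattern spells out only the path below its directory.
RULES = {
    "/": r"^cgi-bin/catalog/?$",   # site-root .htaccess
    "/cgi-bin/": r"^catalog/?$",   # /cgi-bin/.htaccess
    "/cgi-bin/catalog/": r"^/?$",  # /cgi-bin/catalog/.htaccess
}
TARGET = "/cgi-bin/catalog/index.html"


def redirect_for(url_path, htaccess_dir):
    """Return the 301 target if the rule in htaccess_dir fires, else None."""
    if not url_path.startswith(htaccess_dir):
        return None
    local = url_path[len(htaccess_dir):]
    return TARGET if re.match(RULES[htaccess_dir], local) else None


for directory in RULES:
    # The catalog home page is redirected; the target itself is not, so no loop.
    print(directory, redirect_for("/cgi-bin/catalog/", directory),
          redirect_for(TARGET, directory))
```

The second column shows why none of these rules loops: once the browser requests /cgi-bin/catalog/index.html, the remaining local path contains "index.html" and no longer matches.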

Jim

Tonearm

5:39 pm on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks jdMorgan! The following in my .htaccess file in the cgi-bin worked like a charm:

RewriteEngine On
RewriteRule ^catalog/?$ /cgi-bin/catalog/index.html [R=301,L]
RewriteRule ^catalog//?$ /cgi-bin/catalog/index.html [R=301,L]

I was thinking, I use a lot of RedirectPermanent directives in my .htaccess file. Using MOD_REWRITE seems a lot cleaner. Is there any disadvantage to changing all of my RedirectPermanents to RewriteRules? Also, what does that "L" at the end mean? Thanks again!

jdMorgan

6:08 pm on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tonearm,

I don't know of any disadvantage to mod_rewrite, except for its complexity. The RedirectMatch directive, available in Apache 1.3 and later, adds regular-expression matching to the basic Redirect directive, using the same regular-expression engine as mod_rewrite. What makes mod_rewrite more flexible is everything around the pattern: RewriteCond tests on request headers and server variables, and per-rule flags such as [R], [L], [F], and [G].
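One practical difference between RedirectMatch and RewriteRule when used in .htaccess is the string each pattern is tested against. The sketch below uses Python's re as a stand-in for Apache's regex engine (an assumption that holds for simple patterns like these) and assumes both directives live in /cgi-bin/.htaccess, for a request to /cgi-bin/catalog/.

```python
import re

REQUEST = "/cgi-bin/catalog/"

# RedirectMatch (mod_alias) tests its regex against the full URL-path,
# even when the directive is written in an .htaccess file.
redirectmatch_subject = REQUEST

# A RewriteRule in /cgi-bin/.htaccess sees the path with "/cgi-bin/" removed.
rewriterule_subject = REQUEST[len("/cgi-bin/"):]

print(repr(redirectmatch_subject), repr(rewriterule_subject))
print(bool(re.match(r"^/cgi-bin/catalog/?$", redirectmatch_subject)))  # True
print(bool(re.match(r"^catalog/?$", rewriterule_subject)))             # True
print(bool(re.match(r"^catalog/?$", redirectmatch_subject)))           # False
```

So when converting RedirectPermanent or RedirectMatch lines to RewriteRules in a subdirectory's .htaccess, the leading directory part of each pattern has to be dropped.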

The [L] at the end of a RewriteRule means, "this is the Last rule that needs to be processed for the request being handled": when that rule matches, mod_rewrite skips the rules that follow it. I suggest using it by default. Unless you really need to do a multi-stage rewrite using multiple RewriteRules, you should use [L]. Omitting the [L] when no further processing is needed makes your rewrites inefficient and, in some cases, leads to unexpected results, because the later rules are then tested against the already-rewritten URL. Note also that [L] is redundant when used with the [F] and [G] flags.
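The effect of [L] can be shown with a deliberately simplified model of a rule list. In this Python sketch, Rule and apply_rules are hypothetical helpers (not part of any Apache API), and the second rule is invented purely to show a later rule catching the first rule's output.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str
    substitution: str
    last: bool = False  # models the [L] flag


def apply_rules(rules, path):
    """One simplified pass through a list of RewriteRules.

    A matching rule replaces the whole path with its substitution (as
    mod_rewrite does); a matching rule marked last ends the pass. Not modelled:
    RewriteCond, back-references, other flags, and the re-run of .htaccess
    rules on an internally rewritten request.
    """
    for rule in rules:
        if re.search(rule.pattern, path):
            path = rule.substitution
            if rule.last:
                break
    return path


# Hypothetical second rule that also happens to match the first rule's output.
WITH_L = [Rule(r"^catalog/?$", "catalog/index.html", last=True),
          Rule(r"^catalog/index\.html$", "catalog/legacy.html")]
WITHOUT_L = [Rule(r"^catalog/?$", "catalog/index.html"),
             Rule(r"^catalog/index\.html$", "catalog/legacy.html")]

print(apply_rules(WITH_L, "catalog/"))     # catalog/index.html
print(apply_rules(WITHOUT_L, "catalog/"))  # catalog/legacy.html
```

Without the flag, the second rule silently overrides the first; with it, processing stops as soon as the intended rule has fired.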

For the definitive answers on the use of mod_rewrite and these flags, see the Apache Mod_Rewrite documentation [httpd.apache.org].

Jim

Tonearm

6:27 pm on Dec 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm sorry to bug you again, but is there a way to turn "catalog" into a variable, something like this:

RewriteEngine On
RewriteRule ^{variable}/?$ /cgi-bin/{variable}/index.html [R=301,L]
RewriteRule ^{variable}//?$ /cgi-bin/{variable}/index.html [R=301,L]

I know I should delve into the MOD_REWRITE docs, and if I take this any further I will. This should be perfect though. Thank you!

jdMorgan

1:50 am on Dec 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tonearm,

I hope you've found the answer by now, but yes, you can "copy" a matched part of the pattern on the left side of the rule into the substitution on the right side. This is called a back-reference. You mark each part to capture with parentheses on the left, and refer to it on the right with a dollar sign followed by its number: $1 for the first parenthesized group, $2 for the second, and so on up to $9. Groups are numbered in the order their opening parentheses appear in the pattern, but you can use them in any order on the right. Groups captured by the last matched RewriteCond are available the same way, using "%" followed by a number.
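The numbering and substitution can be sketched in Python. The expand helper below is hypothetical and handles only $0 to $9; Python's re stands in for Apache's regex engine, and the "shoes/red" path is made up to show two groups used in reverse order.

```python
import re


def expand(substitution, match):
    """Expand $0-$9 in a RewriteRule-style substitution from a regex match.

    Sketch only: real mod_rewrite also expands %N (RewriteCond groups),
    %{VARIABLES} and ${mapname:key} lookups, which are not handled here.
    """
    def group_text(ref):
        n = int(ref.group(1))
        if n > match.re.groups:
            return ""  # this sketch treats an out-of-range $N as empty
        return match.group(n) or ""  # a group that did not take part expands to ""

    return re.sub(r"\$(\d)", group_text, substitution)


m = re.match(r"^([^/]+)/?$", "catalog/")
print(expand("/cgi-bin/$1/index.html", m))  # /cgi-bin/catalog/index.html

# Groups are numbered left to right but may be used in any order.
m2 = re.match(r"^([^/]+)/([^/]+)$", "shoes/red")  # hypothetical path
print(expand("/cgi-bin/$2/$1", m2))  # /cgi-bin/red/shoes
```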

The following rule in .htaccess in your site's root directory should redirect yourdomain/catalog/ or yourdomain/catalog to yourdomain/cgi-bin/catalog/index.html:


RewriteRule ^([^/]+)/?$ /cgi-bin/$1/index.html [R=301,L]

Because of the back-reference, it will also redirect yourdomain/carbonaceous_chondrite/ or yourdomain/carbonaceous_chondrite to yourdomain/cgi-bin/carbonaceous_chondrite/index.html. For this to work, the rule must be placed in .htaccess in the directory above your cgi-bin subdirectory.
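Because [^/]+ accepts any single path segment, the rule's reach from the site root is wider than those two examples: any top-level name qualifies, including ordinary files such as robots.txt. The Python sketch below assumes the rule sits in the site-root .htaccess (only the leading "/" removed before matching), uses re as a stand-in for Apache's regex engine, and uses a made-up list of requests.

```python
import re

# Sketch, assuming the rule sits in the site-root .htaccess, where only the
# leading "/" is removed before the pattern is tested.
PATTERN = re.compile(r"^([^/]+)/?$")


def target_for(request):
    """Return the redirect target for a request path, or None if no match."""
    m = PATTERN.match(request[1:])
    return f"/cgi-bin/{m.group(1)}/index.html" if m else None


for request in ["/catalog/", "/carbonaceous_chondrite", "/robots.txt",
                "/cgi-bin/catalog/index.html"]:
    print(request, "->", target_for(request))
```

Deeper paths, including the redirect target itself, are left alone, but every top-level file is caught, so it is worth checking what else lives at the root before using this rule there.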

I also want to give you a heads-up that the pattern in the following rule is redundant:


RewriteRule ^catalog//?$ /cgi-bin/catalog/index.html [R=301,L]

I believe you will find that you can delete it with no change in function. Because only the second slash is optional, this pattern matches "catalog/", which your first rule (^catalog/?$) already handles, plus "catalog//", which should never occur: normal URL paths don't contain "//". In a URL, "//" appears only after the scheme, as in "http://domain". At the point in server processing where mod_rewrite runs, the scheme, host name, and port are no longer part of the path being matched; they are available only through server variables such as %{HTTP_HOST} and %{SERVER_PORT}, so they cannot be pattern-matched directly by a RewriteRule pattern.
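It is easy to check what each of the two catalog patterns accepts. In this Python sketch, re stands in for Apache's regex engine, and the strings are the local paths a rule in /cgi-bin/.htaccess would see.

```python
import re

first = re.compile(r"^catalog/?$")    # "catalog" followed by an optional "/"
second = re.compile(r"^catalog//?$")  # "catalog/" followed by an optional second "/"

for local in ["catalog", "catalog/", "catalog//"]:
    print(repr(local), bool(first.match(local)), bool(second.match(local)))
```

The only string the second pattern adds is "catalog//"; on "catalog/" it overlaps the first pattern exactly.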

Jim

Tonearm

3:33 am on Dec 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks jdMorgan! I have to say though, when I only had:

RewriteRule ^catalog/?$ /cgi-bin/catalog/index.html [R=301,L]

in my .htaccess it would redirect the following properly:

www.mystore.com/cgi-bin/catalog

but not this:

www.mystore.com/cgi-bin/catalog/

When I had both of the following in my .htaccess:

RewriteRule ^catalog/?$ /cgi-bin/catalog/index.html [R=301,L]
RewriteRule ^catalog//?$ /cgi-bin/catalog/index.html [R=301,L]

both of those URLs were forwarded properly. I really think that the double "//" is necessary for both URLs to be redirected. Is that alright?

jdMorgan

5:57 am on Dec 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tonearm,

I suspect another RewriteRule, or perhaps a DirectoryIndex directive, is interfering; these could be present in any .htaccess file in or above your cgi-bin subdirectory. Your results are "impossible" from a purely theoretical viewpoint: "//" cannot exist in a valid pathname. So, it's a puzzle.

Please try placing the following plain-vanilla, fixed-pattern ruleset in /cgi-bin/.htaccess, and tell us what happens (exclude all other rewrites and any DirectoryIndex directives by commenting them out with a "#" at the beginning of the line):


Options +FollowSymlinks
RewriteEngine on
RewriteRule ^catalog/$ /cgi-bin/catalog/index.html [R=301,L]
RewriteRule ^catalog$ /cgi-bin/catalog/index.html [R=301,L]

Jim

Tonearm

11:44 pm on Dec 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ok, I've got this in my cgi-bin/.htaccess now:

RewriteEngine On
RewriteRule ^([^/]+)/?$ /cgi-bin/$1/index.html [R=301,L]
RewriteRule ^([^/]+)?$ /cgi-bin/$1/index.html [R=301,L]

and it's working for both of those URLs. Is this OK?

jdMorgan

12:41 am on Dec 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is this OK?

The first rule will match one or more characters other than a slash, followed by an optional slash.

The second rule will match any local path not containing a slash, including an empty one. The "+" and "?" aren't actually at odds: the "+" requires one or more characters (anything but a slash) in the parenthesized group, and the following question mark then makes the whole group optional. Together they match zero or more non-slash characters, and when the group is skipped, $1 comes out empty.

You could just as well try ^([^/]*)$ or ^([^/]+)$ depending on whether you want to redirect on a blank local path or not - i.e. whether you want to redirect the URL "yourdomain/cgi-bin"
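The four patterns discussed here can be compared side by side. In this Python sketch, re stands in for Apache's regex engine, the sample strings are local paths as seen from /cgi-bin/.htaccess (an empty one roughly corresponds to a request for /cgi-bin/ itself), and a skipped group expanding to an empty $1 is assumed to mirror mod_rewrite's behaviour.

```python
import re

PATTERNS = [r"^([^/]+)/?$", r"^([^/]+)?$", r"^([^/]*)$", r"^([^/]+)$"]
SAMPLES = ["", "catalog", "catalog/"]  # local paths seen from /cgi-bin/.htaccess


def target(pattern, local):
    """Redirect target for /cgi-bin/$1/index.html, or None if no match."""
    m = re.match(pattern, local)
    if m is None:
        return None
    return "/cgi-bin/" + (m.group(1) or "") + "/index.html"  # skipped group -> ""


for pattern in PATTERNS:
    print(pattern, [target(pattern, local) for local in SAMPLES])
```

^([^/]+)?$ and ^([^/]*)$ accept exactly the same strings, and both would send an empty local path to /cgi-bin//index.html; only the first pattern accepts a trailing slash, so the other three rely on it to handle "catalog/".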

What it all boils down to in the end, though, is whether it gets the job done... :)

Jim

Tonearm

9:05 pm on Dec 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since I implemented this, the spiders in my access_log files have been acting very strangely. They request robots.txt after almost every single file they access, along with some other kooky behavior. Can you imagine why they would be behaving like this?