mod rewrite causing infinite loops

         

Readie

6:31 am on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wrote about this question a while ago but got advice more aimed at the newbie mistakes I made in htaccess than at the problem at hand.

When a user types in a nonsense URL, like example.com/nheojkleg/eheh, they get a 404. However, when they type in a nonsense URL that contains the name of one of my actual pages:

example.com/roster/nedkbge

they get a 500 error instead.

The error log, with backtracing enabled, shows the following:
[Wed Feb 17 00:59:02 2010] [error] [client <SNIP>] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
[Wed Feb 17 00:59:02 2010] [debug] core.c(3063): [client <SNIP>] r->uri = /roster/rawr.php.php.php.php.php.php.php.php.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php.php.php.php.php.php.php.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php.php.php.php.php.php.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php.php.php.php.php.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php.php.php.php.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php.php.php.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php.php.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr.php
[Wed Feb 17 00:59:02 2010] [debug] core.c(3069): [client <SNIP>] redirected from r->uri = /roster/rawr

My .htaccess file, in its entirety, is as follows:

<IfModule mod_rewrite.c>

Options +FollowSymLinks
RewriteEngine on

RewriteRule ^$ /index.php [L]

RewriteRule ^roster/management$ /roster.php?rankfilter=management [L]

RewriteRule ^forums/$ /forums/index.php [L]
RewriteRule ^forum$ /forums [R=301,L]

RewriteRule ^archive/([^/]+)/?$ /archive.php?id=$1 [L]
RewriteRule ^archiveitem/([^/]+)/?$ /archive/$1 [R=301,L]
RewriteRule ^archiveitem$ /archive [R=301,L]

RewriteRule ^admin/account/([^/]+)/?$ /admin.php?page=account&logid=$1 [L]
RewriteRule ^admin/adminmanage/([^/]+)/?$ /admin.php?page=adminmanage&muser=$1 [L]
RewriteRule ^admin/newsedit/([^/]+)/?$ /admin.php?page=newsedit&selid=$1 [L]
RewriteRule ^admin/adminnewsedit/([^/]+)/?$ /admin.php?page=adminnewsedit&selid=$1 [L]
RewriteRule ^admin/lostpass/([^/]+)/?$ /admin.php?page=lostpass&lostid=$1 [L]
RewriteRule ^admin/([^/]+)/?$ /admin.php?page=$1 [L]

RewriteRule ^(.+)/$ ht tp://%{HTTP_HOST}/$1 [R=301,L]

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ /$1.php [L]

</IfModule>

ErrorDocument 400 /400page.php
ErrorDocument 401 /401page.php
ErrorDocument 403 /403page.php
ErrorDocument 404 /404page.php
ErrorDocument 500 /500page.php

(There is no space in the "http" of the HTTP_HOST redirect in the real file; I added it only to stop the forum from turning the URL into a link.)

I would be very grateful if anyone could explain why this is happening, or better yet, how to stop it from happening.

g1smd

9:48 am on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



First up, list all of the redirects before all of the rewrites.

Make sure that all redirects also specify protocol and domain name in the target of the rule.

The general non-www to www redirect must be the last rule in the redirect list.

List all of the rewrites after the last of the redirects.

Your rewrites that allow the requested URL to have an optional trailing slash promote duplicate content. Set these so that only a URL request without a trailing slash will trigger the rewrite. Add a redirect before those rewrites, such that if the URL request contains a trailing slash, then the user is redirected to the URL without trailing slash.

Fixing these things might not clear your particular error but will fix other errors that you haven't yet noticed.

I don't see anything in your .htaccess file that would cause that particular logged error.
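To see why the general non-www redirect belongs at the end of the redirect list, here is a rough Python model of a client following redirects. It illustrates the ordering, not how Apache is implemented. The two patterns mirror the thread's host-canonicalization and "forum" redirects. The rule-tuple format and the helper names respond() and follow() are made up for this sketch.

```python
import re
from urllib.parse import urlsplit

# Each rule: (pattern, target, applies_only_to_non_canonical_host).
# Hypothetical, simplified stand-ins for the .htaccess redirects.
HOST_RULE = (r"^(.*)$", "http://www.example.com/$1", True)
FORUM_RULE = (r"^forum/?$", "http://www.example.com/forums", False)

def respond(rules, host, path):
    """Return the Location of the first matching redirect, or None to serve."""
    for pattern, target, host_only in rules:
        if host_only and host == "www.example.com":
            continue
        m = re.match(pattern, path)
        if m:
            # Replace Apache-style $N back-references with captured groups
            return re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))) or "", target)
    return None

def follow(rules, url, max_hops=5):
    """Follow redirects as a browser would; return (final_url, hop_count)."""
    for hops in range(max_hops + 1):
        parts = urlsplit(url)
        location = respond(rules, parts.hostname, parts.path.lstrip("/"))
        if location is None:
            return url, hops
        url = location
    raise RuntimeError("too many redirects")

host_first = [HOST_RULE, FORUM_RULE]      # general redirect listed first
specific_first = [FORUM_RULE, HOST_RULE]  # general redirect listed last

print(follow(host_first, "http://example.com/forum/"))
print(follow(specific_first, "http://example.com/forum/"))
```

With the host rule first, a request for example.com/forum/ takes two hops (host fix, then path fix). With the specific rule first, and its target already carrying protocol and domain, one hop reaches the canonical URL.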

jdMorgan

2:45 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I suspect that you've got another module interfering with your mod_rewrite "exists-check" code. Try

Options +FollowSymLinks -MultiViews
# On Apache 2.0+ only:
AcceptPathInfo off

You may also wish to replace your "RewriteRule ^$ /index.php [L]" rule with the "more-standard"

DirectoryIndex /index.php

I concur with g1smd's assessment of rule-ordering and URL canonicalization above. As it stands, there are serious duplicate-content issues, and an almost 100% chance that your internal script filepaths will be exposed to clients (browsers and search engines) as URLs. That will result in "wrong" URLs listed in searches for your targeted market, and competition between this "wrong" URL and the correct URL for ranking and for incoming links.

Jim
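Here is a rough Python sketch of the script-path exposure described above. It makes two simplifying assumptions. First, a per-directory internal rewrite with [L] restarts the rule list with the rewritten URL. Second, an external redirect carries the current query string along when its target has none. Only the two patterns come from the thread. The rule tuples and the function names substitute() and process() are hypothetical.

```python
import re

def substitute(target, m):
    # Replace Apache-style $N back-references with the captured groups.
    return re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))) or "", target)

# Rules: (kind, pattern, target, applies_only_to_non_canonical_host)
# kind "R" = external redirect, "L" = internal rewrite that ends the pass.
WWW_REDIRECT = ("R", r"^(.*)$", "http://www.example.com/$1", True)
ARCHIVE_REWRITE = ("L", r"^archive/([^/]+)$", "/archive.php?id=$1", False)

def process(rules, host, path, query="", max_passes=10):
    """Crude model of .htaccess processing: after an internal rewrite the
    rule list runs again on the new URL-path and query string."""
    for _ in range(max_passes):
        for kind, pattern, target, host_only in rules:
            if host_only and host == "www.example.com":
                continue
            m = re.match(pattern, path)
            if m is None:
                continue
            new = substitute(target, m)
            if kind == "R":
                # Assumed: the current query string is appended to the redirect
                if "?" not in new and query:
                    new += "?" + query
                return ("redirect", new)
            path, _, query = new.lstrip("/").partition("?")
            break  # [L] ends this pass; the rules then run again
        else:
            return ("serve", path, query)  # no rule matched on this pass
    raise RuntimeError("too many internal redirects")

rewrite_first = [ARCHIVE_REWRITE, WWW_REDIRECT]   # rewrite listed before redirect
redirect_first = [WWW_REDIRECT, ARCHIVE_REWRITE]  # redirects before rewrites

print(process(rewrite_first, "example.com", "archive/5"))
print(process(redirect_first, "example.com", "archive/5"))
```

In this model, listing the rewrite first makes the second pass redirect the non-www request to /archive.php?id=5, exposing the script path. Listing the redirect first sends the client to /archive/5.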

Readie

4:33 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you for the replies and advice :)

g1smd, regarding the trailing-slash/no-trailing-slash rewrites you mentioned, am I right in thinking that

RewriteRule ^archive/([^/]+)/?$ http://www.example.com/archive/$1 [R=301,L]
RewriteRule ^archive/([^/]+)?$ /archive.php?id=$1 [L]


is the correct way to write it?

jdMorgan

4:37 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remove the question mark preceding the "$" in both patterns.

If un-escaped, a question mark in a regular-expressions pattern means, "Match zero or one of the preceding character, square-bracket-delimited alternate character group, or parenthesized sub-expression." In essence, it makes whatever it follows optional.

Having made this change, be sure that none of the "/archive" links on your own site include the trailing slash.

Jim
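One quick way to check what an optional "/?" or "?" does is to test the patterns with Python's re module. For simple expressions like these, its syntax agrees with the PCRE patterns Apache uses. The sample paths "archive/5", "archive/5/" and "archive/" are made up, and matches() is a hypothetical helper.

```python
import re

def matches(pattern, path):
    """True if the per-directory path matches the pattern."""
    return re.match(pattern, path) is not None

optional_slash = r"^archive/([^/]+)/?$"  # original: trailing slash optional
no_slash = r"^archive/([^/]+)$"          # rewrite, "?" removed
slash_only = r"^archive/([^/]+)/$"       # redirect, "?" removed
group_optional = r"^archive/([^/]+)?$"   # "?" after the group makes the id optional

for name, pattern in [("optional_slash", optional_slash), ("no_slash", no_slash),
                      ("slash_only", slash_only), ("group_optional", group_optional)]:
    print(name, [matches(pattern, p) for p in ("archive/5", "archive/5/", "archive/")])
```

The last pattern shows why the "?" before "$" in the proposed rewrite was a problem. It makes the whole id optional, so a bare "archive/" would also be rewritten, with an empty id.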

Readie

4:41 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ahh, I see. Thank you for your time and the quick response :)

g1smd

6:59 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, with the new code as per jd's suggestion, 'with-slash' is redirected to 'without-slash', and 'without-slash' will serve content. :)

Readie

9:55 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Right, I've implemented the changes (they haven't fixed the 500 errors, so I'll ask the server owner which other modules he has enabled), and everything is working fine except for one detail:

RewriteRule ^/forums$ /forums/index.php [L]


Is no longer working: it shows a "page not found" message (though not my 404 page). I got the link working again with:

RewriteRule ^forums$ http://www.example.com/forums/index.php [R=301,L]


Is this normal? They're phpBB3 forums.

jdMorgan

11:10 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Take that "R=301" out of the flags and remove the protocol and domain name from that rule as soon as you read this.

You've just bought yourself a whole load of trouble by using an external redirect to 'fix' this problem... It should read:

RewriteRule ^forums$ /forums/index.php [L]

The only problem with your previous rule was that the pattern should not start with a slash unless this code goes into the server configuration file outside of any <Directory> container. With the leading slash on the pattern, that rule would never work in .htaccess.
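The leading-slash point can be illustrated with a small Python model. It assumes the .htaccess file sits in the document root. In that case mod_rewrite strips the per-directory prefix before testing the pattern, which leaves no leading slash. In server-config context outside any <Directory> container, the full URL-path, slash included, is tested. The helper per_directory_path() is hypothetical.

```python
import re

def per_directory_path(url_path, prefix="/"):
    # Hypothetical helper: strip the per-directory prefix, as mod_rewrite
    # does for rules in the document root's .htaccess file.
    if url_path.startswith(prefix):
        return url_path[len(prefix):]
    return url_path

request = "/forums"
htaccess_subject = per_directory_path(request)  # what .htaccess rules see
server_config_subject = request                 # what server-config rules see

with_slash = re.compile(r"^/forums$")
without_slash = re.compile(r"^forums$")

print(bool(with_slash.match(htaccess_subject)), bool(without_slash.match(htaccess_subject)))
```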

Now you may need to add yet another rule to recover from this error and its (negative) effects on your search ranking:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]

Also, put all of your external redirects first, in order from most-specific to least-specific, followed by all of your internal rewrites, again in order from most- to least-specific. Most specific means "affecting one or only a few URLs," while least specific means, "affecting many URLs."

Jim
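Here is a sketch of why the THE_REQUEST condition prevents a loop. THE_REQUEST holds the client's original request line, and that line does not change when /forums is internally rewritten to /forums/index.php. So only requests that the client actually made for index.php get redirected. The Python below applies the two patterns from that rule to sample request lines. The function name index_redirect() and the sample requests are illustrative, and query-string handling is left out.

```python
import re

# Patterns copied from the RewriteCond and RewriteRule above.
THE_REQUEST_PATTERN = re.compile(r"^[A-Z]+\ /([^/]+/)*index\.php(\?[^\ ]*)?\ HTTP/")
RULE_PATTERN = re.compile(r"^(([^/]+/)*)index\.php$")

def index_redirect(the_request, rewritten_path):
    """Return the redirect target, or None if the rule should not fire.
    the_request is the client's original request line; rewritten_path is
    the (possibly internally rewritten) per-directory path the rule sees."""
    if not THE_REQUEST_PATTERN.match(the_request):
        return None
    m = RULE_PATTERN.match(rewritten_path)
    if m is None:
        return None
    return "http://www.example.com/" + m.group(1)

# A direct request for index.php is redirected ...
print(index_redirect("GET /forums/index.php HTTP/1.1", "forums/index.php"))
# ... but the internal rewrite of /forums to /forums/index.php is not.
print(index_redirect("GET /forums HTTP/1.1", "forums/index.php"))
```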

jdMorgan

11:14 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]

This rule, if still present in the file, causes an infinite loop by redirecting any non-blank URL to itself repeatedly, until the server or the client gives up.

You may want to delete that rule, sort out the rule order as described above, and re-post your code -- there are several more improvements needed, but I don't want to distract you from the main purpose, which is to find the source of the 500-Server Error.

Jim

Readie

11:50 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Fortunately the forums are not live yet and are disallowed in robots.txt for now (I'm still setting them up and using the old off-site ones in the meantime).

Removing the 301 rule and changing the pattern to ^forums$ means that reaching the forums once again requires typing /forums/index.php explicitly.

The .htaccess file now is as follows:

<IfModule mod_rewrite.c>

Options +FollowSymLinks -MultiViews
AcceptPathInfo off
RewriteEngine on

RewriteRule ^index/$ http://www.example.com [R=301,L]
RewriteRule ^index$ http://www.example.com [R=301,L]

RewriteRule ^roster/officers/$ http://www.example.com/roster/officers [R=301,L]

RewriteRule ^forums/$ http://www.example.com/forums [R=301,L]
RewriteRule ^forum/$ http://www.example.com/forums [R=301,L]
RewriteRule ^forum$ http://www.example.com/forums [R=301,L]

RewriteRule ^archiveitem/([^/]+)/$ http://www.example.com/archive/$1 [R=301,L]
RewriteRule ^archiveitem/([^/]+)$ http://www.example.com/archive/$1 [R=301,L]
RewriteRule ^archiveitem/$ http://www.example.com/archive [R=301,L]
RewriteRule ^archiveitem$ http://www.example.com/archive [R=301,L]
RewriteRule ^archive/([^/]+)/$ http://www.example.com/archive/$1 [R=301,L]

RewriteRule ^admin/account/([^/]+)/$ http://www.example.com/admin/account/$1 [R=301,L]
RewriteRule ^admin/adminmanage/([^/]+)/$ http://www.example.com/admin/adminmanage/$1 [R=301,L]
RewriteRule ^admin/newsedit/([^/]+)/$ http://www.example.com/admin/newsedit/$1 [R=301,L]
RewriteRule ^admin/adminnewsedit/([^/]+)/$ http://www.example.com/admin/adminnewsedit/$1 [R=301,L]
RewriteRule ^admin/lostpass/([^/]+)/$ http://www.example.com/admin/lostpass/$1 [R=301,L]
RewriteRule ^admin/([^/]+)/$ /admin/$1 [R=301,L]

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

DirectoryIndex /index.php [L]

RewriteRule ^roster/officers$ /roster.php?rankfilter=officers [L]

RewriteRule ^forums$ /forums/index.php [L]

RewriteRule ^archive/([^/]+)$ /archive.php?id=$1 [L]

RewriteRule ^admin/account/([^/]+)$ /admin.php?page=account&logid=$1 [L]
RewriteRule ^admin/adminmanage/([^/]+)$ /admin.php?page=adminmanage&muser=$1 [L]
RewriteRule ^admin/newsedit/([^/]+)$ /admin.php?page=newsedit&selid=$1 [L]
RewriteRule ^admin/adminnewsedit/([^/]+)$ /admin.php?page=adminnewsedit&selid=$1 [L]
RewriteRule ^admin/lostpass/([^/]+)$ /admin.php?page=lostpass&lostid=$1 [L]
RewriteRule ^admin/([^/]+)$ /admin.php?page=$1 [L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ /$1.php [L]

</IfModule>

ErrorDocument 400 /400page.php
ErrorDocument 401 /401page.php
ErrorDocument 403 /403page.php
ErrorDocument 404 /404page.php
ErrorDocument 500 /500page.php

Readie

11:55 pm on Feb 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A flash of insight, removing the rule:

RewriteRule ^forums/$ http://www.example.com/forums [R=301,L]

Stopped the error page appearing, but now /forums goes straight to /index.php rather than /forums/index.php, despite the rule

RewriteRule ^forums$ /forums/index.php [L]

g1smd

12:37 am on Feb 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



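Remove [L] from DirectoryIndex. DirectoryIndex is a mod_dir directive, not a RewriteRule, so it takes no flags; "[L]" would just be read as another index file name.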

There's one redirect missing protocol and domain name in the target.

g1smd

12:44 am on Feb 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^index/$ http://www.example.com [R=301,L]
RewriteRule ^index$ http://www.example.com [R=301,L]

can be combined as:
RewriteRule ^index/?$ http://www.example.com/ [R=301,L]


RewriteRule ^forum/$ http://www.example.com/forums [R=301,L]
RewriteRule ^forum$ http://www.example.com/forums [R=301,L]

can be combined as:
RewriteRule ^forum/?$ http://www.example.com/forums [R=301,L]


RewriteRule ^archiveitem/([^/]+)/$ http://www.example.com/archive/$1 [R=301,L]
RewriteRule ^archiveitem/([^/]+)$ http://www.example.com/archive/$1 [R=301,L]

can be combined as:
RewriteRule ^archiveitem/([^/]+)/?$ http://www.example.com/archive/$1 [R=301,L]


RewriteRule ^archiveitem/$ http://www.example.com/archive [R=301,L]
RewriteRule ^archiveitem$ http://www.example.com/archive [R=301,L]

can be combined as:
RewriteRule ^archiveitem/?$ http://www.example.com/archive [R=301,L]



For the last two rules above (of the four above) I would also omit the trailing $ symbol. This would allow the redirect to occur even when 'junk' was appended after the slash. The junk would be thrown away in the redirect.
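The effect of dropping the trailing "$" can be checked in Python. The sample path "archiveitem/5/junk" is made up, and redirect_target() is a hypothetical helper that builds the redirect target the rule would produce.

```python
import re

with_anchor = r"^archiveitem/([^/]+)/?$"
without_anchor = r"^archiveitem/([^/]+)/?"  # trailing "$" omitted

def redirect_target(pattern, path):
    """Return the redirect target, or None if the pattern does not match."""
    m = re.match(pattern, path)
    return None if m is None else "http://www.example.com/archive/" + m.group(1)

print(redirect_target(with_anchor, "archiveitem/5/junk"))     # not redirected
print(redirect_target(without_anchor, "archiveitem/5/junk"))  # junk discarded
```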

jdMorgan

1:12 am on Feb 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's shorten this up and optimize it a bit...

DirectoryIndex /index.php
#
Options +FollowSymLinks -MultiViews
AcceptPathInfo off
RewriteEngine on
#
# Externally "index" requests to "/"
RewriteRule ^index/?$ http://www.example.com/ [R=301,L]
#
# Externally redirect old/incorrect URLs to new URLs
RewriteRule ^forum(s/|/?)$ http://www.example.com/forums [R=301,L]
RewriteRule ^archiveitem(/[^/]+)?/?$ http://www.example.com/archive$1 [R=301,L]
#
# Externally redirect to remove trailing slashes
RewriteRule ^admin/([^/]+)/$ http://www.example.com/admin/$1 [R=301,L]
RewriteRule ^admin/((account|adminmanage|newsedit|adminnewsedit|lostpass)/[^/]+)/$ http://www.example.com/admin/$1 [R=301,L]
RewriteRule ^archive/([^/]+)/$ http://www.example.com/archive/$1 [R=301,L]
RewriteRule ^roster/officers/$ http://www.example.com/roster/officers [R=301,L]
#
# Externally redirect non-canonical hostname requests to canonical hostname
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# Internally rewrite various extensionless page requests to scripts
RewriteRule ^roster/officers$ /roster.php?rankfilter=officers [L]
RewriteRule ^forums$ /forums/index.php [L]
RewriteRule ^archive/([^/]+)$ /archive.php?id=$1 [L]
#
RewriteRule ^admin/account/([^/]+)$ /admin.php?page=account&logid=$1 [L]
RewriteRule ^admin/adminmanage/([^/]+)$ /admin.php?page=adminmanage&muser=$1 [L]
RewriteRule ^admin/newsedit/([^/]+)$ /admin.php?page=newsedit&selid=$1 [L]
RewriteRule ^admin/adminnewsedit/([^/]+)$ /admin.php?page=adminnewsedit&selid=$1 [L]
RewriteRule ^admin/lostpass/([^/]+)$ /admin.php?page=lostpass&lostid=$1 [L]
RewriteRule ^admin/([^/]+)$ /admin.php?page=$1 [L]
#
# Internally rewrite all remaining requests to same-named php scripts if
# requested URL-path does not end with a slash, does not already end with
# a filetype, and does not resolve to existing directory, but does resolve
# to an existing php script when ".php" is appended
RewriteCond $1 !\.[^/.]+$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*[^/])$ /$1.php [L]
#
# Declare custom error documents
ErrorDocument 400 /400page.php
ErrorDocument 401 /401page.php
ErrorDocument 403 /403page.php
ErrorDocument 404 /404page.php
ErrorDocument 500 /500page.php

The <IfModule> container is unnecessary unless you want the mod_rewrite code to fail silently when mod_rewrite is not available on a server. The new exclusions added to the last rewriterule may make a noticeable difference in your server performance, due to a greatly-reduced number of calls to the operating system to check the disk to see if directories and/or files exist.

Jim
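As a sanity check, the combined forum pattern can be compared against the three separate rules it replaces. This sketch uses Python's re as a stand-in for Apache's regex engine. It also covers the important case that the canonical "forums" is not redirected to itself. The list of test paths is arbitrary.

```python
import re

# Pattern from the optimized file above, and the three rules it replaces.
combined = re.compile(r"^forum(s/|/?)$")
separate = [re.compile(p) for p in (r"^forums/$", r"^forum/$", r"^forum$")]

def any_separate(path):
    """True if any of the original three redirect patterns matches."""
    return any(p.match(path) for p in separate)

# Arbitrary sample per-directory paths.
candidates = ["forum", "forum/", "forums", "forums/", "forumsx", "forum/x"]
for path in candidates:
    assert bool(combined.match(path)) == any_separate(path), path

print([p for p in candidates if combined.match(p)])  # ['forum', 'forum/', 'forums/']
```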

Readie

4:16 am on Feb 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wow, I certainly wasn't expecting that much help :) Thank you very much.

Something (I suspect the condition
RewriteCond $1 !\.[^/.]+$
on the generic extensionless-PHP rule) has also fixed the 500 errors I was getting: a junk URL containing an existing page name now correctly returns a 404.

Thanks again :)
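Readie's suspicion can be checked with a rough Python model of the extensionless-PHP rule. The assumptions are important:

- The set of files on disk is hypothetical; only roster.php is implied by the thread.
- request_filename() is a simplified stand-in for Apache's real path walk. It assumes %{REQUEST_FILENAME} stops at the first path component that is not an existing directory, and the rest becomes PATH_INFO.

Under that model, /roster/rawr keeps resolving to "roster". So "roster.php" always passes the -f test, and ".php" is appended again on every pass, matching the pattern in the log at the top of the thread. With the new "$1 !\.[^/.]+$" condition, the second pass stops.

```python
import re

# Hypothetical document-root contents, for illustration only.
EXISTING_FILES = {"roster.php", "index.php", "archive.php", "admin.php"}
EXISTING_DIRS = {"forums"}

def request_filename(path):
    """Simplified path walk: stop at the first component that is not an
    existing directory; whatever follows would be PATH_INFO."""
    walked = []
    for part in path.split("/"):
        walked.append(part)
        if "/".join(walked) not in EXISTING_DIRS:
            break
    return "/".join(walked)

def old_rule(path):
    # RewriteCond %{REQUEST_FILENAME} !-d
    # RewriteCond %{REQUEST_FILENAME}\.php -f
    # RewriteRule ^(.*)$ /$1.php [L]
    filename = request_filename(path)
    if filename not in EXISTING_DIRS and filename + ".php" in EXISTING_FILES:
        return path + ".php"
    return None

def new_rule(path):
    # RewriteCond $1 !\.[^/.]+$
    # RewriteCond %{REQUEST_FILENAME} !-d
    # RewriteCond %{REQUEST_FILENAME}.php -f
    # RewriteRule ^(.*[^/])$ /$1.php [L]
    m = re.match(r"^(.*[^/])$", path)
    if m is None or re.search(r"\.[^/.]+$", m.group(1)):
        return None
    filename = request_filename(path)
    if filename not in EXISTING_DIRS and filename + ".php" in EXISTING_FILES:
        return m.group(1) + ".php"
    return None

def run(rule, path, limit=10):
    """Re-apply the rule after every internal rewrite until it stops
    matching; fail once more than `limit` rewrites have happened."""
    for redirects in range(limit + 1):
        new_path = rule(path)
        if new_path is None:
            return path, redirects
        path = new_path
    raise RuntimeError(f"exceeded the limit of {limit} internal redirects")

print(run(new_rule, "roster/rawr"))  # stops after one rewrite
```

In this model, run(old_rule, "roster/rawr") raises the RuntimeError, because every pass adds another ".php". A legitimate request such as "roster" is still rewritten once to "roster.php" under either rule.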

Readie

7:42 am on Feb 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh, and to fix /forums not working properly, I created a .htaccess file within /forums containing the single line:

DirectoryIndex /forums/index.php

And it sorted it nicely.