htaccess redirect changes between internal and external

         

Skier88

6:31 pm on Oct 27, 2010 (gmt 0)

10+ Year Member



My htaccess file does what it's supposed to, but there is one unintended effect that I can't trace. A request for "www.example.com/.htaccess" should be externally redirected to "example.com/.htaccess", then internally redirected to "example.com/_structure/403.html". However, both redirects are external - the url bar becomes "example.com/_structure/403.html". They both work as they should separately - "www.example.com/folder/page.html" becomes "example.com/folder/page.html", and "example.com/.htaccess" stays at the same url but displays "example.com/_structure/403.html".

Complete .htaccess file:

RewriteEngine on

RewriteCond %{HTTP_HOST} ^www\.sunriseinfo\.us($|:) [NC]
RewriteRule ^ http://sunriseinfo.us%{REQUEST_URI} [R=301,L]

ErrorDocument 403 /_structure/403.html
RewriteRule ^database/login_vars\.php$ - [F]
RewriteRule (/|^)php\.ini$ - [F]
RewriteRule (/|^)\.htaccess$ - [F]

ErrorDocument 404 /_structure/404.html
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} \.(gif|jpg|jpeg|png)$
RewriteRule ^ _structure/404.gif [L]

RewriteRule !\.[a-z]{2,4}$ _structure/page.php [NC]


Any idea why this is happening? Thanks for looking.

jdMorgan

9:14 pm on Oct 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should never rewrite to an error page. If you do, the result will be a 200-OK response -- certainly not what you want to serve unless you want trouble of several kinds with search engines.
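
As a rough sketch of the difference, reusing the paths from the original post: the commented-out rule is only an illustration of the pattern to avoid (it is not taken from the poster's file), while the last two lines let Apache send a genuine 403 and use ErrorDocument merely to choose the response body.

```apache
RewriteEngine on

# Avoid (illustrative only): an internal rewrite to the error page.
# The client receives the contents of 403.html with a "200 OK" status.
# RewriteRule (/|^)\.htaccess$ /_structure/403.html [L]

# Prefer: [F] makes the server answer "403 Forbidden"; ErrorDocument
# only selects which page accompanies that status.
ErrorDocument 403 /_structure/403.html
RewriteRule (/|^)\.htaccess$ - [F]
```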

In addition, it is both a waste of time and resources and a security problem to redirect requests which will subsequently be denied.

Your rules should be ordered with all access controls first, followed by external redirects in order from most-specific patterns and conditions to least-specific, followed by all internal rewrites, again in order from most- to least-specific.

A "most-specific" rule affects one URL-path or a small number of URL-paths. A "least-specific" rule affects many or even all requested URL-paths. The domain canonicalization redirect is typically the least-specific --and therefore the last-- external redirect rule.

Your internal rewrite for missing images is inefficient, since you're making your server check the disk before being sure it's really necessary. There's a much more direct way to do this anyway.

There are also several other tweaks to improve performance and 'correctness'.

I'd suggest:

# Declare custom 403-Forbidden error document
ErrorDocument 403 /_structure/403.html
#
# Declare default custom 404-Not Found error document
ErrorDocument 404 /_structure/404.html
#
# Override default custom 404-Not Found error document for missing images
<FilesMatch "\.(gif|jpe?g|png)$">
ErrorDocument 404 /_structure/404.gif
</FilesMatch>
#
RewriteEngine on
#
# Access controls - generate 403-Forbidden response for disallowed-resource requests
RewriteRule ^database/login_vars\.php$ - [F]
RewriteRule ^([^/]+/)*php\.ini$ - [F]
RewriteRule ^([^/]+/)*\.(htaccess|htpasswd|htgroup)$ - [F]
#
# Redirect to canonicalize only the "www" subdomain (and no others that might be requested)
# but redirect FQDN-format hostname and appended-port number "www" versions as well
RewriteCond %{HTTP_HOST} ^www\.sunriseinfo\.us(\.|\.?:[0-9]+)?$ [NC]
RewriteRule ^ http://sunriseinfo.us%{REQUEST_URI} [R=301,L]
#
# Alternate full-canonicalization rule:
# Redirect to canonical hostname unless requested hostname is exactly "sunriseinfo.us" or blank
# RewriteCond %{HTTP_HOST} !^(sunriseinfo\.us)?$
# RewriteRule ^(.*)$ http://sunriseinfo.us/$1 [R=301,L]
#
# Rewrite extensionless URL-path requests to the "/_structure/page.php" script
RewriteRule !\.[a-z]{2,4}$ /_structure/page.php [NC]


I commented your code. Unless your memory is perfect and your .htaccess-fu is very strong, I suggest that you write and keep accurate comments in your code. It will save you all manner of wasted time and hassles if well-done.

Jim

Skier88

7:50 pm on Oct 28, 2010 (gmt 0)

10+ Year Member



Thanks for the response, Jim. .htaccess / Apache is one of my weakest areas in web design, so I really appreciate the thorough advice.

I actually found out I can change some php and eliminate the need for a custom 404 error image - but that construct is useful nonetheless. I had thought you could only put rewriterules inside file selectors.

Why did you write the regexes to match the entire url? For example, you wrote "^(.*)$" instead of "^" and "^([^/]+/)*php\.ini$" instead of "(/|^)php\.ini$". It seems like this would just make the matching slower.

Also, I'm not quite sure what you're doing with the canonicalization. Why would there be a "." after the domain name?

I updated my code, but there are a few problems. First, the new code:

# ERROR DOCUMENTS
ErrorDocument 403 /_structure/403.html
ErrorDocument 404 /_structure/404.html

RewriteEngine on

# ACCESS CONTROLS
RewriteRule ^database/login_vars\.php$ - [F]
RewriteRule (/|^)php\.ini$ - [F]
RewriteRule (/|^)\.(htaccess|htpasswd|htgroup)$ - [F]

# EXTERNAL REDIRECTS
RewriteCond %{HTTP_HOST} ^www\.sunriseinfo\.us($|:) [NC]
RewriteRule ^ http://sunriseinfo.us%{REQUEST_URI} [R=301,L]

# INTERNAL REDIRECTS
RewriteRule !\.[a-z]{2,4}$ _structure/page.php [NC]


Yes, I know the comments are sparse at best ... it isn't really my style to put more than I need to remember what each thing does, and I doubt anybody else will be using this. Anyway, I want to change the last line to something more specific, eg:

RewriteRule !\.(gif|jpe?g|png|css|js)$ _structure/page.php [NC]

This will make it match some/all forbidden files. And the script that it redirects to ("_structure/page.php") displays the requested file (amongst other things), effectively circumventing my own security. How do I get .htaccess to stop redirecting a forbidden url? ([L] doesn't work)
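
One possible approach, offered as a sketch rather than a confirmed fix for this site: when [F] fires, Apache fetches the ErrorDocument through an internal redirect, and the rules in .htaccess run again for "/_structure/403.html". Since ".html" is not in the new exclusion list, the catch-all would hand the error page to page.php. The condition below assumes the standard REDIRECT_STATUS environment variable, which is normally empty on the original client request and set on internally redirected ones.

```apache
# INTERNAL REWRITES
# Only rewrite original client requests. Requests that arrive here via an
# internal redirect (an ErrorDocument fetch, or an earlier rewrite) carry a
# non-empty REDIRECT_STATUS and are left alone.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule !\.(gif|jpe?g|png|css|js)$ _structure/page.php [NC]

# Alternative (assumes everything under /_structure/ should bypass the wrapper):
# RewriteCond %{REQUEST_URI} !^/_structure/
# RewriteRule !\.(gif|jpe?g|png|css|js)$ _structure/page.php [NC]
```

Either way, the access-control rules still need to appear before this rule so that forbidden URLs are rejected on the first pass.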

Also on the subject of the last line, but a little bit off topic for the thread: the purpose of that line is to automatically wrap pages in a header and footer (which is done by the script). However, if the script is fed a directory it doesn't know which file is the index file, so it doesn't know what to display. I have a topic on this in the php forum, but I thought I'd ask you since you seem to be an apache expert - is there any chance the solution lies in .htaccess?

Thanks again for your help. I'm sorry if I sounded critical at times; I just want to understand what you already know.

g1smd

8:22 pm on Oct 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A dot after domain name is perfectly valid, but is non-canonical.

You would want to redirect for that, and/or for trailing appended port number.
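
To make that concrete, here is how the condition from Jim's post treats a few sample Host headers. The sample hostnames and port numbers are only illustrations.

```apache
# Host header sent by client     Condition matches?
# www.sunriseinfo.us             yes
# www.sunriseinfo.us.            yes (trailing dot, FQDN form)
# www.sunriseinfo.us:80          yes (appended port)
# www.sunriseinfo.us.:8080       yes (trailing dot and port)
# WWW.SunriseInfo.US             yes ([NC] ignores case)
# sunriseinfo.us                 no  (already canonical)
# www2.sunriseinfo.us            no  (a different subdomain)
RewriteCond %{HTTP_HOST} ^www\.sunriseinfo\.us(\.|\.?:[0-9]+)?$ [NC]
RewriteRule ^ http://sunriseinfo.us%{REQUEST_URI} [R=301,L]
```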

jdMorgan

12:13 am on Oct 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> However, if the script is fed a directory it doesn't know which file is the index file, so it doesn't know what to display.

The script needs to be changed to look at the Request-URI and if it is blank, replace it with your usual index page path name. If these index page names are inconsistent across your directories then there's really no good solution, as the only thing you can do is to 'search' for possible index pages, and that would be very inefficient.

Your unanchored patterns search forward while comparing the strings on a character-by-character basis. The anchored patterns I posted search forward by doing checks only after each "boundary" character is encountered.

Where I used ^(.*)$ it was to create a back-reference for use as $1. This is a local variable, one of nine, instead of a system variable, one of hundreds.
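
For comparison, a sketch of the two forms side by side, using the alternate canonicalization condition from earlier in the thread. It assumes the .htaccess file sits in the document root, where both forms build the same target URL; in a subdirectory's .htaccess, $1 would hold only the path relative to that directory, while %{REQUEST_URI} always holds the full URL-path.

```apache
# Form 1: capture the URL-path and reuse it through the $1 back-reference
RewriteCond %{HTTP_HOST} !^(sunriseinfo\.us)?$
RewriteRule ^(.*)$ http://sunriseinfo.us/$1 [R=301,L]

# Form 2: capture nothing and read the server variable instead
# RewriteCond %{HTTP_HOST} !^(sunriseinfo\.us)?$
# RewriteRule ^ http://sunriseinfo.us%{REQUEST_URI} [R=301,L]
```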

Jim

Skier88

2:53 am on Oct 29, 2010 (gmt 0)

10+ Year Member



Thanks for the replies. Oddly enough, after I put in the alternate canonicalization rule it stopped redirecting 403 pages to page.php ... I have no idea why, but at least it works. So I'd say now the file is fixed - thank you for your input.

Regarding the regex, I see the reason for the backreference. But does it really scan from the start, checking every character, when there is an end anchor? It seems like it should be able to process the pattern backwards, given that it is a very simple regex.

As for the script, maybe I will limit the site to one type of index page. I was just hoping there was some way PHP could access / implement the behavior of an HTTP request.