

A Commented 'Pseudo-Working' htaccess File

         

jd01

2:20 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



# Remove Directory Indexing --- Affects all Directories
Options -Indexes

# Turn on Mod_Rewrite
RewriteEngine on

# Forbid access to .htaccess file --- Affects all Requests
RewriteRule \.htaccess - [F]

# Redirect requests ending in index.html to the same path ending in /
# Affects all Requests
# Redirects to www, so L is added to prevent a double check/redirect from the following ruleset
# Code for example only (IOW untested)
# External redirects should use the canonical URL, with 'FLAG'='REASON-CODE'

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+)+)/index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1/ [R=301,L]

# Redirect ANYTHING.domain.com to www.domain.com, except sub.domain.com
# Affects all Requests to all domains and subdomains on this host if:
# 1.) 'HOST' is set, and
# 2.) 'HOST' is not sub.domain.com or www.domain.com
# External redirects should use the canonical URL, with 'FLAG'='REASON-CODE'
# Redirects to www. L is added so the following rulesets are not checked before the redirect to www
# Could be used without the L flag for efficiency if sub.domain.com should also be affected by all subsequent rulesets.
# The L flag prevents unwanted application of the following rules to sub.domain.com
# The extra 'back references' () around sub and www are for visual purposes only and may not be necessary in a given situation.

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^((sub)|(www)\.example\.com) [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Rewrite all requests for index.html INTERNALLY to /index.php
# Internal rewrites should use a local URL-path (/local/url/path.file) in the .htaccess file
# Affects all requests ending in index.html, including internal requests for URLs ending in /

RewriteRule index\.html$ /index.php [L]

# 'Old Directory' Pages to 'New Directory' Pages
# Affects all Requests to 'old-dir1' AND 'old-dir2' but not 'old-dir' OR 'old-dir3'
# Both Redirect EXTERNALLY, so they should be a canonical URL, with 'FLAG'='REASON-CODE'

# 1.) Page, Number, .html are stored as a 'back reference' '$1' in the format 'pageNUM.html'
# 2.) To keep page names the same, $1 is used rather than a rule for each page.
# 3.) Will externally REDIRECT 'old-dir1' to 'new-dir-name' without affecting other requests or changing the page name.
# Redirects to www. L is added to prevent redirect from the preceding ruleset (non-www ruleset) when new request is sent.

RewriteRule ^old-dir1/(page[0-9]+\.html)$ http://www.example.com/new-dir-name/$1 [R=301,L]

# 1.) Page only is stored as a 'back reference' '$1' in the format 'anyLETTER'
# 2.) Page Number and .html are stored as a 'back reference' '$2' in the format 'ANYNUMBER.html'
# 3.) Will externally REDIRECT 'old-dir2' to 'page-name/current-page-number.html' without affecting other requests.
# 4.) Will 'move' www.example.com/old-dir2/page-nameNUM.html to www.example.com/page-name/NUM.html
# 5.) Will 'change' the 'page' name from page-nameNUM.html to NUM.html
# 6.) NC 'No Case' flag added to match all a-z A-Z requests.

RewriteRule ^old-dir2/([a-z]+)([0-9]+\.html)$ http://www.example.com/$1/$2 [NC,R=301,L]
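The two back-reference rules above can be sanity-checked outside Apache by translating them to Python's `re`, used here as a stand-in for Apache's PCRE (the patterns only use features that behave the same in both), with `re.IGNORECASE` standing in for `[NC]`. This is a sketch, not the rules themselves: `redirect_target` is a hypothetical helper that takes the per-directory path without its leading slash, as a RewriteRule pattern in .htaccess sees it. One thing it shows is that `[a-z]+` does not match hyphens, so a name that literally contains one, such as `page-name7.html`, is not redirected; that would need `[a-z-]+`.

```python
import re

# Python translations of the two 'old directory' rules. In .htaccess the
# RewriteRule pattern sees the path without a leading slash.
OLD_DIR1 = re.compile(r"^old-dir1/(page[0-9]+\.html)$")
# [NC] on the second rule corresponds to re.IGNORECASE.
OLD_DIR2 = re.compile(r"^old-dir2/([a-z]+)([0-9]+\.html)$", re.IGNORECASE)


def redirect_target(path):
    """Return the 301 target for a per-directory path, or None if neither rule matches."""
    m = OLD_DIR1.match(path)
    if m:
        # $1 = pageNUM.html, kept unchanged under the new directory
        return "http://www.example.com/new-dir-name/" + m.group(1)
    m = OLD_DIR2.match(path)
    if m:
        # $1 = the letters, $2 = NUM.html
        return f"http://www.example.com/{m.group(1)}/{m.group(2)}"
    return None


if __name__ == "__main__":
    for p in ("old-dir1/page12.html", "old-dir2/Widget7.html",
              "old-dir2/page-name7.html", "old-dir3/page1.html"):
        print(p, "->", redirect_target(p))
```

Note that `[NC]` only relaxes the match: the back-references keep the case the visitor requested, so `old-dir2/Widget7.html` goes to `/Widget/7.html`.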

Thought this might give some insight into the reasons behind some of the 'Flag' usage and rule structure in .htaccess files. It also has some good, working (or nearly working) examples of code as starting points.

Justin

jdMorgan

3:16 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Comments:

The pattern "(([^/]+)+)/" in the RewriteCond for rule #2 requires one and only one subdirectory level, and that subdirectory level must be non-blank. The patterns "(([^/]+/)*)" or "(([^/]*/)*)" would probably be better for most users.
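To make the difference concrete, here is a small sketch that runs the original condition and the two suggested patterns against a few made-up request lines. It uses Python's `re` as a stand-in for Apache's PCRE (the escaped spaces `\ ` in the .htaccess version are plain spaces here), and `captured_dir` is a hypothetical helper that returns what `%1` would hold.

```python
import re

# The rule #2 condition and the two suggested alternatives, as applied to
# THE_REQUEST. The .htaccess versions escape the spaces ("\ "); here they are plain.
ORIGINAL = re.compile(r"^[A-Z]{3,9} /(([^/]+)+)/index\.html HTTP/")
ANY_DEPTH = re.compile(r"^[A-Z]{3,9} /(([^/]+/)*)index\.html HTTP/")
ANY_DEPTH_LOOSE = re.compile(r"^[A-Z]{3,9} /(([^/]*/)*)index\.html HTTP/")


def captured_dir(pattern, request_line):
    """Return what %1 would hold, or None if the condition does not match."""
    m = pattern.match(request_line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for line in ("GET /index.html HTTP/1.1",
                 "GET /dir/index.html HTTP/1.1",
                 "GET /dir/sub/index.html HTTP/1.1",
                 "GET //index.html HTTP/1.1"):
        print(f"{line!r}: {captured_dir(ORIGINAL, line)!r}, "
              f"{captured_dir(ANY_DEPTH, line)!r}, "
              f"{captured_dir(ANY_DEPTH_LOOSE, line)!r}")
```

The original only matches exactly one directory level, while both suggested forms also match the root and deeper paths. They capture the trailing slash, so a substitution built from them would use `%1` rather than `%1/`; the `[^/]*` form additionally accepts empty segments such as `//index.html`.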

The pattern in the RewriteCond for rule #3 doesn't look right for the given description. A pattern of "!^(sub|www)\.example\.com" seems more appropriate. The original pattern exempts a hostname from the redirect if it begins with "sub" (any "sub..." hostname at all, not just sub.example.com) or with "www.example.com"; i.e. the outer closing paren is misplaced, while the inner parentheses around "sub" have no effect whatsoever. Also, omitting the [NC] flag would allow canonicalization of mis-capitalized hostnames.
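A quick way to see the precedence problem is to test both patterns against a few hostnames. This sketch uses Python's `re` as a stand-in for PCRE, with `re.IGNORECASE` playing the part of `[NC]`; the hostnames (including the hypothetical `subway.example.net`) are illustrative only. Because the RewriteCond is negated, a host that matches is exempt from the redirect.

```python
import re

# Rule #3 host pattern as posted, and the suggested replacement. The
# RewriteCond negates each pattern, so a MATCH means "no redirect".
ORIGINAL = re.compile(r"^((sub)|(www)\.example\.com)", re.IGNORECASE)  # had [NC]
SUGGESTED = re.compile(r"^(sub|www)\.example\.com")                     # no [NC]


def is_redirected(pattern, host):
    """Mimic 'RewriteCond %{HTTP_HOST} !pattern': redirect when nothing matches."""
    return pattern.search(host) is None


if __name__ == "__main__":
    for host in ("sub.example.com", "www.example.com",
                 "subway.example.net", "WWW.Example.com", "example.com"):
        print(host, is_redirected(ORIGINAL, host), is_redirected(SUGGESTED, host))
```

With the original grouping, `subway.example.net` escapes the redirect because it starts with "sub"; without `[NC]`, `WWW.Example.com` is redirected to the correctly capitalized name.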

Added:

If avoiding stacked (multiple) redirects while fixing up URLs is a priority, a useful technique is to "flag and defer" -- Something like this trivial example:


# Canonicalize index.html URL-paths to "/"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]*/)*)index\.html\ HTTP/
RewriteRule index\.html$ /%1 [E=Redir:Yes]
#
# Fix non-canonical domain names
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^(sub|www)\.example\.com
RewriteRule .* - [E=Redir:Yes]
#
# Fix trailing period on domain
RewriteCond %{HTTP_HOST} ^www\.example\.com\.
RewriteRule .* - [E=Redir:Yes]
#
# ... More fix-ups
#
# Do external redirect if requested by any rule(s) above
RewriteCond %{ENV:Redir} ^Yes$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
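The flag-and-defer idea can be modelled in a few lines of ordinary code: each fix-up only sets a flag (and possibly adjusts the path), and a single redirect is issued at the end. The sketch below is a hypothetical Python model of the three fix-ups above, not mod_rewrite itself; `canonical_redirect` and its arguments are assumptions, query strings are ignored, and `re` stands in for PCRE.

```python
import re

CANONICAL = "http://www.example.com"
# Same patterns as the three fix-ups above (spaces unescaped for Python).
INDEX_REQUEST = re.compile(r"^[A-Z]{3,9} /(([^/]*/)*)index\.html HTTP/")
GOOD_HOST = re.compile(r"^(sub|www)\.example\.com")
TRAILING_DOT = re.compile(r"^www\.example\.com\.")


def canonical_redirect(request_line, host, path):
    """Run every fix-up, then issue at most one redirect (flag and defer).

    request_line -- the raw request line (THE_REQUEST)
    host         -- the Host header (HTTP_HOST), possibly empty
    path         -- the URL-path without its leading slash
    Returns the single 301 target, or None if no fix-up fired.
    """
    redirect = False  # plays the role of the Redir environment variable

    # Fix-up 1: canonicalize .../index.html to .../
    m = INDEX_REQUEST.match(request_line)
    if m and path.endswith("index.html"):
        path = m.group(1)
        redirect = True

    # Fix-up 2: non-canonical host names (skipped when Host is empty)
    if host and not GOOD_HOST.match(host):
        redirect = True

    # Fix-up 3: trailing period on the domain
    if TRAILING_DOT.match(host):
        redirect = True

    # One external redirect, however many fix-ups fired.
    return CANONICAL + "/" + path if redirect else None
```

A non-www request for /dir/index.html therefore costs the client one 301 to http://www.example.com/dir/ instead of two chained redirects.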

One caution: There is a known bug [archive.apache.org] in Apache 1.x mod_rewrite that can cause severe problems with multi-step rewrites under certain conditions [example [webmasterworld.com]] -- Add fix-up routines one-at-a-time, and test thoroughly. There are several work-arounds for the Apache 1.3x bug, but all of them are very ugly.

Jim

jd01

4:30 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think we are saying the same thing, except I parenthesized the pattern with 'extra possibilities'...

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^((sub)|(www)\.example\.com) [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Sorry... I wanted to let people know there is more than one way to parenthesize the pattern, so they know they can use different back-references in the string if they need to, e.g. (whole string is OK), (or-exp|is-OK), (an-exp)|is-OK.

But I think if you remove the extra set(s) of () we are saying the same thing. It would allow access to sub.domain.com OR www.domain.com without a redirect, but would redirect everything except sub.domain.com to www.domain.com.

Am I missing something?

Good point on the NC... Thanks.

Justin

I'll just keep editing it.

jd01

4:52 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's one for you:

# Domain wildcards are on. Set to 'proxy' image requests from img. to www. to allow a higher number of connections per user/browser.

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^img\.example\.com
RewriteCond %{REQUEST_URI} \.gif|\.jpg$
RewriteRule (.*) http://www.example.com/img/$1 [P,L]

Set to P = 'File Not Found'
Set to PT (I believe it's intended to be used with mod_alias) = 302 Found, 400 Bad Request.
Set to R=301 = Redirects to the correct location.

The img. ruleset is above the non-www redirect and flagged [L], so img. requests should not be affected by any other rule(sets); only the proxied URL from www.example.com should be affected by subsequent rulesets.

I know I am passing the correct location from the redirect.
Do I need to PT and use mod_alias?
If so can you point me in a direction?

Do I need to have a server setting adjusted to allow the proxy?
I use F for blocking access, so if it is my prior .htaccess ruleset/blocking I should not receive a 404 --- I should get a 403.

Thanks,
Justin

BTW, is there a specific URL pattern for proxy requests compared to direct requests? E.g. a ?query_string URL...

EDITED: Removed the [NC] flag --- kinda slow sometimes.

jdMorgan

5:09 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Do I need to have a server setting adjusted to allow the proxy?

From Apache mod_proxy Forward and Reverse Proxies [httpd.apache.org]:

A reverse proxy is activated using the ProxyPass directive or the [P] flag to the RewriteRule directive. It is not necessary to turn ProxyRequests on in order to configure a reverse proxy.

Nothing is jumping out at me as being 'wrong' with that code, except that [L] used with [P] is redundant, and that [PT] is intended for use with internal rewrites (i.e. rewrites without "http://www.example.com/" as the substitution URL).

What does your server error log show for each test case?

Jim

jd01

5:18 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can't get to the logs until tomorrow.

I strip QUERY_STRING requests (and have some other blocks) on the www. domain, so I am wondering whether the proxy requests the URL directly in the standard http://www.example.com/img/file.gif form, or makes some 'non-standard' request that gives the requested server the opportunity to deny the proxy.

Justin

jd01

5:25 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So, could I use:

ProxyRequests Off

ProxyPass /proxy-img http://www.example.com/img
ProxyPassReverse /proxy-img http://www.example.com/img

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^img\.example\.com
RewriteCond %{REQUEST_URI} \.gif|\.jpg$
RewriteRule (.*) /proxy-img/$1 [L]

To achieve the same effect?

Justin

I guess what I am saying is, uh, I think what I have (the original ruleset) should work according to what I am reading, but maybe I am missing something...

jd01

5:56 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's another take on non-www. to www.

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

Justin

Added ? to the RewriteRule pattern.

jdMorgan

5:58 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The original code should have worked with just [P]. The mod_proxy directives aren't allowed in .htaccess.

Need to see those error logs...

[added]

For .htaccess, I'd write it like this:


RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^img\.example\.com
RewriteRule ^([^.]+\.(gif|jpg))$ http://www.example.com/img/$1 [P]

[/added]

Jim

[edited by: jdMorgan at 6:14 am (utc) on Nov. 9, 2006]
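One difference between the original img. condition and the pattern above is anchoring. In `\.gif|\.jpg$` the alternation binds loosest, so only the `\.jpg` branch is tied to the end of the URL and `\.gif` can match anywhere. The sketch below checks both with Python's `re` standing in for PCRE; `matches` is a hypothetical helper, `search` mirrors mod_rewrite's unanchored matching, and the paths are made up.

```python
import re

# Condition from the img. ruleset, applied to REQUEST_URI (leading slash).
# '|' has the lowest precedence, so only the \.jpg branch is anchored by '$'.
LOOSE = re.compile(r"\.gif|\.jpg$")
# Pattern from the suggested RewriteRule, applied to the per-directory path.
STRICT = re.compile(r"^([^.]+\.(gif|jpg))$")


def matches(pattern, text):
    """mod_rewrite patterns are unanchored unless they use ^ or $, like re.search."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for uri in ("/logo.gif", "/logo.gif.php", "/photo.jpg.php"):
        print("LOOSE ", uri, matches(LOOSE, uri))
    for path in ("logo.gif", "logo.gif.php", "icons/logo.jpg", "logo.png"):
        print("STRICT", path, matches(STRICT, path))
```

`/logo.gif.php` passes the loose condition but not the anchored pattern; `[^.]+` still allows subdirectories such as `icons/logo.jpg`, since it excludes only dots.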

jd01

7:07 am on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How about this for multiple rewrites/directory, with a single comparison/dir-path location?

RewriteRule !^some-dir/dir/ - [S=3]
RewriteRule ^[^/]+/[^/]+/(index\.html)$ http://www.example.com/new-dir/ [R=301,L]
RewriteRule ^[^/]+/[^/]+/somepage\.html$ /dir/dir/page.php [L]
RewriteRule ^[^/]+/[^/]+/([a-z-]+)/otherpage\.html$ /dir/page.php?var=$1 [L]

OR is it faster to use a directory container?

This way I am using 10 prefix comparisons for about 60 rules.
Moves through things pretty quick compared to a 'full match' each.

Justin

ADDED: I could probably use the implicit 'everything up to' matching in most situations, but there are reasons I need the comparisons in the code that are not apparent in the simplified 'catch-all' version above.

EG
RewriteRule !^some-dir/dir/ - [S=3]
RewriteRule /(index\.html)?$ http://www.example.com/new-dir/ [R=301,L]
RewriteRule somepage\.html$ /dir/dir/page.php [L]
RewriteRule ([a-z-]+)/otherpage\.html$ /dir/page.php?var=$1 [L]

Edited too much out... added / and ? so the first rule matches URLs ending in / or /index.html.

jd01

9:52 pm on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Couldn't get to the logs, but here is the fix...

(Apparently the proxy was not working because the sub. resolves to the main domain, so by requesting from the same location on sub.example.com I can achieve the same effect. To avoid duplicate content, I only allow access to the sub for .gif and .jpg files.)

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST}%{REQUEST_URI} !^img\.example\.com/img/[^.]+\.(gif)|(jpg)
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

Interestingly... when I use the following Cond I return a 500 error:
RewriteCond %{HTTP_HOST}%{REQUEST_URI} !^img\.example\.com/img/[^.]+\.(gif|jpg)

Not sure why. It appears to be a 'correct' expression, with proper placement of (), but it returns an error, whereas using 'poor' () results in the correct location being opened?
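Whatever caused the 500 error (only the error log can say), the two conditions do not mean the same thing. In `\.(gif)|(jpg)` the `|` splits the whole pattern, so the second alternative is just `(jpg)` anywhere in the host-plus-path string. This sketch uses Python's `re` as a stand-in for PCRE; `exempt` is a hypothetical helper and the URLs are invented.

```python
import re

# The two conditions, applied to %{HTTP_HOST}%{REQUEST_URI}. The RewriteCond
# negates them, so a string that MATCHES is exempt from the www. redirect.
SPLIT = re.compile(r"^img\.example\.com/img/[^.]+\.(gif)|(jpg)")
GROUPED = re.compile(r"^img\.example\.com/img/[^.]+\.(gif|jpg)")


def exempt(pattern, host, uri):
    """True if the request would skip the redirect under this condition."""
    return pattern.search(host + uri) is not None


if __name__ == "__main__":
    for host, uri in (("img.example.com", "/img/logo.gif"),
                      ("img.example.com", "/img/photo.jpg"),
                      ("foo.example.com", "/jpg-gallery.html")):
        print(host + uri, exempt(SPLIT, host, uri), exempt(GROUPED, host, uri))
```

Both versions exempt real images on img., which is why the 'poor' one appears to work, but it also lets any request containing "jpg" anywhere skip the redirect.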

Will probably look at it from efficiency POV later.

ADDED: I will probably install some extra checks on my php scripts to make sure nothing 'slips through' to the sub on 'page' requests... May be 'un-necessary' but I think keeping control of what can be done is paramount in 'site protection'.

Justin

jd01

2:10 am on Nov 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Figured I would just post all the way through this solution.

RewriteRule ^img/[^/.]+\.(gif|jpg)$ - [L]

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

I decided that, since there is no real duplicate-content issue with graphics, the fastest solution would be to allow /img/ requests on any subdomain... nobody should be requesting them except me, and if they are, I don't think it really matters where they request them from.

Any thoughts welcomed...

Justin

Added the \ and posted with () for grouping. Removed the leading / for .htaccess.

jdMorgan

2:51 am on Nov 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Interestingly... when I use the following Cond I return a 500 error:
> RewriteCond %{HTTP_HOST}%{REQUEST_URI} !^img\.example\.com/img/[^.]+\.(gif|jpg)
>
> ...
>
> Any thoughts welcomed...

I sure would like to see the error log on this 500-Server Error. As long as you changed the forum's broken-pipe character (¦) back to a solid pipe (|) in your actual file, it should have worked. I wonder if using an end-anchor would change the result?


RewriteCond %{HTTP_HOST}%{REQUEST_URI} !^img\.example\.com/img/[^.]+\.(gif|jpg)$

or breaking it up into two lines (the [OR] is needed, because consecutive RewriteConds are otherwise ANDed):

RewriteCond %{HTTP_HOST} !^img\.example\.com [OR]
RewriteCond %{REQUEST_URI} !^/[^.]+\.(gif|jpg)

Without seeing the error log it's pretty much impossible to guess...

Jim

[edited by: jdMorgan at 2:52 am (utc) on Nov. 10, 2006]
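Splitting one negated condition into two is only equivalent when the pieces are joined with [OR]: by De Morgan's law, "not (img host and image path)" is "not img host, or not image path", while consecutive RewriteConds are ANDed by default. The sketch below compares the combined condition with the AND-ed and OR-ed two-line forms, using Python's `re` as a stand-in for PCRE; the function names and sample URLs are hypothetical. (The two-line URI pattern does not require the /img/ prefix, so the forms also differ for images outside /img/.)

```python
import re

HOST_IS_IMG = re.compile(r"^img\.example\.com")
URI_IS_IMAGE = re.compile(r"^/[^.]+\.(gif|jpg)")
COMBINED = re.compile(r"^img\.example\.com/img/[^.]+\.(gif|jpg)$")


def redirect_combined(host, uri):
    # RewriteCond %{HTTP_HOST}%{REQUEST_URI} !^img...\.(gif|jpg)$
    return COMBINED.search(host + uri) is None


def redirect_and(host, uri):
    # Two negated RewriteConds with no flag: mod_rewrite ANDs them.
    return HOST_IS_IMG.search(host) is None and URI_IS_IMAGE.search(uri) is None


def redirect_or(host, uri):
    # The same two conditions with [OR] on the first one.
    return HOST_IS_IMG.search(host) is None or URI_IS_IMAGE.search(uri) is None


if __name__ == "__main__":
    for host, uri in (("img.example.com", "/page.html"),
                      ("foo.example.com", "/img/logo.gif"),
                      ("img.example.com", "/img/logo.gif")):
        print(host + uri, redirect_combined(host, uri),
              redirect_and(host, uri), redirect_or(host, uri))
```

Without [OR], pages on img. and images on every other subdomain would both escape the redirect.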

jd01

5:44 am on Nov 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, this version worked:

RewriteCond %{HTTP_HOST}%{REQUEST_URI} !^img\.example\.com/img/[^.]+\.(gif|jpg)$

I moved it above the 'canonical' rewrite and removed the host portion, because it added an extra comparison/condition to my www. rewrite.

I don't need to worry too much about what server is actually requested for images, so I can omit all image requests before the www. check in the .htaccess to keep any comparisons on image URL from happening.

<correction>

The (gif)|(jpg) version worked, not the one you posted. Sorry.

</correction>

Justin

I would like to know why the original version didn't work, too, but the site is with a shared 'hot box' host and I can't get to the logs without a hassle...

I try to not 'rock the boat' too much with them, because they do a great job, except for a few quirks.

E.g. it took me a single phone call and five minutes to get wildcard domains turned on --- When I called to talk to them originally, I asked what type of control panel they used... They replied, "We don't, we're not that type of host..." --- I signed up =)