.htaccess rewrite multi domain problem

problems with nested subfolders with html pages

         

asenk

7:46 am on Feb 9, 2010 (gmt 0)

10+ Year Member



Hi,
I'm having problems getting the rewrite working for multiple domains on one host in one particular situation.

I have a site set up like so:

/site1/ (some .html files here)

/site2/ (some .html files here)
/site2/info/ (some .html files here)
/site2/otherinfo/ (some .html files here)

www.site1.com successfully goes to files in /site1/
www.site2.com does the same for /site2/ folder.

The problem is that if I request www.site2.com/info/a.html, I would expect it to actually go to www.site2.com/site2/info/a.html, but it doesn't. It seems the request doesn't meet the RewriteCond needed to trigger the RewriteRule. It actually tries to go to www.site2.com/info/, which doesn't exist.


Here are my rewrite rules:



#site 1
RewriteCond %{HTTP_HOST} ^site1\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.site1\.com$
RewriteCond %{REQUEST_URI} !^/site1/
RewriteCond %{SCRIPT_FILENAME} \.html$
RewriteRule (.*) /site1/$1

#site2
RewriteCond %{HTTP_HOST} ^site2\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.site2\.com$
RewriteCond %{REQUEST_URI} !^/site2/.*$
RewriteCond %{SCRIPT_FILENAME} \.html$
RewriteRule ^(.*)$ /site2/$1 [L]


Is there something else I need to add or something I do not understand?

Thanks.

jdMorgan

2:41 pm on Feb 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The code's a bit sloppy, but it should work. Did you delete your browser cache before testing?

I'd suggest:

# Redirect non-canonical hostname requests to canonical "www" hostnames
RewriteCond %{HTTP_HOST} !^(www\.[^.]+\.com)?$
RewriteCond %{HTTP_HOST} ^(www\.)?(site1|site2)\.com
RewriteRule ^(.*)$ http://www.%2.com/$1 [R=301,L]
#
# Internally rewrite site1 host requests to /site1 subdirectory
RewriteCond %{HTTP_HOST} ^www\.site1\.com$
RewriteCond %{REQUEST_URI} !^/site1/
RewriteRule ^(.+\.html)$ /site1/$1 [L]
#
# Internally rewrite site2 host requests to /site2 subdirectory
RewriteCond %{HTTP_HOST} ^www\.site2\.com$
RewriteCond %{REQUEST_URI} !^/site2/
RewriteRule ^(.+\.html)$ /site2/$1 [L]

I assumed that you use and prefer the "www" subdomain for both sites. If this is not the case, then the first rule will need to be modified. But take care of that problem first, so as to avoid duplicate-content problems in search ranking, and the necessity to check for hostname variants in every rule you write.

You should also canonicalize your URL-paths. For example, precede all of these rules (including the new one) with a rule to redirect client requests for "/index.xyz" back to "/". This will require testing the variable "THE_REQUEST" to differentiate between client and internal requests for "/index.xyz" and avoid looping. This subject is discussed in many previous threads here.
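
As a rough sketch of such an index-page redirect, something like the following could precede the other rules. It assumes the index file is named index.html and reuses the site1/site2 names from the rules above; both are assumptions to adjust as needed.

```apache
# Sketch only: redirect direct client requests for /index.html (at any depth)
# to the bare directory URL. THE_REQUEST holds the original request line,
# e.g. "GET /info/index.html HTTP/1.1", and is not altered by internal
# rewrites, so the redirect does not loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html(\?[^\ ]*)?\ HTTP/
# Capture the site name so the redirect goes straight to the canonical host
RewriteCond %{HTTP_HOST} ^(www\.)?(site1|site2)\.com
RewriteRule ^(([^/]+/)*)index\.html$ http://www.%2.com/$1 [R=301,L]
```

The %2 back-reference comes from the second (last matched) RewriteCond, while $1 is the directory path captured by the RewriteRule pattern.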

If you do not use them and have not already done so, disable MultiViews using "Options -MultiViews", and disable AcceptPathInfo ("AcceptPathInfo Off" on Apache 2.x and above) as these can cause problems with the server "ignoring" rewriterules if left enabled.
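
In .htaccess form, those two settings might look like this. It is a sketch that assumes the server's AllowOverride setting permits both directives.

```apache
# Needs AllowOverride Options (for Options) and FileInfo (for AcceptPathInfo)
Options -MultiViews
AcceptPathInfo Off
```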

Note that if you intend to 'scale' this approach to more than two sites, a better approach is to rewrite www.site1.com/abc.xyz requests to /sites/site1/abc.xyz and www.site2.com/abc.xyz requests to /sites/site2/abc.xyz. You can then do all hostname-to-subdirectory mapping with a single RewriteRule, instead of having to add another rule-set for each hostname that you add. Example:

RewriteCond %{REQUEST_URI} !^/sites/
RewriteCond %{HTTP_HOST} ^www\.([^.]+)\.com$
RewriteRule ^(.+\.html)$ /sites/%1/$1 [L]

Jim

[ edit ] Corrected as noted below. [ /edit ]

[edited by: jdMorgan at 6:11 pm (utc) on Mar 1, 2010]

colinceo

5:42 pm on Mar 1, 2010 (gmt 0)

10+ Year Member



I'm trying to apply the
"# Redirect non-canonical hostname requests to canonical hostnames "
part to my site. The code works perfectly for my server running Apache on Linux but it does not work on my other server running Apache on Windows.
BTW I have the code in my root .htaccess file.

What is happening on both, and it's good:
mysite.com/myfile.htm
redirects properly to
www.mysite.com/myfile.htm

but here is the problem on my Windows server only:
mysite.com/somedirectory/myfile.htm
does NOT redirect to
www.mysite.com/somedirectory/myfile.htm

Is there anything specific needed for this to work in an Apache and Windows environment?

jdMorgan

6:08 pm on Mar 1, 2010 (gmt 0)

No, not normally... as long as it's a good server install.

I'm actually surprised to hear that it worked on Linux because unfortunately, after reviewing my previous post I found that there was an error in the code: Back-references cannot refer to negative-pattern matches. I have corrected the code above to prevent more people from copying the error.
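
To illustrate the restriction (a sketch using the site names from this thread, not necessarily the code as originally posted): a negated pattern succeeds only when nothing matched, so its parentheses have nothing to hand to %1. The usual workaround is to do the negative test first and capture from a second, positive condition.

```apache
# Does NOT work as intended: the pattern is negated, so when the condition
# succeeds there was no match and the group has no contents.
#RewriteCond %{HTTP_HOST} !^www\.([^.]+)\.com$
#RewriteRule ^(.*)$ http://www.%1.com/$1 [R=301,L]

# Works: negative test first, then capture from a positive match.
# %N refers to the groups of the last matched RewriteCond.
RewriteCond %{HTTP_HOST} !^(www\.[^.]+\.com)?$
RewriteCond %{HTTP_HOST} ^(www\.)?(site1|site2)\.com
RewriteRule ^(.*)$ http://www.%2.com/$1 [R=301,L]
```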

Jim

colinceo

5:27 pm on Mar 2, 2010 (gmt 0)

I guess the server install isn't what it should be then.
I'm going to have to do something a little more creative using server side programming, maybe an include with a header redirect. Ouch.

jdMorgan

6:41 pm on Mar 3, 2010 (gmt 0)

If you have a .htaccess file in the subdirectory --"/somedirectory" in your example above-- make sure that "RewriteOptions Inherit" is set in that file.
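
A minimal sketch of what that subdirectory file might contain, assuming it enables the rewrite engine itself (which is what would otherwise override the parent's rules):

```apache
# /somedirectory/.htaccess (sketch)
RewriteEngine On
# Pull in the rules from the parent directory's .htaccess;
# inherited rules are applied after any rules defined in this file.
RewriteOptions Inherit
```

One caveat worth checking on your own server: inherited rules are evaluated in the subdirectory's context, so a pattern such as ^(.*)$ may see only the path below /somedirectory/. Building the redirect target from %{REQUEST_URI} rather than $1 is one way to sidestep that.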

Jim

colinceo

10:04 pm on Mar 3, 2010 (gmt 0)

I found the culprit. It's not the code, it's conflicting htaccess files. I had one that I did not know about in a subdirectory. Thanks jdMorgan!

so this works great now:

###############################
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www.mywebsiteurl.com$ [NC]
RewriteRule ^(.*)$ [mywebsiteurl.com...] [R=301]
###############################

I'm going to try adding:

RewriteOptions Inherit

to my htaccess file that is in a "/html" subdirectory that currently looks like this.


###############################
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{QUERY_STRING} ^(.*)$ [NC]
RewriteRule ^(.*)/.*_[pP]([1-9]\d*)\.cfm$ html/$1/index.cfm?productID=$2&%1 [L]
RewriteRule ^(.*)/.*_[pP]([1-9]\d*).cfm$ html/$1/index.cfm?productID=$2
RewriteRule ^(.*)/.*_[pP]([1-9]\d*)_print.cfm$ html/$1/print.cfm?productID=$2
###############################

g1smd

10:56 pm on Mar 3, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For the redirects, remove the [NC] flag from the condition, and add the [L] flag to the rule.

For the rewrites, each one needs the [L] flag.

The second rewrite can never be actioned, because the first one will always match the request instead. The (.*) pattern allows the query string to be blank.

The [QSA] flag is a much better way to re-append the original query string.

For all three rewrites, the (.*)/.* pattern is BRUTALLY inefficient needing hundreds of 'back off and retry' trial matches to find the correct pattern. Replace each .* with some sort of negative match pattern like ([^/]+) and ([^/_]+) that can be parsed from left to right in one cycle.


All 'variable' parts of the URL request need to be passed to the script for the value to be checked. To not do so will cause multiple duplicate content issues. Allowing 'free text' (like the .* part of the URL) or multiple casing (like the [Pp] example) in URL requests can cause a variety of problems (all of them bad). The non-canonical URL requests should be redirected. One and only one canonical URL request format should be able to trigger the rewrite.
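
Applied to the /html/.htaccess rules quoted above, those suggestions might look roughly like this. It is only a sketch and assumes one directory level below /html/, product names containing no underscores or slashes, and a lowercase "p" as the single canonical form. The file names in the comments are hypothetical, and the redirect for non-canonical variants is not shown.

```apache
RewriteEngine On
RewriteBase /
# Printable product page, e.g. products/name_p600_print.cfm
RewriteRule ^([^/]+)/[^/_]+_p([1-9][0-9]*)_print\.cfm$ html/$1/print.cfm?productID=$2 [QSA,L]
# Product page, e.g. products/name_p600.cfm
RewriteRule ^([^/]+)/[^/_]+_p([1-9][0-9]*)\.cfm$ html/$1/index.cfm?productID=$2 [QSA,L]
```

Each pattern can be matched left to right without the repeated backtracking that (.*)/.* causes, [QSA] re-appends the original query string, and [L] stops further rule processing once a rule has matched.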

colinceo

10:27 pm on Mar 16, 2010 (gmt 0)

Thanks for the tips. I've tweaked my code some and tested it to work much better. I can't get rid of the (.*) matches though because this needs to work for any file name in any directory and we can't control what our clients name their files.

If anybody wants to tweak my file, feel free.

This .htaccess file lives in the root of my site:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /


#FORCE WWW

RewriteCond %{HTTP_HOST} !^www.example.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

#PRODUCT: Rewrites for product links where: "/html/products/roboraptor_p600.cfm" actually is "/html/products/index.cfm?productID=600"

RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(.*)/.*_[pP]([0-9]\d*)\_print.cfm$ $1/print.cfm?productID=$2&%1 [L]

#PRINTABLE PRODUCT: Rewrites for product printable links where: "/html/products/roboraptor_p600_print.cfm" actually is "/html/products/print.cfm?productID=600"

RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(.*)/.*_[pP]([0-9]\d*)\.cfm$ $1/index.cfm?productID=$2&%1 [L]

[edited by: jdMorgan at 11:01 pm (utc) on Mar 16, 2010]
[edit reason] example.com [/edit]

g1smd

11:24 pm on Mar 16, 2010 (gmt 0)

"we can't control what our clients name their files"


You can, and you should, within the limits of what is actually valid as defined in the HTTP specifications.

Once you know those limitations the pattern is simple to define.

If you really can't be bothered, then at least ([^/]+) and ([^_]+) will be more efficient than .* could ever be.

Your patterns have optional characters and as those are not checked by the script for validity they create duplicate content issues.

Likewise the variable casing creates another duplicate content issue. You should instead redirect one version to the other and only rewrite for one of the URL versions.

SEO tip: underscores between words are not treated as word separators, use hyphens instead.

jdMorgan

11:40 pm on Mar 16, 2010 (gmt 0)

A few tweaks, correcting canonicalization rule to avoid infinite looping w/HTTP/1.0, fixing some duplicate-content problems, putting the comments with the right rules, and removing unnecessary query string manipulation (replaced with [QSA] flag).

Options +FollowSymlinks
RewriteEngine on
RewriteBase /
#
# Externally redirect to correct mis-cased product links to avoid duplicate content issues
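# (REQUEST_URI starts with a slash; the outer group keeps the whole name when it contains several underscores)
RewriteCond %{REQUEST_URI} !^/products/([^_]+_)+p[0-9]+\.cfm$
RewriteRule ^products/(([^_]+_)+)p([0-9]+)\.cfm$ http://www.example.com/products/$1p$3.cfm [NC,R=301,L]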
#
# Externally redirect to correct mis-cased printable product links to avoid duplicate content issues
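# (REQUEST_URI starts with a slash; the outer group keeps the whole name when it contains several underscores)
RewriteCond %{REQUEST_URI} !^/products/([^_]+_)+p[0-9]+_print\.cfm$
RewriteRule ^products/(([^_]+_)+)p([0-9]+)_print\.cfm$ http://www.example.com/products/$1p$3_print.cfm [NC,R=301,L]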
#
# Externally redirect non-blank non-canonical hostname requests to canonical hostname
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# PRODUCT: Rewrite product link URLs such as "/products/roboraptor_p600.cfm"
# to file "/products/index.cfm?productID=600"
RewriteRule ^products/([^_]+_)+p([0-9]+)\.cfm$ /products/index.cfm?productID=$2 [QSA,L]
#
# PRINTABLE PRODUCT: Rewrite product printable link URLS such as "/products/roboraptor_p600_print.cfm"
# to file "/products/print.cfm?productID=600"
RewriteRule ^products/([^_]+_)+p([0-9]+)_print\.cfm$ /products/print.cfm?productID=$2 [QSA,L]

Your .cfm scripts must also check the "roboraptor" part of the requested URL against the "correct string" in your database in order to prevent both duplicate content problems and malicious linking exploits.

I have omitted the "/html" path-part from this code, as I sincerely hope that it is not visible in the linked URLs... If it is, then that path-part will need to be added back in to both the patterns and to the substitution paths. This would likely indicate a problem with the DocumentRoot defined on this server.

These rules are now much more specific and have improved regex patterns. Both increase efficiency. You may actually notice a speed-up accessing your site.

Jim