htaccess redirects root to subfolder && avoid duplicate content

a tough one... because both directories use an .htaccess file

         

tens

1:01 am on Aug 19, 2010 (gmt 0)

10+ Year Member



Hello,

i am facing a real issue here... let's say i have my root domain

www.mydomain.com

now i want my website to be served through this main domain, but to keep things structured, i want all the files to live in a subfolder, e.g. www.mydomain.com/subfolder.

so you guessed it, i put an .htaccess file in the root directory:

root/
-subfolder/
----images/
----css/
----js/
----index.php
-.htaccess

here is the content of my htaccess:

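RewriteEngine On
Options +FollowSymLinks
# Match the bare or www host; dots are escaped so they match literally
RewriteCond %{HTTP_HOST} ^(www\.)?mydomain\.com$
# Internally rewrite every request into /subfolder/
RewriteRule ^(.*)$ /subfolder/$1 [L]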


this actually works perfectly. But now i'm facing another issue: duplicate content, because my website is now reachable through both:

[mydomain.com...]
and
[mydomain.com...]

I've stumbled across a solution posted somewhere among the thousands of posts here, which used basically this code:

RewriteCond %{THE_REQUEST} ^GET\ /subfolder/
RewriteRule ^subfolder/(.*) /$1 [L,R=301]
RewriteRule !^subfolder/ subfolder%{REQUEST_URI} [L]


It works, but only if i have a single htaccess file, in the root directory.
As soon as i create a second htaccess file in the subfolder, like this

root/
-subfolder/
----images/
----css/
----js/
----index.php
----.htaccess
-.htaccess

then, the above code doesn't work anymore.
And if you use that code, you cannot access other subfolders. For example, let's say i have a subfolder called "subfolder2", if i use the code above and then try to go to [mydomain.com...] the server redirects to [mydomain.com...]

And i NEED this second htaccess file, because my site uses a lot of friendly URLs, and i just can't manage without an additional htaccess file.

So, to sum it all up, what i've been trying to do for 5 hours, unsuccessfully, is:

- have www.mydomain.com redirect to www.mydomain.com/subfolder
- but apply this redirect only to the root domain, not to other folders within it, i.e. still be able to reach www.mydomain.com/subfolder2 directly
- have www.mydomain.com/subfolder redirect to www.mydomain.com, to avoid duplicate content
(well, when i do this, it naturally produces an infinite loop... is it even possible to make this work?)
- and i definitely need to keep that second htaccess file within the subfolder... it contains some important rules, like:
  - force www url
  - all my SEF url rules
  - force www.mydomain.com/index.php to display as www.mydomain.com/ (so no index.php or index.html file appears in the url, which also avoids duplicate content)

so, is there any chance for this setup to work, or am i doomed to keep all my website files in the root directory, without any chance of an organizational structure?

jdMorgan

5:32 am on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're not paying sufficient attention to details here...

A request for [mydomain.com...] is a request for a *file* in the main directory, and not for the index page of subfolder2, as you appear to assume. The reason for this is the same as the reason that your rule fails... The trailing slash has meaning and is missing. Modify your test expectations, or modify your rule, or specifically redirect subfolder requests which lack trailing slashes to add the slashes -- Your choice on this, I advise the latter.

Then, look into the "RewriteOptions inherit" directive, as that may have something to do with your subfolder/.htaccess appearing to "break" your /.htaccess file...

For each request to your server, all rules in all .htaccess files in the path to the requested resource will be executed. And they will be executed repeatedly, until no more rules match. Therefore, specific exclusions are often needed.

RewriteRule ^subfolder2$ http://www.example.com/subfolder2/ [R=301,L]
#
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /subfolder(/[^\ ]*)?\ HTTP/
RewriteRule ^subfolder(/(.*))?$ http://www.example.com/$2 [R=301,L]
#
RewriteCond $1 !^(subfolder|subfolder2|subfolder3)(/.*)$
RewriteRule ^(.*)$ subfolder/$1 [L]

There's a bigger problem, though. How is the code that rewrites all requests to /subfolder supposed to know that it should not rewrite some requests (for other subfolders)? You will have to do one of two things -- either list out all of the subfolders that that rule should not rewrite, or test each and every incoming HTTP request to see if the initial URL-path-part resolves to an existing directory (subfolder). The former is hard to maintain, while the latter is wasteful of server resources and can be slow -- as well as possibly beating your disk to death...
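
For reference, the second option (testing for an existing directory) might look roughly like the sketch below in /.htaccess. It assumes %{DOCUMENT_ROOT} points at the real filesystem root of the site, which is not true on every host, and it costs a filesystem check on each request, as noted above. Treat it as a starting point rather than a drop-in replacement for the listed-folders rule.

```apache
# /.htaccess (sketch; assumes DOCUMENT_ROOT is the site's real root directory)
RewriteEngine On

# Root request: serve the index of /subfolder/
RewriteRule ^$ subfolder/ [L]

# $1 is the whole path, $2 is its first segment.
# Rewrite into /subfolder/ unless that first segment is a real directory
# under the document root (this also excludes /subfolder/ itself).
RewriteCond %{DOCUMENT_ROOT}/$2 !-d
RewriteRule ^(([^/]+).*)$ subfolder/$1 [L]
```

With this variant, a new top-level directory is excluded automatically, but any top-level name that is not a directory (robots.txt, for example) is looked up inside /subfolder/.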

It's late -- beware of possible typos...

Jim

tens

2:53 pm on Aug 19, 2010 (gmt 0)

10+ Year Member



Hi jdMorgan,

thanks for your instructive answer.
As you mentioned, testing every incoming http request to see if the initial url-path-part resolves to an existing directory would be a server suicide solution.

Since i don't think i will have more than 10 folders, i don't mind maintaining an updated htaccess file, adding a folder name in the rules when necessary.

So i put in the /.htaccess file :

RewriteCond $1 !^(subfolder|subfolder2|subfolder3)(/.*)$
RewriteRule ^(.*)$ subfolder/$1 [L]


This makes the rewrite from the root to /subfolder work, while still allowing direct access to subfolder2 and subfolder3.

Then, i added:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /subfolder(/[^\ ]*)?\ HTTP/
RewriteRule ^subfolder(/(.*))?$ http://www.mydomain.com/$2 [R=301,L]


So requests to mydomain.com/subfolder are redirected to mydomain.com/
this is OK as long as i don't have an htaccess file in the /subfolder.

So, as you advised, i added

RewriteOptions inherit


in the subfolder/.htaccess file.

but now, when i try to access www.mydomain.com, it wants to go to www.mydomain.com/subfolder/subfolder.

With the current setup, i thought the scheme would be:

1- request goes to mydomain.com/subfolder

mydomain.com/subfolder has an .htaccess (we'll call it the sub htaccess) that takes precedence on the root directory htaccess (let's call it the main htaccess); since the sub .htaccess inherits from the rules of the main htaccess, i.e :

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /subfolder(/[^\ ]*)?\ HTTP/
RewriteRule ^subfolder(/(.*))?$ http://www.mydomain.com/$2 [R=301,L]


here we are redirected to www.mydomain.com/

2- Then we go back in the main htaccess that has this rule

RewriteCond $1 !^(subfolder|subfolder2|subfolder3)(/.*)$
RewriteRule ^(.*)$ subfolder/$1 [L]


so, since the last request was to www.mydomain.com/, the rewrite rule should apply and send all traffic to /subfolder.

----------------------------------

What am i missing here ?

jdMorgan

3:43 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mydomain.com/subfolder has an .htaccess (we'll call it the sub htaccess) that takes precedence on the root directory htaccess (let's call it the main htaccess);

Subfolder .htaccess files *do not* take precedence over the main .htaccess files. The main .htaccess file will be processed first, and then if the path still points to the subfolder (i.e. it has not been rewritten to point elsewhere by code in the main .htaccess file), the subfolder/.htaccess file will be processed. The subfolder/.htaccess file can "override" or --to some extent-- countermand the /.htaccess file, but it certainly cannot be described as "taking precedence."

And repeating from a previous post:
For each request to your server, all rules in all .htaccess files in the path to the requested resource will be executed. And they will be executed repeatedly, until no more rules match. Therefore, specific exclusions are often needed.


mydomain.com/subfolder has an .htaccess (we'll call it the sub htaccess) that takes precedence on the root directory htaccess (let's call it the main htaccess); since the sub .htaccess inherits from the rules of the main htaccess, i.e :

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /subfolder(/[^\ ]*)?\ HTTP/
RewriteRule ^subfolder(/(.*))?$ http://www.mydomain.com/$2 [R=301,L]

here we are redirected to www.mydomain.com/

No we are not, because this rule, if located in /subfolder/.htaccess, will only execute if the URL-path
www.mydomain.com/subfolder/subfolder/something
is requested -- two levels of "/subfolder" here...
The URL-path 'seen' by the RewriteRule is stripped of the local URL-path to *this .htaccess file's* directory.

And it's actually good that this rule is broken, because this rule is utterly redundant if RewriteOptions inherit is enabled.
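
To make the prefix stripping concrete, here is a sketch of how that redirect might be written if it were placed in /subfolder/.htaccess instead, assuming RewriteOptions inherit is not used there. The condition still tests the full request line sent by the client, but the RewriteRule pattern only sees what follows /subfolder/. This is an untested illustration, so check it before relying on it.

```apache
# /subfolder/.htaccess (sketch; assumes no "RewriteOptions inherit" here)
RewriteEngine On

# Only act when the client itself asked for /subfolder or /subfolder/...
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /subfolder(/[^\ ]*)?\ HTTP/
# In this file the path no longer starts with "subfolder/", so match all of it
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]
```

Requests that reach /subfolder/ only through the internal rewrite in /.htaccess are left alone, because THE_REQUEST still holds the original request line without /subfolder in it.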

Perhaps that will clear up some of your mysteries...

Jim

tens

5:13 pm on Aug 19, 2010 (gmt 0)

10+ Year Member



Ok i think i get it... i THINK.

so, basically, once the main htaccess has rewritten the request to the subfolder, what applies now are the rules of the subfolder htaccess, evaluated at the subfolder level.

so the subfolder htaccess inherits these two rules :


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /subfolder(/[^\ ]*)?\ HTTP/
RewriteRule ^subfolder(/(.*))?$ http://www.mydomain.com/$2 [R=301,L]

RewriteCond $1 !^(subfolder|subfolder2|subfolder3)(/.*)$
RewriteRule ^(.*)$ subfolder/$1 [L]


since we're at the subfolder level, as you said, the first rule isn't applied the way i thought it would be.
But the second rule does apply, so the request ends up at www.mydomain.com/subfolder/subfolder.

My guess here is to replace the second rule, in the subfolder htaccess file, so that ^(.*)$ requests within the subfolder are not rewritten to subfolder/subfolder/$1.

But i don't know what to replace it with. Or is it possible to prevent some rules from being inherited (like making the second rule not apply to the subfolder htaccess, so we're not sent to mydomain.com/subfolder/subfolder)?

jdMorgan

3:26 am on Aug 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The first rule redirects requests that come direct from a client containing the /subfolder path, because it is our intent to never let the client 'see' that a subdirectory is in use, and if the subdirectory does somehow get exposed as a URL, then we want to redirect the client and get rid of that subdirectory path.

The second rule rewrites all requests to /subfolder, unless this has already been done.

I don't see any concerns here.

Your understanding of inheritance may be off a little. All it means is that the top-level .htaccess will get executed for requests to the subfolder as well as for top-level directory requests. However, the top-level .htaccess always executes in the context of the top-level directory... doing it any other way just wouldn't be practical (or useful).

Rules placed in /subfolder/.htaccess will see 'incoming requests' which have had the /subfolder path removed from them already. So a RewriteRule pattern in the /subfolder/.htaccess file matching a URL-path with "^foo/bar\.html$" will invoke that rule if http://example.com/subfolder/foo/bar.html is requested by a client.

However, because you rewrite all requests to "/subfolder" on your site, that changes things a little: The rule will be invoked if the client requests http://example.com/foo/bar.html, because that request will get rewritten to http://example.com/subfolder/foo/bar.html by the RewriteRule in /.htaccess, and then the rule in the /subfolder/.htaccess file will be invoked as described above.
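
As a rough illustration of the two cases above, a rule like the following could sit in /subfolder/.htaccess. The target script index.php and its page parameter are hypothetical, and the RewriteBase line assumes the directory is reachable as /subfolder/.

```apache
# /subfolder/.htaccess (sketch; index.php?page=... is a made-up target)
RewriteEngine On
RewriteBase /subfolder/

# The pattern is matched against the path with "subfolder/" already removed,
# so it fires for /subfolder/foo/bar.html and, once /.htaccess has rewritten
# the request, for a client request to /foo/bar.html as well.
RewriteRule ^foo/bar\.html$ index.php?page=foo-bar [L]
```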

Jim

tens

1:00 pm on Aug 20, 2010 (gmt 0)

10+ Year Member



actually there is no problem with www.mydomain.com redirecting to www.mydomain.com/subfolder.

my only concern here is making sure that if someone requests www.mydomain.com/subfolder directly, he gets redirected to [mydomain.com...] (or, to put it properly, his address bar changes from mydomain.com/subfolder to mydomain.com/)

currently :
- if i simply add RewriteOptions inherit in the /subfolder/.htaccess, all requests are sent to /subfolder/subfolder and end in a bad request error (no matter what i type, www.mydomain.com or www.mydomain.com/subfolder, they all end up at mydomain.com/subfolder/subfolder)

so, still in this same /subfolder/.htaccess, i tried removing "RewriteOptions inherit", and then adding only this rule :

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /subfolder(/[^\ ]*)?\ HTTP/
RewriteRule ^subfolder(/(.*))?$ /$2 [R=301,L]


so with this, all direct requests to mydomain.com/subfolder should have the "subfolder" part removed, shouldn't they?

but if you type www.mydomain.com/subfolder, you can reach the website, and the url still shows www.mydomain.com/subfolder

/subfolder isn't removed, just as if the rule in the htaccess was simply ignored.
This is driving me nuts.