canonical url

how to avoid undesired domains resolved

         

specter

4:50 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello,

I came here from a Google discussion about the canonical URL fix.
I was not aware that my domain name could resolve in so many different ways:

192.168.123.123/foldername/
quux-foo.com/foldername/
www.quux-foo.com/foldername/
anythingyouwantrandom.quux-foo.com/foldername/
www.anythingyouwantrandom.quux-foo.com/foldername/
foldername.quux-foo.com/
www.foldername.quux-foo.com/
example.com/
www.example.com/ <=== Canonical!

I'm wondering how to redirect the undesired domains to the canonical one.

What code should I add to my .htaccess file?

Thanks

[edited by: jdMorgan at 7:39 pm (utc) on July 2, 2007]
[edit reason] example.com [/edit]

jdMorgan

7:50 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The easiest way is to simply redirect any request for non-canonical domains and filepaths back to the correct domain:

Options +FollowSymLinks
RewriteEngine on
#
# Redirect to fix "foldername" requests
RewriteRule ^foldername(.*)$ http://www.example.com$1 [R=301,L]
#
# Redirect all non-canonical domain requests to requested resource in canonical domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

The fix for "foldername" may need some adjustment, as it wasn't entirely clear what "foldername" means -- Whether it represents one specific folder or multiple folders which are identifiable in some unspecified manner.

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim
[edit] Fixed misspelled "RewriteRule" [/edit]

[edited by: jdMorgan at 6:58 pm (utc) on July 4, 2007]

specter

8:03 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you very much Jd! :)

Is there any particular reason to prefer the "www" domain over the "non-www" one?
Or are they equivalent?

jdMorgan

8:28 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, that's your choice to make after considering *all* existing links to your site, and how the two variations already rank in search engines.

Basically, if you build a new site and you pick one and only one version of the domain and then use code like this to redirect all others, you will never have this problem again. But if you have an existing site that is already listed under multiple variations of the domain, then you need to take into account how you have linked to the pages in the site, how others have linked to pages in the site, and how search engines have listed the pages in the site.
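If the non-www form were chosen as the canonical one instead, the same redirect could be mirrored. This is only a sketch under that assumption, using the example.com placeholder from this thread; it has not been tested against any particular server setup:

```apache
RewriteEngine on
#
# Redirect all requests for any host other than the bare domain
# to the requested resource on example.com (assumed canonical here)
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^example\.com
RewriteRule (.*) http://example.com/$1 [R=301,L]
```

Whichever form is picked, only one such rule should be active; using both at once would redirect each form to the other.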

Jim

jdMorgan

8:29 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And you also need to consider "branding" -- How well-established your site's name may be in people's minds, in print, on radio, etc...

Jim

specter

8:57 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As regards that last point, the site is brand new, so that's not an issue.

>> But if you have an existing site that is already listed under multiple variations of the domain, then you need to take into account how you have linked to the pages in the site, how others have linked to pages in the site, and how search engines have listed the pages in the site. <<

Well, that's just my case.

Links are all at http://www.example.com/ or also http://www.example.com/page.htm.

Google lists my pages as www.example.com/page.htm
but it displays different results (especially in the number of indexed pages) depending on whether I search for the "www" or the "non-www" domain. In particular, if I search for "non-www", it returns many more results...
In both cases, though, the URLs displayed are "www". I hope that's clear...

What would be your advice?

[edited by: jdMorgan at 11:31 pm (utc) on July 6, 2007]
[edit reason] example.com [/edit]

jdMorgan

9:02 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, then just use the code as-posted above, and redirect all non-www requests to www. This will save you having to change all your existing links...

Jim

specter

9:15 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Jim.

This is my current .htaccess file:

# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName www.mydomain.net
AuthUserFile /home/user/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/user/public_html/_vti_pvt/service.grp
AddType application/x-httpd-cgi .htm
options -Indexes

Could I add your code simply by putting it above or below this?

Besides, I don't want the foldername fix: how can I remove it from that code?

Please be patient; I'm totally in the dark when it comes to .htaccess files...

Thanks for your kind support. :)

specter

6:37 pm on Jul 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please, could anyone help me out with this issue?

Thanks :)

jdMorgan

2:31 am on Jul 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can put the new code above or below your existing code; it makes no difference. The reason is that .htaccess is not processed like a script, from start to finish. Rather, it is processed several times, once by each Apache module. Each module looks for directives that it understands and can handle.

The order in which the modules process the .htaccess file is determined by the reverse LoadModule order on Apache 1.x, and by an internal priority scheme on Apache 2.x.

I encourage you to experiment and test. Keep a backup of your old .htaccess file, and if you break something, simply replace your new (broken) .htaccess with the old working backup. This limits any 'damage' to the few seconds that the broken .htaccess is active on your server.

Jim

specter

6:34 pm on Jul 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, Jim.
I have already experimented.
I copied and pasted it both above and below the existing code, but in both cases it doesn't work.
I only commented out the "foldername" fix.
Shouldn't I have?
What could be the problem?

specter

6:16 pm on Jul 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




System: The following message was spliced on to this thread from: http://www.webmasterworld.com/apache/3386275.htm [webmasterworld.com] by jdmorgan - 1:51 pm on July 4, 2007 (CDT -5)


Hello,

I should add the following code to my .htaccess file:

Options +FollowSymLinks
RewriteEngine on
#
# Redirect to fix "foldername" requests
RewriteRule ^foldername(.*)$ http://www.example.com$1 [R=301,L]
#
# Redirect all non-canonical domain requests to requested resource in canonical domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST}!^www\.example\.com
RewrireRule (.*) http://www.example.com/$1 [R=301,L]

Below is my current .htaccess file:

# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName www.mydomain.net
AuthUserFile /home/user/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/user/public_html/_vti_pvt/service.grp
AddType application/x-httpd-cgi .htm
options -Indexes

What should I do?
I tried to add it above and below it, but it doesn't work.
I also tried using the new code on its own, but again it doesn't work.

Please help me. I know nothing about .htaccess issues.

How can I resolve this?

Thanks :)

jdMorgan

6:57 pm on Jul 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Like this:

# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

Order deny,allow
<Limit GET POST>
Allow from all
</Limit>

<Limit PUT DELETE>
Deny from all
</Limit>

AuthName www.mydomain.net
AuthUserFile /home/user/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/user/public_html/_vti_pvt/service.grp
AddType application/x-httpd-cgi .htm

Options -Indexes +FollowSymLinks
RewriteEngine on

# Redirect all non-canonical domain requests to requested resource in canonical domain
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
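For anyone comparing this file with the version pasted earlier in the thread, the rewrite section differs in two places. The sketch below only restates those differences side by side. The assumption, not confirmed in the thread, is that one or both of them caused the earlier failure, since Apache normally rejects an .htaccess file containing a directive it cannot parse:

```apache
# As pasted earlier in the thread (reported as not working):
#   RewriteCond %{HTTP_HOST}!^www\.example\.com
#     (no space between the test string and the pattern)
#   RewrireRule (.*) http://www.example.com/$1 [R=301,L]
#     (directive name misspelled)
#
# As written in the working file above:
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```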

Jim

specter

7:42 pm on Jul 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh...ok!

It works fine now!

I didn't understand what the error really was, but the new .htaccess works fine, redirecting as expected.

Thanks a lot, Jim. Your help is very much appreciated.

All the best

Sincerely

Dan

specter

10:12 pm on Jul 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello,

I have to come back to this issue:
I discovered that the code above also affects the subdomains:

[sub.example.com...] is resolved as http://www.example.com/sub/

I would like to keep my subdomains, since I have already distributed them for linking. Is there a way to fix this, or should I resign myself to turning my subdomains into subdirectories and resubmitting them for linking?

Hope I was clear...

Thanks

[edited by: jdMorgan at 11:30 pm (utc) on July 6, 2007]
[edit reason] example.com [/edit]

jdMorgan

10:30 pm on Jul 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How many subdomains do you have? The fix for few is different than the fix for many. Also, the fix is different if you plan to keep adding more, than if you just have a few that you are satisfied with...

Using an undefined set of subdomains opens up your site for malicious linking -- people linking to subdomains of your domain that do not exist. So, I suggest that you limit yourself to a short list of pre-defined subdomains, and redirect everything else to the canonical domain. Again, the solution depends on your site's specifics.

Jim

g1smd

10:46 pm on Jul 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> it wasn't entirely clear what "foldername" meant <<

In the original example that this was copied from, foldername was the name of the folder on the main host domain that the add-on domain (a separate site) was being served from. The folder name fix was to allow the content in that folder to be only indexed under the add-on domain name that resolves directly to that folder as a separate site.

The redirect also catches all sub-domains, and sub-sub-domains, etc, and redirects them all to the canonical add-on domain name.

The code went in the root of the add-on domain, which is actually a folder off the main domain; that is, a folder off the main hosting account. The file did not go in the root of the main domain.

That is, the code was located:

# THIS FILE RESIDES AT: 123.123.123.123/foldername/.htaccess
# a.k.a. (www.)mainsite.com/foldername/.htaccess
# a.k.a. (www.)(anythingyouwant.)foldername.mainsite.com/.htaccess

# DOMAINS: (www.)some-site.com and (www.)that-site.com
# are parked and served from the same server and folder as add-on domains.

The site resolves only at www.some-site.com and all other possible URLs for the content serve a 301 redirect to the canonical URL.

specter

11:03 pm on Jul 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK. I have only three subdomains, which are very important, and I don't plan to add others.
What would be the best way to fix that, guys?

jdMorgan

11:28 pm on Jul 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Then you could prevent the redirect for those subdomains like this:

# Redirect all non-canonical domain requests to requested resource
# in canonical domain except for recognized subdomains
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} !^subdomain1\.example\.com
RewriteCond %{HTTP_HOST} !^subdomain2\.example\.com
RewriteCond %{HTTP_HOST} !^subdomain3\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

or like this:

# Redirect all non-canonical domain requests to requested resource
# in canonical domain except for recognized subdomains
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^(www¦subdomain1¦subdomain2¦subdomain3)\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

However, in this second example, you must replace the broken pipe "¦" characters with solid pipe characters before use; posting on this forum modifies the pipe characters.
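For reference, here is a sketch of how the second example would look once typed into .htaccess with solid pipe characters. The subdomain names are still the placeholders used above and would need to be replaced with real ones:

```apache
# Redirect all non-canonical domain requests to requested resource
# in canonical domain except for recognized subdomains
# (same rule as the second example, with solid pipe characters)
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^(www|subdomain1|subdomain2|subdomain3)\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```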

Jim

specter

8:21 am on Jul 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks again, Jim.
I'll try it and let you know ;)

specter

8:32 am on Jul 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well,

it works fine, but it also prevents the redirect for unspecified subdomains...

specter

11:17 am on Jul 7, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops! Forget what I said above: it all works fine.

BUT...

I found another bug:

I have an on-site search function, based on a Perl .cgi script, under the subdomain http://sub.example.com/

When that search displays the results page, it resolves at http://www.example.com/sub/ and all my home page links are "cut off", as they all point to http://sub.example.com!

Apparently I fixed that by adding the bolded line (the one for www.sub.example.com) to your code below:

# Redirect all non-canonical domain requests to requested resource
# in canonical domain except for recognized subdomains
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} !^sub\.example\.com
RewriteCond %{HTTP_HOST} !^www\.sub\.example\.com
RewriteCond %{HTTP_HOST} !^sub2\.example\.com
RewriteCond %{HTTP_HOST} !^sub3\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Now the search resolves at http://www.sub.example.com/ and that's compatible with my links, but I'm not sure that it is technically correct...
What's your opinion?

[edited by: jdMorgan at 5:38 pm (utc) on July 9, 2007]
[edit reason] example.com [/edit]

jdMorgan

5:40 pm on Jul 8, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your new code is fine, but I'm not sure why you'd want to use a sub-sub-domain. It's entirely up to you, but it makes for rather long URLs...

Jim

specter

11:10 am on Jul 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't really want to use a sub-sub-domain.

Without my "fix", starting from the correct subdomain http://sub.example.com/ and doing a search, the results page resolves at http://www.example.com/sub/.
With my fix it resolves at http://www.sub.example.com/.

I don't know at all why it resolves at www.sub. instead of sub.
All I know is that this way it works fine for my links.
I don't think it depends on the .cgi script...

Do you have a better idea?

[edited by: jdMorgan at 5:35 pm (utc) on July 9, 2007]
[edit reason] example.com [/edit]

jdMorgan

5:33 pm on Jul 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is likely related to another set of rewrite rules used to map your subdomains to subdirectories. It is likely that those rules are executing before the one we're discussing, and the two acting together "expose" your subdirectory path.

Rule order is important. In general, you'll want to place your most-specific external redirect rules first, then the least-specific external redirects, then the most specific internal rewrite rules, then the least specific.

Used in conjunction with the [L] flag on RewriteRules, this helps to prevent redirects from exposing internally-rewritten URL-paths.
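The ordering described above could be sketched roughly as follows. Everything specific here is hypothetical and only for illustration: the old-page.htm and new-page.htm names, and the assumption that sub.example.com is mapped internally to a /sub/ directory. Your own rules will differ:

```apache
RewriteEngine on

# 1. Most-specific external redirect first (hypothetical moved page)
RewriteRule ^old-page\.htm$ http://www.example.com/new-page.htm [R=301,L]

# 2. Least-specific external redirect next: the canonical-domain rule
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^(www|sub)\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 3. Internal rewrites last: map sub.example.com onto /sub/
#    (assumed layout; the second condition avoids rewriting twice)
RewriteCond %{HTTP_HOST} ^sub\.example\.com
RewriteCond %{REQUEST_URI} !^/sub/
RewriteRule (.*) /sub/$1 [L]
```

With the redirects placed before the internal rewrite, a request is sent to its final hostname before any internal path such as /sub/ is introduced.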

Jim

specter

6:02 pm on Jul 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<<<<<This is likely related to another set of rewrite rules used to map your subdomains to subdirectories. It is likely that those rules are executing before the one we're discussing, and the two acting together "expose" your subdirectory path. >>>>>>>

Well,

didn't I overwrite them by editing the .htaccess file with the new code?
Isn't the server following the new rules?

I'm a bit confused...

jdMorgan

8:58 pm on Jul 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You may have mod_rewrite rules and mod_alias Redirect and/or RedirectMatch directives in .htaccess, httpd.conf, and conf.d, just to name a few. I cannot answer your question because I don't know "everything about your server" and cannot examine its entire configuration. If you are on shared hosting, then even you cannot examine its entire configuration, as the host will hide httpd.conf and conf.d from you so that you cannot make modifications that would affect other users on the same server.
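As a rough illustration of the two kinds of directives mentioned above, a configuration file might contain lines like these. All paths and page names are hypothetical, and this is not taken from the poster's server:

```apache
# mod_alias directives (handled by mod_alias)
Redirect 301 /old-folder http://www.example.com/new-folder
RedirectMatch 301 ^/archive/(.*)\.htm$ http://www.example.com/articles/$1.htm

# mod_rewrite directives (handled by mod_rewrite)
RewriteEngine on
RewriteCond %{HTTP_HOST} ^sub\.example\.com
RewriteRule ^old-page\.htm$ http://sub.example.com/new-page.htm [R=301,L]
```

Because each module handles its own directives, rules of either kind set elsewhere in the configuration can still affect a request even after the .htaccess file has been edited.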

Jim

g1smd

10:38 pm on Jul 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The script that you borrowed from me, and which you placed in the root of your main site, was originally placed in one sub-folder on the site that it originally came from.

It is likely that what it does isn't quite what you need for your situation, and needs modification in several subtle ways.

specter

10:35 am on Jul 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well,

OK, then I'll stop here. This is getting too hard for my level of knowledge.
It works fine for my purposes now, and in the end that's what matters...

Thank you again, guys, for your support so far.

All the best

Sincerely

g1smd

8:18 pm on Jul 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As long as it is working, then that is fine.

Make sure you test it for many different variations of URL that you can throw at it, so that you can confirm that it responds in the correct manner for all of them.