
Google SEO News and Discussion Forum

    
Problem with URLs of addon domains in Google?
virtualreality




msg:4645291
 4:43 pm on Feb 14, 2014 (gmt 0)

Hello,

My host lists all my add-on sites as sub-domains of my primary site. For example, my main site is main.com and my second site is second.com, but I can also access the second site at this URL: second.main.com. Can Google see this as another domain name which is a duplicate of second.com? And if so, what should I do to prevent any issues?

 

aristotle




msg:4645327
 6:36 pm on Feb 14, 2014 (gmt 0)

I've had some add-on sites for several years and have never had a problem. The server is set up to hide the subdomain configuration so that it can't be seen from the outside. The only clue is that all the sites have the same IP address.

virtualreality




msg:4645328
 6:37 pm on Feb 14, 2014 (gmt 0)

thank you

phranque




msg:4645460
 1:16 am on Feb 15, 2014 (gmt 0)

Your server configuration should contain a hostname canonicalisation redirect which would 301 any non-canonical hostname requests to the canonical hostname.

lucy24




msg:4645558
 1:44 pm on Feb 15, 2014 (gmt 0)

My host lists all my add-on sites as sub-domains of my primary site. For example, my main site is main.com and my second site is second.com, but I can also access the second site at this URL: second.main.com.

This is a little bit mystifying. It would make sense the other way around: since subdomains most typically live in physical directories within the main site's directory, you could get to
subdomain.example.com
by requesting
www.example.com/subdomain/
unless the site has explicitly coded a canonicalization redirect to prevent it from happening.
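
For example, a redirect of that sort, placed in the subdomain's own directory, might look something like this (just a sketch; the names are illustrative and assume the subdomain's files physically live in a /subdomain/ folder inside the main docroot):

RewriteEngine On
# requests that arrive under the main hostname get bounced to the subdomain
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
RewriteRule ^(.*)$ http://subdomain.example.com/$1 [R=301,L]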

But why would a host gratuitously add wildcard subdomains? Seems like it would just be extra work.

aristotle




msg:4645574
 3:46 pm on Feb 15, 2014 (gmt 0)

Lucy- I don't know the details of how it works, but hosting companies started offering it as a way for people like me with shared hosting accounts to host several sites within the same basic account at the same price as one site, as long as the total memory space and bandwidth aren't exceeded. It seems to me that it would have been simpler to allow the domains to be set up independently, thus bypassing subdomains altogether, but for whatever reasons that's the way they chose to do it.

levo




msg:4645578
 4:18 pm on Feb 15, 2014 (gmt 0)

If you can access it, Google can too. I use

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

to redirect all other hostnames (including non-www).

aristotle




msg:4645585
 4:46 pm on Feb 15, 2014 (gmt 0)

levo- Each site (domain or subdomain) has its own .htaccess file. I think the hosting company must configure the server to do the proper re-directs and/or fetch each page from the proper folder.

phranque




msg:4645589
 5:13 pm on Feb 15, 2014 (gmt 0)

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

HTTP/1.0 user agents, which include some crawlers, don't send an HTTP_HOST request header, so to be compatible with those:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

levo




msg:4645593
 5:53 pm on Feb 15, 2014 (gmt 0)

HTTP/1.0 user agents, which include some crawlers, don't send an HTTP_HOST request header


On shared hosting, a request without the HTTP_HOST header would get the Apache default page.

lucy24




msg:4645628
 10:43 pm on Feb 15, 2014 (gmt 0)

Huh? I get 1.0 requests all the time.

hosting companies started offering it as a way for people like me with shared hosting accounts to host several sites within the same basic account at the same price as one site

You misunderstood. My point was that the multiple domains in and of themselves are perfectly understandable. The odd part was that the host has enabled subdomains by default, even though this requires one more step at the DNS level.

levo




msg:4645635
 11:53 pm on Feb 15, 2014 (gmt 0)

HTTP_HOST is mandatory for HTTP/1.1; HTTP/1.0 requests may also include it. Without HTTP_HOST, the user/crawler can't reach a non-default virtual host.
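
For illustration, an HTTP/1.1 request has to carry the Host header (the paths here are just examples):

GET /page.html HTTP/1.1
Host: www.example.com

while a bare HTTP/1.0 request can omit it, which leaves the server no way to pick a name-based virtual host:

GET /page.html HTTP/1.0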

phranque




msg:4645647
 1:32 am on Feb 16, 2014 (gmt 0)

That is not a standard configuration as far as I've seen.
Since your host already diverts that traffic, it wouldn't hurt to keep the more general condition in that ruleset.
Some day, when you move to a host that passes all requests through, your code will still work properly.

aristotle




msg:4645780
 3:10 pm on Feb 16, 2014 (gmt 0)

The way the hosting companies set this up has always seemed odd to me, because in the internal file system each "add-on domain" is a subdomain of the "primary domain" that the account was originally created for, but in responding to normal outside requests the server treats each one as an independent domain, and doesn't reveal the subdomain structure. I'm not even sure if there are any "hidden" re-directs in the server configuration, or if it might be possible to do some re-directing yourself, or how it could be done.

I think it would have been simpler to avoid using subdomains altogether, and just give each domain its own independent folder. But that's not how they do it.

lucy24




msg:4645798
 4:10 pm on Feb 16, 2014 (gmt 0)

There are two basic patterns, depending on host.

First way: you've got one directory, which is your domain. If you get additional domains, they're directories within the main one. This happens to be also the most common physical setup for subdomains. But you can't really put "subdomain" and "internal file system" into the same sentence, because servers don't have four dimensions. Conversely, you could have a subdomain living on an entirely different server.

Second way: each person has a "userspace", represented by one directory somewhere in the filesystem. Within that userspace are parallel directories for one or more domains. This means, among other things, that there can be an outer htaccess covering all domains, and then individual htaccess for single domains. htaccess is governed strictly by physical filepaths. A request either does or does not pass through a given directory; there are no side doors. You'd know if requests for one site were passing through a different site's htaccess.
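
Roughly, the two shapes look like this (the paths are made up, just to illustrate):

First way -- add-on domains nested inside the primary domain's directory:
/home/user/public_html/           <- example.com docroot (shared htaccess)
/home/user/public_html/addon/     <- addon.com docroot, also reachable as addon.example.com

Second way -- parallel directories inside a userspace:
/home/user/                       <- the userspace (optional outer htaccess)
/home/user/example.com/           <- example.com docroot
/home/user/addon.com/             <- addon.com docroot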

aristotle




msg:4645809
 5:04 pm on Feb 16, 2014 (gmt 0)

First way: you've got one directory, which is your domain. If you get additional domains, they're directories within the main one. This happens to be also the most common physical setup for subdomains. But you can't really put "subdomain" and "internal file system" into the same sentence, because servers don't have four dimensions. Conversely, you could have a subdomain living on an entirely different server.

Thanks for the reply Lucy. I'm sure you're right about the distinction between subdomain and subdirectory. However, the hosting companies that I've used still call it a "subdomain" anyway.
Second way: each person has a "userspace", represented by one directory somewhere in the filesystem. Within that userspace are parallel directories for one or more domains. This means, among other things, that there can be an outer htaccess covering all domains, and then individual htaccess for single domains. htaccess is governed strictly by physical filepaths. A request either does or does not pass through a given directory; there are no side doors. You'd know if requests for one site were passing through a different site's htaccess.

This seems simpler and more logical to me, but it's not how they do it.

Another complication with your first method is that the server has to be configured to fetch files from the right directory or subdirectory, i.e. the one that contains the files for the domain name in the request. This extra complication is another reason why I think the second method would be better.
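
I assume the hosting company generates something like this in the main Apache configuration for each add-on domain (just a guess at what happens behind the scenes; the paths and names are illustrative):

<VirtualHost *:80>
    ServerName add-on.com
    ServerAlias www.add-on.com add-on.primary.com
    # the document root points into a subfolder of the primary site's folder
    DocumentRoot /home/user/public_html/add-on
</VirtualHost>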

lucy24




msg:4645900
 1:10 am on Feb 17, 2014 (gmt 0)

but it's not how they do it.

Well, it's how mine does it :-P At first I had no idea what people were talking about when they referred to "add-on domains".

the server has to be configured to fetch files from the right directory or subdirectory

And then it has to be configured twice, to point both example.org and org.example.com to the same physical location.

In fact, wouldn't you end up with three possible accesses? If example.org is physically located inside example.com, you'd expect to be able to type www.example.com/org/ unless the host has taken explicit action to prevent this.

aristotle




msg:4646041
 1:17 pm on Feb 17, 2014 (gmt 0)

Lucy - I know that it's a dumb way to do it, and that it has potential loopholes. But both of my current hosting companies do it that way, and a third company that I used to use also did it that way.

JD_Toims




msg:4646106
 5:17 pm on Feb 17, 2014 (gmt 0)

My host lists all my add-on sites as sub-domains of my primary site. For example, my main site is main.com and my second site is second.com, but I can also access the second site at this URL: second.main.com.

Yup, same here on more than one host -- I would (and do) use an essentially-the-same ruleset as phranque's example:

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

Works well with crawlers regardless of which HTTP version they use.

aristotle




msg:4646117
 6:21 pm on Feb 17, 2014 (gmt 0)

I would (and do) use an essentially-the-same ruleset as phranque's example:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

Could you provide more info about this? For example, which .htaccess file do you put that code in? Also, does it affect the different pathways that can be used to reach the add-on domains' folders?

lucy24




msg:4646218
 4:17 am on Feb 18, 2014 (gmt 0)

Which .htaccess file do you put that code in?

This depends to some extent on your host. The subsidiary sites can have independent htaccess files, but the primary (first) domain's htaccess will be shared by all the addons. That's where physical directory structure becomes important. A request can't leapfrog directly to its destination; it has to run the gauntlet of all the directories it passes through.

So you have to put the htaccess file in a location where it will be seen by any requests you want to intercept.

There's a temptation to throw in the towel by having the main site's htaccess start with a line that says in effect

# (no anchors, just enough to exclude the primary site)
RewriteCond %{HTTP_HOST} !example\.com
RewriteRule . - [L]

and then give each of those subsidiary domains its own htaccess. But it's really a matter of personal coding style. Uhm. Well, you know what I mean.

aristotle




msg:4646319
 11:58 am on Feb 18, 2014 (gmt 0)

Thanks, Lucy.
Then I might have done it wrong, because I've always just given each site (the primary plus the add-ons) its own independent .htaccess file without regard to any possible interactions. When I've tested them in the past, they always seemed to work as they should, but they're all pretty simple. I guess I need to look into this matter more thoroughly, but I've always been wary of making changes to .htaccess files.

lucy24




msg:4646503
 12:55 am on Feb 19, 2014 (gmt 0)

Does the primary site have a domain-name canonicalization redirect? Do any of its file or directory names also occur in any subsidiary sites? Those are the two main potential issues.

aristotle




msg:4646663
 1:18 pm on Feb 19, 2014 (gmt 0)

Well there's a lot of redundancy because I mostly use the same basic code in all of my .htaccess files.

For example, in one of my accounts, the .htaccess file for the primary site (call it primary.com) is:
. . . . . . . . . . . . . . . . . . . . . . . .

Options +FollowSymLinks

<Files .htaccess>
order allow,deny
deny from all
</Files>

order allow,deny
deny from 195.240.38.200
deny from 5.63.145.68
deny from 198.105.219.58
deny from 5.10.83.
allow from all

RewriteEngine On
RewriteCond %{HTTP_REFERER} \.(ru|ua|cn|pl|ro)(/|$) [NC]
RewriteRule .* - [F]
ErrorDocument 403 "Access Denied"

RewriteCond %{HTTP_HOST} ^primary\.com$ [NC]
RewriteRule ^(.*)$ http://www.primary.com/$1 [R=301,L]

ErrorDocument 404 /custom404.html

. . . . . . . . . . . . . . . . . . . . . . . .


And for one of the add-on sites (add-on.com) in the same account:
. . . . . . . . . . . . . . . . . . . . . . . .

Options +FollowSymLinks

<Files .htaccess>
order allow,deny
deny from all
</Files>

RewriteEngine on

order allow,deny
deny from 195.240.38.200
deny from 5.63.145.68
deny from 5.10.83.
deny from 75.7.214.
allow from all

RewriteCond %{HTTP_REFERER} \.(ru|ua|cn|pl|ro)(/|$) [NC]
RewriteRule .* - [F]
ErrorDocument 403 "Access Denied"

RewriteEngine on
RewriteCond %{HTTP_HOST} ^add-on\.com$ [NC]
RewriteRule ^(.*)$ http://www.add-on.com/$1 [R=301,L]

ErrorDocument 404 /custom404.html

. . . . . . . . . . . . . . . . . . . . . . . .

So there's a lot of redundancy, but I don't know what effect, if any, it has, because as I said before, everything seems to work as it should.

Another thing that I already mentioned a couple of times, and which is still unclear to me, is the possibility that the server is somehow configured to go straight to the subdirectory for each add-on site, without going through the folder for the primary site at all. Is this possible? If so, it would explain why everything seems to work so well.

lucy24




msg:4646804
 9:54 pm on Feb 19, 2014 (gmt 0)

Why do you have the same Allow/Deny directives in both the primary and the addon? They're inherited, unlike RewriteRules. On my hosting, which has the "userspace" setup, I put all of this in the shared htaccess where it's seen by all domains but is specific to none of them.

There is no way to bypass a physical folder. But if you don't say
RewriteOptions inherit
in the deeper htaccess files, then the results of any earlier RewriteRules are discarded as if they had never existed. Up to and including flat-out 403s. This is in the Apache docs but I've also experimented. That's probably why your outer and inner folders seem to function independently; you're seeing the special behavior of mod_rewrite.
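
If you actually wanted the outer rules to carry over, the deeper htaccess would need something like:

RewriteEngine On
# pull in the parent directory's mod_rewrite rules; without this line
# they are silently discarded for requests handled in this directory
RewriteOptions Inherit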

ErrorDocument 403 "Access Denied"

Webmasters tend to forget what a 403 message is. Robots don't care what you say-- but humans will often hit a 403 by accident if, for example, you've got deeply nested folders and not all of them have an index page. I think I've said elsewhere that for many years I didn't even know I was seeing a 403 document; I saw it as "404=no file, 403=no directory index". Humans just don't think about Ukrainians and scrapers and unwanted robots.

If you use the same basic htaccess everywhere, it may help to standardize the layout. Put all the one-liners like ErrorDocument directives at the top. Put any lists, like "Deny from" lines, in numerical or alphabetical order. Doesn't matter if you're only locking out six IPs, but when you've got several screens' worth...

aristotle




msg:4646841
 11:58 pm on Feb 19, 2014 (gmt 0)

Lucy -- Thanks for taking time to look at my code and also giving a critique.

There is no way to bypass a physical folder. But if you don't say
RewriteOptions inherit
in the deeper htaccess files, then the results of any earlier RewriteRules are discarded as if they had never existed. Up to and including flat-out 403s. This is in the Apache docs but I've also experimented. That's probably why your outer and inner folders seem to function independently; you're seeing the special behavior of mod_rewrite.

Well actually, I prefer that the different .htaccess files function independently, or seem to, as long as no problems arise. As for inheritance, couldn't there be cases where you don't want something to be inherited? Anyway, I might move these sites to different servers someday, and I wouldn't have to take time to revise them if they can already stand alone.

There is no way to bypass a physical folder

There's something about this process that I still don't understand: If the server gets an outside fetch request for "add-on.com/", how does the server know that the requested file is located in a subfolder of primary.com's folder? In other words, since the fetch request doesn't mention primary.com at all, how does the server know to go to that site's folder? It seems to me that it has to be configured to do that at a higher level.
