Forum Moderators: Ocean10000 & phranque

How to noindex hosting domain subfolder but not my domain

     
7:16 pm on Feb 4, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


I don't know why, but Google has in a few cases picked up both one of our websites, "www.websitewehost.com", and its internal path on our hosting account, "www.ourhostingdomain.com/websitewehost", in the index. Our host will allow us to use either or both once we go live; they both point to the same folder on our server. However, we only want "www.websitewehost.com" to be indexed, not "www.ourhostingdomain.com/websitewehost". I can't put a robots.txt file with "Disallow: /" in the "www.ourhostingdomain.com/websitewehost" root folder, because it will also be seen when visiting "www.websitewehost.com" and will prevent indexing of the whole site, which we do want indexed. But as it stands now, Google could see this as duplicate content. How can I prevent indexing of "www.ourhostingdomain.com/websitewehost" but allow "www.websitewehost.com", since they contain the same set of files?
7:24 pm on Feb 4, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9251
votes: 785


Welcome to WW @KallenWeb!

In future use example.com as a replacement for any site details.
8:16 pm on Feb 4, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4205
votes: 265


You should probably talk to the host first to find out if your setup is correct. From your description it is not the normal setup; each domain should have its own root folder.
8:43 pm on Feb 4, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


The domains are "add-on" domains to our main www.ourhostingdomain.com hosting plan.

Each domain does have a folder: www.ourhostingdomain.com/public_html/example

It is accessible at www.example.com or at www.ourhostingdomain.com/example

This is how this host (very large, well known, not a great reputation) does it. I think a lot of others do as well.

It is very strange that the www.ourhostingdomain.com/example "site" would ever get indexed, as there are no links to it posted publicly, nor any on www.ourhostingdomain.com. Only www.example.com should ever be indexed. But now that it is, I need to deal with it.
9:54 pm on Feb 4, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11612
votes: 195


you need to add a hostname canonicalization redirect to your server configuration.
9:55 pm on Feb 4, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15506
votes: 751


I think I understand what you're describing. You've got example.com, example.org, example.net and so on--but the host uses a primary/addon structure, so the physical directories are
/blahblah/example.com
/blahblah/example.com/example.org
/blahblah/example.com/example.net
and so on.

In spite of the topic title, this is not actually an indexing question, because it's universally better not to let people reach the same content more than one way. It is possible that Google and other search engines routinely try these formats when they notice that two sites live on the same server, just as they routinely try /directory and /directory/index.html alongside the correct /directory/. So you need to redirect everyone. The exact form of the redirect will, of course, depend on the server type. But essentially any request for
https://example.com/example.org/more-stuff
needs to be redirected to
https://example.org/more-stuff
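For anyone who wants to check the mapping described above before touching the server, here is a minimal Python sketch of it. It is not Apache code: the function name `strip_addon_prefix` is invented for illustration, and it assumes the addon's folder is named exactly after the addon hostname, as in the example layout.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_addon_prefix(url, addon_host):
    """Return the addon-domain URL for a request that arrived through the
    primary domain's subfolder, or None if the URL is not in that subfolder.

    Hypothetical helper: it only models the URL mapping, not the redirect.
    """
    parts = urlsplit(url)
    prefix = "/" + addon_host + "/"
    if not parts.path.startswith(prefix):
        return None  # not a request for the addon's folder
    # Keep everything after the folder name, and always answer with https.
    new_path = "/" + parts.path[len(prefix):]
    return urlunsplit(("https", addon_host, new_path, parts.query, parts.fragment))


print(strip_addon_prefix("https://example.com/example.org/more-stuff", "example.org"))
# https://example.org/more-stuff
```

Requests that do not start with the addon folder come back as `None`, which corresponds to "leave this request alone".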
10:09 pm on Feb 4, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4205
votes: 265


Do you have .htaccess files in both domains? You can add a line for X-Robots in the hosting example.com:
Header set X-Robots-Tag "noindex"
or
Header set X-Robots-Tag "noindex, nofollow"
if you suspect Google could be following something found on the hosting example.com.
10:34 pm on Feb 4, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


We have several websites that are "add-on" domains to our main hosting account. Our main hosting site/account is www.ourhostingdomain.com.

The other websites are independent sites, www.example1.com, www.example2.com, etc.

There is a "backdoor" (?) to the www.example1.com website that is found by going to www.ourhostingdomain.com/example1. This is used to access the site before the website is live and, for whatever reason, has to be kept by the host.

There are no public or private links to www.ourhostingdomain.com/example1 but somehow Google has indexed a few of them anyway.

Any files I put into the root folder of www.ourhostingdomain.com/example1 will also be seen by the search engines in the root of www.example1.com. (Thus a noindex/nofollow header set in that folder's .htaccess will not work: it would apply to both.)
11:21 pm on Feb 4, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15506
votes: 751


Does each domain have its own htaccess? (I should hope so...) If so, there should already be a domain-name-canonicalization redirect. The catch is that you can't use the ordinary

http://www.example.org/blahblah >> https://example.org/blahblah

because there's that extra directory (which, incidentally, is a perfectly normal alternative URL that generally relies upon users simply not knowing about it) that has to be stripped away.

Rules if located in “primary” site’s htaccess, one per addon:
RewriteRule ^example\.org/(.*) https://example.org/$1 [R=301,L]

Rules if located in individual sites’ htaccess:
RewriteCond %{REQUEST_URI} ^/example\.org/
RewriteRule ^(.*) https://example.org/$1 [R=301,L]

<Location> envelopes can't be used in htaccess, so that simply isn't an option.
1:16 am on Feb 5, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


Yes, each site has its own .htaccess.

In my case, I wonder if it would work, in each site, to put:

RewriteCond %{REQUEST_URI} ^ourhostingdomain.com/example1/
RewriteRule ^(.*) https://example1.org/$1 [R=301,L]


(assumes
https://www.ourhostingdomain.com/example1/
is the site I don't want to have people see or be indexed;
https://www.example1.org
is the site I do want seen/indexed.)
2:05 am on Feb 5, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15506
votes: 751


I wonder if it would work, in each site, to put
The hostname isn't part of the REQUEST_URI; only the path. You have to leave off the “ourhostingdomain.com” element.

The REQUEST_URI in a RewriteCond starts with a / slash, although it doesn’t in the pattern of a RewriteRule (in htaccess or a <Directory> section, that is). They just do it to confuse you.

:: quick detour to test site to make sure I'm not talking through my hat ::

Anyway, I've just realized that all of this is covered by the ordinary domain-name-canonicalization redirect, because the pattern--i.e. the part you'd be capturing--only covers the part of the URL within the current physical directory, regardless of whether you got there via https://example.com/example.org/ or directly via https://example.org.

So if you've already got a canonicalization redirect for each site, the incorrect version shouldn't be happening at all. And if you haven't already got one, there's one more reason you ought to.
4:01 am on Feb 5, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11612
votes: 195


They just do it to confuse you.

not really

the pattern--i.e. the part you'd be capturing--only covers the part of the URL within the current physical directory

this is why they do it.
4:02 am on Feb 5, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11612
votes: 195


btw welcome to WebmasterWorld, KallenWeb!
5:16 am on Feb 5, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11612
votes: 195


In my case, I wonder if it would work, in each site, to put:

RewriteCond %{REQUEST_URI} ^ourhostingdomain.com/example1/
RewriteRule ^(.*) https://example1.org/$1 [R=301,L]


as lucy24 mentioned the general hostname canonicalization redirect should catch this case.
assuming https://www.example.org is the canonical protocol and hostname, you will need something like this:
RewriteCond %{HTTP_HOST} !^(www\.example\.org)?$ [NC,OR]
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://www.example.org/$1 [R=301,L]


this ruleset is typically located after the more specific external redirects and before any internal rewrites.
5:07 pm on Feb 5, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


The challenge here is that I'd need to redirect a folder in website 1:
www.ourhostingdomain.com/example1

To website 2:
www.example1.com

but the same .htaccess file will be in the root folder of both websites (because both "websites" point to the same physical files).

So, it has to redirect any site visitors (and search engines) that come to
www.ourhostingdomain.com/example1

But not affect site visitors that come to
www.example1.com
6:41 pm on Feb 5, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15506
votes: 751


You did say, did you not, that each domain--including the addons--has its own htaccess?

By default, mod_rewrite--unlike other modules--isn't inheritable. This means that the moment you have even a single RewriteRule in an interior directory, any RewriteRule encountered in earlier directories disappears as if it had never existed. (That's Apache 2.2. In 2.4 there are more options and a wider range of defaults, but I can't imagine a host setting a server-wide InheritDown as the default, since it may cause existing configurations to break, aka the dreaded “unintended consequences”.) Don't be misled by the [L] flag; the server will look for RewriteRules in deeper directories. The [L] just means it stops looking at RewriteRules in the present directory.

but the same .htaccess file will be in the root folder of both websites (because both "websites" point to the same physical files).
This doesn't make sense. Sure, requests for example.org and example.com/example.org will both pass through the htaccess for example.com--but it doesn't matter, thanks to mod_rewrite's wonky inheritance rules. Only the rules in example.org's individual htaccess will apply.

Again, a domain-name-canonicalization redirect located in the htaccess of an “addon” domain will have the desired result.
7:12 pm on Feb 5, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


You did say, did you not, that each domain--including the addons--has its own htaccess?

Yes.

It *is* confusing, and I'm trying to describe this understandably.

There are two ways to publicly access the files for website example1.

www.ourhostingdomain.com/example1
www.example1.com


Both of these "paths" access the same files in the same folder on our server.

The first "path" should not be indexed or used (except for internal purposes) and is not posted publicly anywhere, but Google has found it and indexed it.

Any .htaccess file I put in the hosting folder for the site will affect all site visitors, whether they come through the first or second web address.

I would need to be able to structure a command in .htaccess that affects only visitors to "www.ourhostingdomain.com/example1" but not cause trouble to visitors to "www.example1.com".

I wonder if this would work per suggestion above? (though not clear what the "or" and following line is for).

RewriteCond %{HTTP_HOST} !^(www.ourhostingdomain.com/example1)?$ [NC,OR]
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://www.example1.com/$1 [R=301,L]
7:29 pm on Feb 5, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15506
votes: 751


Yes.
You keep saying yes, but it sounds as if you really mean no: There is only one htaccess file, located in the "primary" folder. When we ask if each site has its own htaccess, we mean: Is there an additional htaccess file located in the subsidiary folders for each of the "addons"?
11:20 pm on Feb 5, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11612
votes: 195


KallenWeb, have you tried the mod_rewrite directives i suggested above?

you should familiarize yourself with this section of the apache mod_rewrite doc:
In per-directory context (Directory and .htaccess), the Pattern is matched against only a partial path, for example a request of "/app1/index.html" may result in comparison against "app1/index.html" or "index.html" depending on where the RewriteRule is defined.

The directory path where the rule is defined is stripped from the currently mapped filesystem path before comparison (up to and including a trailing slash). The net result of this per-directory prefix stripping is that rules in this context only match against the portion of the currently mapped filesystem path "below" where the rule is defined.

Directives such as DocumentRoot and Alias, or even the result of previous RewriteRule substitutions, determine the currently mapped filesystem path.


https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
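The prefix stripping quoted above can be modelled in a few lines of Python. This is a simplified sketch, not Apache's implementation: `rewrite_rule_subject` is a made-up name, and the `/var/www` document root is an assumption chosen so that the doc's "/app1/index.html" example has a filesystem path to map to.

```python
def rewrite_rule_subject(mapped_path, rule_dir):
    """Return the string a per-directory RewriteRule pattern is compared
    against: the mapped filesystem path with the rule's own directory
    (up to and including the trailing slash) stripped off.

    Returns None if the path is not below rule_dir.
    """
    prefix = rule_dir.rstrip("/") + "/"
    if not mapped_path.startswith(prefix):
        return None
    return mapped_path[len(prefix):]


# Assumed layout: DocumentRoot is /var/www, so the request
# "/app1/index.html" maps to /var/www/app1/index.html.
mapped = "/var/www/app1/index.html"
print(rewrite_rule_subject(mapped, "/var/www"))       # app1/index.html
print(rewrite_rule_subject(mapped, "/var/www/app1"))  # index.html
```

Applied to the addon case: both the addon URL and the primary-domain subfolder URL map to the same file, so a rule defined in the addon's own folder sees the same short path whichever hostname was requested.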
11:26 pm on Feb 5, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11612
votes: 195


I wonder if this would work per suggestion above? (though not clear what the "or" and following line is for).

RewriteCond %{HTTP_HOST} !^(www.ourhostingdomain.com/example1)?$ [NC,OR]
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://www.example1.com/$1 [R=301,L]


no, it won't.
%{HTTP_HOST} will only contain the hostname and not any path information.

the purpose of the [OR] is so that you are redirected to the canonical protocol and hostname if either the hostname is non-canonical or the protocol is non-canonical.
(you want to avoid chained redirects.)
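To see why the combined [OR] form avoids chained redirects, here is a small Python model that counts hops. It is only a sketch: `separate_rules` is a hypothetical alternative that nobody in this thread proposed, included purely for comparison, and the host check is a plain string comparison rather than the case-insensitive regex in the real conditions.

```python
CANONICAL_HOST = "www.example.org"


def combined_rule(scheme, host):
    """One rule: redirect if the host OR the protocol is non-canonical."""
    if host != CANONICAL_HOST or scheme != "https":
        return ("https", CANONICAL_HOST)
    return None


def separate_rules(scheme, host):
    """Hypothetical alternative: one rule fixes only the protocol,
    a second rule fixes only the hostname."""
    if scheme != "https":
        return ("https", host)
    if host != CANONICAL_HOST:
        return (scheme, CANONICAL_HOST)
    return None


def count_hops(rule, scheme, host):
    """Follow redirects until the rule stops redirecting; return the count."""
    hops = 0
    target = rule(scheme, host)
    while target is not None:
        scheme, host = target
        hops += 1
        target = rule(scheme, host)
    return hops


print(count_hops(combined_rule, "http", "example.org"))   # 1
print(count_hops(separate_rules, "http", "example.org"))  # 2
```

A request that is wrong on both counts reaches the canonical URL in one 301 with the combined rule, and in two with the split version.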
12:11 am on Feb 6, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


I'm trying to be clear about this, I apologize if I am not.

These are the same file:

www.ourhostingdomain.com/example1/.htaccess
www.example1.com/.htaccess


I believe what I need is a command in "www.ourhostingdomain.com/example1/.htaccess" that will point anyone getting the file via "www.ourhostingdomain.com/example1/" to be pointed instead to "www.example1.com" with a 301 redirect - but will not cause a loop for people accessing it via "www.example1.com".

Also, because they are the same file, I'm not sure a mod_rewrite would help. But I may just not be brainy enough to figure it out.

You keep saying yes, but it sounds as if you really mean no: There is only one htaccess file, located in the "primary" folder.


It is hard to explain, apparently. See my note above, that these are the same file:

www.ourhostingdomain.com/example1/.htaccess
www.example1.com/.htaccess


If that means "no" I'm sorry for not answering properly.

Thanks to everyone trying to help me here!
1:25 am on Feb 6, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11612
votes: 195


I believe what I need is a command in "www.ourhostingdomain.com/example1/.htaccess" that will point anyone getting the file via "www.ourhostingdomain.com/example1/" to be pointed instead to "www.example1.com" with a 301 redirect - but will not cause a loop for people accessing it via "www.example1.com".

Also, because they are the same file, I'm not sure a mod_rewrite would help. But I may just not be brainy enough to figure it out.

KallenWeb, have you tried the mod_rewrite directives i suggested above?
1:34 am on Feb 6, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


KallenWeb, have you tried the mod_rewrite directives i suggested above?


I don't believe I have the brains to figure out a mod_rewrite. <sigh>
1:34 am on Feb 6, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15506
votes: 751


These are the same file:
They can't be. htaccess is dependent on physical directory, and you are describing two different physical directories. One is the primary domain, the one you call
ourhostingdomain.com
The other is the addon domain example.com, whose files happen to be located at
ourhostingdomain.com/example.com

ourhostingdomain.com/.htaccess
is a different physical location from
ourhostingdomain.com/example.com/.htaccess

Where is/are your htaccess file(s) physically located?
1:36 am on Feb 6, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


I'm trying to describe a website, "www.example1.com" whose files are located in a folder of "www.ourhostingdomain.com" which for the sake of this discussion is "www.ourhostingdomain.com/example1".

The .htaccess file for both "www.ourhostingdomain.com/example1" and "www.example1.com" is located in "www.ourhostingdomain.com/public_html/example1" (I'm 99% sure).
3:03 am on Feb 6, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15506
votes: 751


Edit: I've just remembered that "example.com" can be used with any tld you can think of, so we can call them "example.primary" and "example.addon".

What do you mean 99% sure? What physical directory do you go to when you've made changes and need to ftp-or-equivalent to the directory?

Let me try this in different words.

When a user requests example.primary/dir/subdir/subsubdir/ --assuming that this made-up URL corresponds to a nest of physical directories-- the server first checks every one of those directories for an htaccess file. It looks for
/beginning-of-filepath/example.primary/.htaccess
and then it looks for
/beginning-of-filepath/example.primary/dir/.htaccess
and then it looks for
/beginning-of-filepath/example.primary/dir/subdir/.htaccess
and then it looks for
/beginning-of-filepath/example.primary/dir/subdir/subsubdir/.htaccess

Most of the time, it will not meet any further htaccess files after that first one. BUT if /dir/ happens to be /example.addon/ (one of the addon domains) then it will indeed meet a second htaccess. At this point, any RewriteRules that it met in the first htaccess will disappear as if they had never existed.

Now pay close attention. Since htaccess is about physical directories rather than URLs, the server will pass through this same sequence of htaccess files regardless of whether the user asked for
https://example.primary/example.addon/et cetera
or
https://example.addon/et cetera
Either way, the server will eventually arrive at the htaccess file that is physically located at
/beginning-of-filepath/example.primary/example.addon/.htaccess
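The directory walk described above can be sketched in Python. Everything concrete here is an assumption made for illustration: the `/home/user/public_html` paths, the `DOCROOTS` mapping, and the idea that the walk starts at the primary document root (where a real server starts looking depends on its AllowOverride settings). The last function models the default, non-inherited mod_rewrite behaviour only.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed layout: the addon's document root is a subfolder of the primary's.
DOCROOTS = {
    "example.primary": "/home/user/public_html",
    "example.addon": "/home/user/public_html/example.addon",
}
WALK_START = "/home/user/public_html"  # assumption: overrides allowed from here down


def mapped_directory(url):
    """Map a directory URL to the physical directory it is served from."""
    parts = urlsplit(url)
    root = DOCROOTS[parts.hostname]
    return posixpath.normpath(posixpath.join(root, parts.path.lstrip("/")))


def htaccess_chain(url):
    """Return the .htaccess paths the server checks for url, outermost first."""
    relative = posixpath.relpath(mapped_directory(url), WALK_START)
    current = WALK_START
    chain = [posixpath.join(current, ".htaccess")]
    if relative != ".":
        for segment in relative.split("/"):
            current = posixpath.join(current, segment)
            chain.append(posixpath.join(current, ".htaccess"))
    return chain


def effective_rewrite_htaccess(chain, files_with_rewrite_rules):
    """With default settings, only the deepest .htaccess that actually
    contains rewrite rules applies; earlier ones are discarded."""
    applicable = [path for path in chain if path in files_with_rewrite_rules]
    return applicable[-1] if applicable else None


via_primary = htaccess_chain("https://example.primary/example.addon/dir/")
via_addon = htaccess_chain("https://example.addon/dir/")
print(via_primary == via_addon)  # True: same physical directories either way

with_rules = {
    "/home/user/public_html/.htaccess",
    "/home/user/public_html/example.addon/.htaccess",
}
print(effective_rewrite_htaccess(via_addon, with_rules))
# /home/user/public_html/example.addon/.htaccess
```

Both URLs produce the same list of files, and in both cases the rewrite rules that count are the ones in the addon folder's own htaccess.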

With me so far?

Now, here is where the domain-name-canonicalization redirect kicks in. Eventually, the server meets a RewriteCond that says, among other things,
%{HTTP_HOST} !example\.addon
That "HTTP_HOST" element refers to the hostname ("domain", for most purposes) that the user actually requested.
If they started out requesting
https://example.primary/example.addon/blahblah
then the HTTP_HOST is "example.primary", i.e. it is NOT "example.addon", so the request will be redirected.
If they started out requesting
https://example.addon/blahblah
then the HTTP_HOST is "example.addon" and the request will be handled as-is, not redirected.

Go back and re-read all of that. At some point did you get lost?

[edited by: lucy24 at 3:14 am (utc) on Feb 6, 2019]

3:12 am on Feb 6, 2019 (gmt 0)

New User

Top Contributors Of The Month

joined:Feb 4, 2019
posts: 15
votes: 1


:) yes.

I can google how to do different things in a .htaccess file but this seems too much for me.

Would you consider creating a specific redirect example for me that would work in my situation?
3:30 am on Feb 6, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15506
votes: 751


First, go back and reread, because I was editing and we overlapped.

Each of your addon domains should have a domain-name-canonicalization redirect. phranque gave the standard form a few posts back:
RewriteCond %{HTTP_HOST} !^example\.addon$ [NC,OR]
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://example.addon/$1 [R=301,L]
(This is the form for an https site. If it is http, remove the OR flag and the second Condition.)

This means: If the requested host is anything other than "example.addon" (give your exact preferred form, whether it is with or without www), and/or if the request is not https, then redirect the request to https://example.addon. (Tangent: It is customary to make the hostname optional, as in !^(example\.addon)?$ but in shared hosting those three extra bytes aren't really needed, since a request with no hostname will never reach your htaccess anyway.)

In the RewriteRule the part in parentheses (.*) is the URL path, whatever it happens to be: everything after example.addon/ This gets included in the redirect as $1.

With this rule,
http://www.example.addon/more-stuff-here
gets redirected to
https://example.addon/more-stuff-here
BUT ALSO
https://example.primary/example.addon/more-stuff-here
gets redirected to
https://example.addon/more-stuff-here


The tricky part is making the canonicalization redirect for example.primary, which you haven't asked about, because then you need a set of extra conditions to exclude the addons:
RewriteCond %{HTTP_HOST} !(example1\.addon|example2\.addon|example3\.addon)
RewriteCond %{HTTP_HOST} !^example\.primary$ [NC,OR]
RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://example.primary/$1 [R=301,L]
(There are other ways to achieve the same result, but this is probably the easiest.)
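How the three conditions above combine is easy to misread, so here is a Python model of them. It is a sketch, not Apache code: `primary_redirect` is an invented name, and it relies on the mod_rewrite behaviour that a condition without [OR] must hold on its own while consecutive [OR] conditions form a group, giving "not an addon AND (wrong host OR not https)".

```python
import re

# Condition 1 has no [NC] flag, so it is matched case-sensitively.
ADDON_HOSTS = re.compile(r"(example1\.addon|example2\.addon|example3\.addon)")
# Condition 2 carries [NC], so it is matched case-insensitively.
PRIMARY_HOST = re.compile(r"^example\.primary$", re.IGNORECASE)


def primary_redirect(host, https_on, path):
    """Return the redirect target for the primary site's rule set, or None.

    path is what the rule captured as $1 (no leading slash in htaccess).
    """
    not_an_addon = ADDON_HOSTS.search(host) is None   # condition 1
    wrong_host = PRIMARY_HOST.search(host) is None    # condition 2 [NC,OR]
    not_https = not https_on                          # condition 3
    if not_an_addon and (wrong_host or not_https):
        return "https://example.primary/" + path
    return None


print(primary_redirect("www.example.primary", True, "page"))  # https://example.primary/page
print(primary_redirect("example.primary", False, "page"))     # https://example.primary/page
print(primary_redirect("example.primary", True, "page"))      # None
print(primary_redirect("example1.addon", False, "page"))      # None
```

The last line is the point of the extra condition: a request for an addon hostname is never pulled over to the primary domain, even when it arrives over plain http.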

In my canonicalization redirects I like to add a preliminary condition
RewriteCond %{REQUEST_URI} !robots\.txt
but that's a matter for a different thread.

[edited by: phranque at 10:45 pm (utc) on Feb 6, 2019]
[edit reason: edit typos]

7:34 am on Feb 6, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11612
votes: 195


Would you consider creating a specific redirect example for me that would work in my situation?

see my previous post.
8:47 am on Feb 6, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9251
votes: 785


Let's keep it simple?

The other "smart" thing is to get rid of the sub domains (er ... sub folders) and put it all where it should be in the first place.

Do it NOW, do it without redirects---canonicalize all and then WAIT the three days to five months for g to "get it" and stop all the silly stuff. Sometimes fixing bad choices just needs tough love.
This 34 message thread spans 2 pages.