

Questions about Googlebot crawling on two domains

         

Kyle0

3:09 am on Jun 12, 2022 (gmt 0)




I have one main domain and one subdomain hosted on the same server,


main domain: https://mysite1.com/blog/
subdomain: https://mysite2.com/


They are independent sites with different content.
Their physical locations on the server are:

main domain: /public_html/blog/
subdomain: /public_html/mysite2/

The robots.txt of the main domain is located in the root directory, at /public_html/robots.txt:

User-Agent: *

Sitemap: https://mysite1.com/blog/sitemap_index.xml


I also have a separate robots.txt in the root directory of mysite2 (/public_html/mysite2/robots.txt):

User-Agent: *

Sitemap: https://mysite2.com/sitemap_index.xml


If Googlebot crawls my main domain, it will also crawl the /mysite2/ directory, right?
Should I block the bot's access to /mysite2/ in my main domain's robots.txt, e.g. by adding the line below
to the main domain's robots.txt, so that Googlebot won't crawl /mysite2/ twice?

Disallow: /mysite2/


Or does Googlebot only crawl my site using the sitemap_index.xml defined in robots.txt?
In that case, I don't have to do anything?

Recently, I got a Google warning about self-referrals in my web traffic. I am not sure if this is the reason.
I have already added a referral exclusion in Google Analytics.

Dimitri

8:45 am on Jun 12, 2022 (gmt 0)




Hi,

Peace first, then ...

I have one main domain and one subdomain hosted on the same server,

This is not the definition of a subdomain. Here, you have two different domains / sites.

Googlebot, or any other visitor, including humans, does not care how the files and folders are organized on your server. What they see is what your web server (Apache, Nginx, etc.) sends. It is the web server software that has a separate configuration for each domain and subdomain.

Apparently, your sites are already up and running, so everything seems to be configured as it should be.

In your web server software (Apache, Nginx, etc.), you have one or more configuration files. For each domain / site, there are directives that tell the web server where to find the documents to serve.
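To make the host-to-folder idea concrete, here is a rough Python sketch of what those directives amount to. It is not how Apache or Nginx are actually implemented, and the DOCUMENT_ROOTS table and resolve() helper are made up for illustration; the document roots are only inferred from the paths given in the first post.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical mapping, assumed from the paths quoted in the thread.
# A real server reads this from its virtual host / server block config.
DOCUMENT_ROOTS = {
    "mysite1.com": "/public_html",
    "mysite2.com": "/public_html/mysite2",
}


def resolve(url):
    """Return the filesystem path a request for `url` would map to."""
    parts = urlsplit(url)
    root = DOCUMENT_ROOTS[parts.hostname]
    # Strip the leading slash so the URL path is joined under the root.
    return posixpath.join(root, parts.path.lstrip("/"))


print(resolve("https://mysite1.com/blog/"))
print(resolve("https://mysite2.com/"))
print(resolve("https://mysite2.com/robots.txt"))
```

Under this simplified mapping, a URL such as https://mysite1.com/mysite2/ would point at the same folder as https://mysite2.com/. Whether that URL is really reachable depends on the actual server configuration, and a crawler would only request it if something links to it.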

As for the sitemap, it has only a very limited impact on Googlebot's behavior. It just tells the bot about URLs; the bot will also crawl anything else it finds by following links.
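On the robots.txt side, each hostname is governed by its own robots.txt, so a Disallow line on mysite1.com only affects URLs requested under mysite1.com. A small sketch using Python's standard urllib.robotparser shows the effect of the proposed rule. The ROBOTS_BY_HOST table and the allowed() helper are assumptions for illustration, the example page paths are invented, and the rules are typed inline instead of being fetched from the live sites.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: the main site with the proposed Disallow
# line added, and the second site exactly as quoted in the first post.
ROBOTS_BY_HOST = {
    "mysite1.com": [
        "User-agent: *",
        "Disallow: /mysite2/",
        "Sitemap: https://mysite1.com/blog/sitemap_index.xml",
    ],
    "mysite2.com": [
        "User-agent: *",
        "Sitemap: https://mysite2.com/sitemap_index.xml",
    ],
}


def build_parser(lines):
    """Parse robots.txt lines without making any network request."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


PARSERS = {host: build_parser(lines) for host, lines in ROBOTS_BY_HOST.items()}


def allowed(url, agent="Googlebot"):
    """Check `url` against the robots.txt of its own host only."""
    host = urlsplit(url).hostname
    return PARSERS[host].can_fetch(agent, url)


print(allowed("https://mysite1.com/blog/post"))
print(allowed("https://mysite1.com/mysite2/page"))
print(allowed("https://mysite2.com/page"))
```

In this sketch the Disallow rule blocks only the mysite1.com/mysite2/ URLs. Pages requested through mysite2.com stay crawlable, because they are checked against mysite2.com's own robots.txt.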

Kyle0

8:38 pm on Jun 12, 2022 (gmt 0)




Hi Dimitri

Thanks for the reply. You are right. They are two sites hosted on the same server (a VPS), and they have been running for years. From a visitor's point of view, they are different sites. They can be accessed at mysite1.com and mysite2.com, respectively.

But under cPanel -> Subdomains, the second site is listed as mysite2.mysite1.com, so mysite2 is technically a subdomain of mysite1. I have already set up a 404 page to make sure Google won't crawl my site via mysite2.mysite1.com.

When I add a referral exclusion in Google Analytics (I use gtag.js), should I add

mysite1.com

or

mysite2.mysite1.com
mysite2.com