I have one main domain and one subdomain hosted on the same server:
main domain: https://mysite1.com/blog/
subdomain: https://mysite2.com/
They are independent sites with different content. Their physical locations on the server are:
main domain: /public_html/blog/
subdomain: /public_html/mysite2/
The main domain's robots.txt is located in the server root directory (/public_html/robots.txt):
User-Agent: *
Sitemap: https://mysite1.com/blog/sitemap_index.xml
I also have a separate robots.txt in mysite2's root directory (/public_html/mysite2/robots.txt):
User-Agent: *
Sitemap: https://mysite2.com/sitemap_index.xml
If Googlebot crawls my main domain, will it also crawl the /mysite2/ directory? Should I block the bot's access to /mysite2/ in my main domain's robots.txt, e.g., by adding the following line, so Googlebot won't crawl /mysite2/ twice?

Disallow: /mysite2/
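For reference, if blocking turns out to be the right move, the main domain's robots.txt would then read as below. This is only a sketch: robots.txt rules apply solely to URLs served from the host the file lives on, so this would block crawling of https://mysite1.com/mysite2/... (assuming that path is actually reachable there, which depends on the server configuration) and would not affect https://mysite2.com/ at all.

```
User-Agent: *
Disallow: /mysite2/

Sitemap: https://mysite1.com/blog/sitemap_index.xml
```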
Or does Googlebot only crawl my sites via the sitemap_index.xml files defined in each robots.txt? In that case, do I need to do anything at all?
Recently, I got a Google warning about self-referrals in web traffic, and I'm not sure whether this setup is the cause. I have already added a referral exclusion in Google Analytics.