Forum Moderators: phranque
Simply leaving a 301 in place to a site that cannot be indexed because its robots.txt is unreadable is very dangerous.
Assuming something is wrong based solely on out-of-date Search Console data...
And since when do you need robots.txt to get a site indexed?
Once they notice the redirects, they'll refetch your robots.txt file.
If a robots.txt request does not return 200 (OK) or 404 (Not Found), Google will not crawl your site.
Maybe, but we don't know that.
Check the access logs for subsequent robots.txt requests.
That was suggested 16 posts ago.
If your robots.txt file exists but is unreachable (in other words, if it doesn't return a 200 or 404 HTTP status code), we'll postpone our crawl rather than risk crawling URLs that you do not want crawled.
As far as I'm aware, the absence of a robots.txt file (regardless of status code) suggests to robots that there are no crawling limitations.
We are also seeing that Google tried to fetch robots.txt over https months ago. This now appears in GSC as earlier failed attempts...
If it doesn't return a 404 (or, theoretically, a 410), how would Google know it's absent? That's the point of distinguishing between "absent" and "unreachable".
RewriteCond %{HTTPS} !=on
RewriteRule !^robots\.txt$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
In your opinion, is this correct?
We also want "sitemaps.xml" to stay on http. Any idea how to achieve that?
!^(robots\.txt|sitemap\.xml)$
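A minimal sketch of how that pattern could slot into the rule quoted above, assuming the rules live in a root .htaccess file and that the file really is named sitemap.xml as in the pattern (the question spells it "sitemaps.xml"; use whichever name actually exists):

```apache
# Assumption: root .htaccess with mod_rewrite available.
RewriteEngine On

# Send every plain-http request to https, except robots.txt and sitemap.xml,
# which keep being served over http.
RewriteCond %{HTTPS} !=on
RewriteRule !^(robots\.txt|sitemap\.xml)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
```

Because the pattern is negated, no $1 backreference is set, which is why the target uses %{REQUEST_URI}. In .htaccess the pattern is matched against the path without its leading slash; if the same rule were placed in the server or VirtualHost config instead, the path would start with "/" and the pattern would need to become !^/(robots\.txt|sitemap\.xml)$.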
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
I think it is important not to forward robots.txt.
Hypothesis: if you create a GSC account for your brand-new https site, either adding it alongside or switching from an existing http site, GSC will immediately list a robots.txt error dating from some time in the recent past.
By the way, why do I need to add the non-www versions as properties in GSC? I know Google recommends it, but why? The non-www has been 301ed to www for years...
We do have other rules that forward non-www to www.
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
[edited by: phranque at 4:21 am (utc) on Apr 1, 2017]
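If you side with the view that robots.txt should not be redirected, one possible variant folds that exception into the combined https/www rule above. This is a sketch under assumptions: www.example.com is the thread's placeholder host, and the rules sit in a root .htaccess file.

```apache
# Assumption: root .htaccess; www.example.com stands in for the real host.
RewriteEngine On

# Canonicalize to https://www.example.com/ in a single 301 when the request is
# not https OR the Host is anything other than www.example.com (an empty Host
# is allowed through), but never redirect robots.txt itself.
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule !^robots\.txt$ https://www.example.com%{REQUEST_URI} [R=301,L]
```

As with any negated RewriteRule pattern, $1 is unavailable, so the target is built from %{REQUEST_URI}, which already carries the leading slash.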
Is there a risk in not providing a sitemap if the entire site's structure is good?
[edited by: phranque at 11:07 am (utc) on Mar 31, 2017]