Forum Moderators: goodroi


Has Google changed their approach to robots.txt?

Aside from the recent indexing change.


dennisjensen

8:51 am on Sep 24, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hi guys,

Recently I saw this notice in search console. I admit to never having seen it before:

"robots.txt fetch failed
You have a robots.txt file that we are currently unable to fetch. In such cases we stop crawling your site until we get hold of a robots.txt, or fall back to the last known good robots.txt file."

(It turned out, cough, that the robots.txt was mislaid for a while, cough. Being corrected)

I struggle, though, to understand whether there is a secondary layer to this. This part:
In such cases we stop crawling your site until we get hold of a robots.txt
seems rather harsh. Did Google change any policies?

FYI, I do see signs that trouble is looming regarding indexing. So, it might not just be empty words.

Any knowledge?

JorgeV

11:16 am on Sep 24, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

It sounds like, in the absence of a robots.txt, Googlebot interprets it as "disallow" by default.
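For contrast, the usual parser convention treats a present-but-empty robots.txt as "everything allowed" — it seems to be the failed fetch, not the mere absence of rules, that triggers the stricter treatment. A quick check with Python's stdlib urllib.robotparser (example.com is just a placeholder; no network involved, the rules are fed in directly):

```python
from urllib import robotparser

# A robots.txt with no rules at all: everything is allowed.
open_rp = robotparser.RobotFileParser()
open_rp.parse([])
print(open_rp.can_fetch("Googlebot", "https://example.com/page"))  # True

# An explicit blanket Disallow, for comparison: everything is blocked.
closed_rp = robotparser.RobotFileParser()
closed_rp.parse(["User-agent: *", "Disallow: /"])
print(closed_rp.can_fetch("Googlebot", "https://example.com/page"))  # False
```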

lucy24

5:07 pm on Sep 24, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



we stop crawling your site until we get hold of a robots.txt, or fall back to the last known good robots.txt file
To me this sounds so vague as to be utterly meaningless. For how long do they refrain from crawling? And what if you’ve intentionally removed your robots.txt and no longer want its Disallows to be honored? (Admittedly this is a pretty silly approach to take, but hey, it’s perfectly legal.)

tangor

5:28 am on Sep 25, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Silly thing to say, since robots.txt is completely toothless, purely voluntary, and NOT REQUIRED for site functionality, and all too many robots ignore it anyway.

g scare tactics or something else? Why would g need a robots.txt in the first place? On the other hand, if there is a robots.txt, g MUST ABIDE by it if they wish to remain credible.

Just some interesting thoughts.

YMMV

dennisjensen

9:18 am on Sep 25, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks guys,

Yes, I wondered as well. The robots.txt isn't mandatory. Either there are some related issues on our site, or it's scare tactics / hot air. Anyway, we have a fix coming up; I'll let you know whether things turn around.

iamlost

2:10 pm on Sep 26, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I’m not G :) That said, there is a logical sequence, both explicit and implicit, in the message.
As mentioned, robots.txt is optional and Googlebot does crawl sites without one.
Therefore the logic, as I read the message:
1. G asks for robots.txt, receives it, and acts accordingly
2. G asks for robots.txt, does not receive it, and checks records for prior presence
   a. G does not find a stored copy, so treats the site as wide open to crawl
   b. G does find a stored copy, so either:
      i. stops crawling the site, or
      ii. crawls following the stored copy's directives
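That sequence, sketched as a tiny Python function — purely illustrative, not actual Googlebot logic; prefer_stop is a made-up flag standing in for whatever unknown condition makes G choose "cease crawl" over the cached-copy fallback:

```python
def crawl_decision(fetched_ok, robots_body=None, stored_copy=None,
                   prefer_stop=False):
    """Illustrative sketch of the sequence read out of G's message.

    Returns a (action, rules) pair, where action is "crawl" or "stop"
    and rules is the robots.txt content to honor (None = wide open).
    prefer_stop is hypothetical: the open question is what condition
    actually triggers the "stop crawling" branch.
    """
    if fetched_ok:
        return ("crawl", robots_body)      # 1. fetched fine: act on it
    if stored_copy is None:
        return ("crawl", None)             # 2a. no prior copy: wide open
    if prefer_stop:
        return ("stop", None)              # 2b-i. cease crawl
    return ("crawl", stored_copy)          # 2b-ii. follow stored directives
```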

The real question to ask, given that we know that G defaults to slurp the world, is what type(s) of site would invoke the ‘cease crawl’ over the fallback crawl.

I have vague hypotheses but no definitive answers, nor am I about to test. I will however add the question to my ‘keep an eye open’ list.

dennisjensen

3:02 pm on Oct 26, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hi guys,

I promised an update. Our tech fix was implemented, and things seem to have returned to normal, more or less. To conclude: our robots.txt was mislaid, and putting it back solved it. However, I didn't get any smarter about G's wording of their message.

tangor

4:50 pm on Oct 26, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Glad there was a happy resolution!

The mysteries of g will continue. (sigh)