How often are you expecting a "good" robot to check the robots.txt file?
keyplyr
3:23 am on Mar 20, 2018 (gmt 0)
It varies greatly. I can't see any major bot keeping that data for more than 24 hours, though. Most major bots request robots.txt on every crawl, and sometimes several times during a crawl, since they often crawl for multiple tasks.
However, despite the term, the robots.txt file never did become *standard* and 80% of bots ignore it completely.
Of the 20% that do request it, over half are just snooping for info and do not follow the directives. That leaves only a handful of *respectful* bots that support robots.txt and comply with the site owner's wishes.
Is the robots.txt file important and should all websites use it? Yes, absolutely. If for no other reason, use it for Google, Bing, Yandex and DuckDuck.
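For anyone wondering what "following the directives" looks like on the bot side, here is a rough sketch using Python's standard urllib.robotparser. The rules in ROBOTS_TXT, the "ExampleBot" name and the example.com URLs are made up for illustration; a real bot would fetch the file from the site rather than use a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a live fetch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/page.html"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A bot that requests the file but never makes a check like this before each fetch falls into the "just snooping" group.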
lucy24
8:46 am on Mar 20, 2018 (gmt 0)
Google, Bing, Yandex and DuckDuck
I thought DDG didn't do its own crawling. It does have a faviconbot, but that doesn't use robots.txt.
keyplyr
8:49 am on Mar 20, 2018 (gmt 0)
DuckDuckGo's results are a compilation of "over 400" sources, including Yahoo! Search BOSS, Wikipedia, Wolfram Alpha, Bing, and its own Web crawler, the DuckDuckBot.
Don't see it much though.
TravisDGarrett
11:59 am on Mar 20, 2018 (gmt 0)
My question was not really how often a robot checks the robots.txt file, but rather how often a robot "should" check it.
24 hours seems a good frequency to me.
lucy24
7:25 pm on Mar 20, 2018 (gmt 0)
It depends on the robot's overall behavior. Major search engines that visit sporadically in the course of every day will cache robots.txt for a while, rather than re-check on the off chance that it has changed in the last five minutes. Robots that do large crawls at longer intervals generally request robots.txt at the beginning of the visit.
Now, some robots' information pages will tell you to wait up to two weeks for changes to be detected. This strikes me as excessive. It does not take a vast amount of programming know-how to change a robot's script on the fly.
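To show how little code the caching side needs, here is a minimal sketch of a per-host robots.txt cache with a time-to-live. The 24-hour default simply reflects the figure floated in this thread. RobotsCache, the fetch callable and the clock parameter are names invented for this example, not any real crawler's API.

```python
import time
from typing import Callable, Dict, Tuple
from urllib.robotparser import RobotFileParser

# Assumption: 24 hours, the frequency suggested earlier in the thread.
CACHE_TTL_SECONDS = 24 * 60 * 60


class RobotsCache:
    """Caches one parsed robots.txt per host and re-fetches after ttl seconds."""

    def __init__(self,
                 fetch: Callable[[str], str],
                 ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch   # hypothetical: returns robots.txt text for a host
        self._ttl = ttl
        self._clock = clock   # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, RobotFileParser]] = {}

    def parser_for(self, host: str) -> RobotFileParser:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh, no new request
        parser = RobotFileParser()
        parser.parse(self._fetch(host).splitlines())
        self._entries[host] = (now, parser)
        return parser

    def allowed(self, host: str, user_agent: str, url: str) -> bool:
        return self.parser_for(host).can_fetch(user_agent, url)
```

A robot that does one large crawl per visit could pass ttl=0 to re-request the file every time, while one that drops in all day long would keep the default.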