Forum Moderators: Robert Charlton & goodroi
(blocked by robots.txt, meta robots tag set to index/follow, and a canonical pointing to page number 1)
What do you think about this strategy?
(Google still crawls a page even if it's blocked by robots.txt.)
Would this optimize crawl budget? Or is a better strategy needed?
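For concreteness, the combination described above would look something like this on a paginated series (all URLs hypothetical):

```text
# robots.txt
User-agent: *
Disallow: /category/page-2.html

<!-- in the <head> of /category/page-2.html -->
<meta name="robots" content="index, follow">
<link rel="canonical" href="https://example.com/category/page-1.html">
```

Note that if the page is disallowed in robots.txt, a compliant crawler never fetches its HTML, so the meta tag and the canonical link on that page go unread.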
"Valid with warnings"

That makes it sound as if they are not crawling. “Indexed, though blocked by robots.txt” simply means “We know that this URL exists, although we haven’t officially visited.” There have been a number of recent threads about this particular issue. Or non-issue, depending on your viewpoint.
That makes it sound as if they are not crawling. “Indexed though blocked by robots.txt” simply means “We know that this URL exists, although we haven’t officially visited”
To me it means, "At some point in the past we (Google) added this page to our index; now you've asked us not to go there, so we haven't gone there (and do not plan to go) to see if anything has changed."

I re-checked GSC. Every single item on the list--not just pages, but extensions like .midi that I don't want crawled--has always been disallowed in robots.txt. What I find more mysterious is that every one of those listed items has a “Last crawled” date--generally in September of this year--which is manifestly untrue. They're not sneaking around in disguise; a few of the listed items are so obscure that nobody at all visited on the specified date.
Choose the one you want, but you can't do all three at once.

My position is that 999 times out of 1000, it doesn’t matter if uncrawled content is putatively indexed.
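The conflict can be sketched with Python's standard-library robots.txt parser (domain and paths are hypothetical): a compliant crawler consults robots.txt *before* fetching a page, so any meta robots tag or canonical link inside a disallowed page's HTML is never seen at all.

```python
# Sketch: a compliant crawler checks robots.txt before fetching,
# so directives inside a disallowed page's HTML go unread.
from urllib import robotparser

rules = robotparser.RobotFileParser()
# Simulate an already-fetched robots.txt with parse() instead of read()
rules.parse([
    "User-agent: *",
    "Disallow: /archive/",
])

url = "https://example.com/archive/page-2.html"
if rules.can_fetch("Googlebot", url):
    print("fetch page, then read its meta robots / canonical")
else:
    print("skipped: meta robots and canonical on this page go unread")
```

This is why the robots.txt block and the on-page directives can't meaningfully combine: the block prevents the crawler from ever reading the other two.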
What I find more mysterious is that every one of those listed items has a “Last crawled” date--generally in September of this year--which is manifestly untrue. They're not sneaking around in disguise; a few of the listed items are so obscure, nobody at all visited on the specified date.
I might take "last crawled" to mean the last time that URL was checked for robots.txt exclusion.

If so, that sheds an interesting light on how G uses robots.txt: they don't just crawl it and stash its information in a database; instead they visit with a specific shopping list and make notes about what robots.txt has to say about the items on that list. G may not be as robots.txt-obsessed as some crawlers one could name, but they have certainly read it more recently than September 2019!
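The "shopping list" reading can be sketched as a batch check of already-known URLs against a freshly fetched robots.txt, recording a verdict for each rather than crawling anything new (all paths hypothetical):

```python
# Sketch: re-check a fixed list of known URLs against robots.txt
# and record what it says about each one.
from urllib import robotparser

rules = robotparser.RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /midi/",
    "Disallow: /drafts/",
])

shopping_list = [
    "https://example.com/midi/obscure-tune.mid",
    "https://example.com/drafts/notes.html",
    "https://example.com/public/page.html",
]
verdicts = {url: rules.can_fetch("Googlebot", url) for url in shopping_list}
for url, allowed in verdicts.items():
    print(url, "->", "allowed" if allowed else "disallowed")
```

Under this reading, a "Last crawled" date could simply record when the verdict for that URL was last refreshed, not a visit to the page itself.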