robots.txt - Google's JohnMu Tweets a tip


tedster - 6:30 pm on May 30, 2010 (gmt 0)


If they are not crawled, what is the proper terminology when Googlebot requests the robots.txt file and takes action on the directives?

The robots.txt file itself is certainly "crawled". What is supposed to happen next (and usually does, if there is no glitch) is that the Disallowed URIs are NOT crawled - not requested from the server - from then on, for as long as they remain listed in robots.txt. That is, the CONTENT of a disallowed URI is not indexed, but the URI itself - that character string - is definitely stored.
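
To make that distinction concrete, here is a rough sketch using Python's standard urllib.robotparser. It is only a toy model of a well-behaved crawler, not a description of how Googlebot is actually built. The robots.txt content, the example.com URIs, and the plan_crawl helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def plan_crawl(discovered_uris, agent="Googlebot"):
    """Return (known, to_fetch): every URI seen, and the subset allowed to be requested."""
    known = set()   # the URI strings themselves are always remembered
    to_fetch = []   # only URIs permitted by robots.txt are ever requested
    for uri in discovered_uris:
        known.add(uri)
        if parser.can_fetch(agent, uri):
            to_fetch.append(uri)
    return known, to_fetch


known, to_fetch = plan_crawl([
    "https://example.com/public/index.html",
    "https://example.com/private/report.html",
])
```

In this toy model the /private/ URI ends up in known but never in to_fetch, which is the "URI stored, content not requested" situation described above.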

If the URI is already listed as a URI-only search result, appearing in robots.txt won't automatically remove it from the search index. However, because it (or its pattern) is listed in robots.txt, the webmaster can then request its removal from Google's index.

Of course, by listing the URIs in the first place, robots.txt has definitely announced to the world that they exist, and not all bots are as well-behaved as search engine crawlers. That alone is a good reason to limit the use of robots.txt. But in some situations, robots.txt can really help clean up duplicate URI problems and the like.
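
As a rough illustration of the "announced to the world" point: robots.txt is a public plain-text file, so any client can read the Disallow lines directly. The sketch below assumes a made-up robots.txt, and disclosed_paths is a hypothetical helper name. It is a deliberately simple line scan, not a full robots.txt parser.

```python
# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/            # back office
Disallow: /drafts/secret-launch.html
Disallow:
"""


def disclosed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value, i.e. every path the file reveals."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


paths = disclosed_paths(SAMPLE)
```

A badly behaved bot could run something like this and go straight to the listed paths, which is why a broad pattern tends to give away less than a specific, revealing URI.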


Thread source: http://www.webmasterworld.com/google/4143083.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com