The site owner isn't going to send out an announcement; you have to go in and look.
Google reads robots.txt files and indexes their content. There is even a "cache" link for many of these, leading to a cached copy of the file. Adding Disallow: /robots.txt to the robots.txt file soon removes the robots.txt entry from the SERPs, but it does not stop Google coming back to read the file to see the list of what they should not be indexing. Disallow: / stops all access to all URLs within the site for the purpose of indexing. It does not stop access to the robots.txt file itself for site access control, and it does not stop access to the WMT account ID file (e.g. google2b27d8288a99e5a6.html) for the purpose of verification.
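To illustrate, a minimal robots.txt combining those points might look like this (the second rule already matches /robots.txt as a URL prefix, but the first makes the intent explicit):

  User-agent: *
  # keep the robots.txt listing itself out of the SERPs
  Disallow: /robots.txt
  # block indexing of every URL on the site
  Disallow: /

Googlebot still fetches /robots.txt to learn these rules, and still fetches the verification file, because those requests are for access control and verification rather than indexing.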
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
Possibly I need to buy a new dictionary. In mine, "visit" and "index" are entirely different concepts.
Only a meta robots noindex tag stops the content appearing in the SERPs. Back to the "Disallow: /" rule: it says do not visit ANY URLs beginning with "/" on the site. However, that does not stop the bot from visiting "/robots.txt" (which itself begins with "/") to find out what it should not be visiting. And, of course, "all robots" means "all robots including you, even if I have elsewhere singled you out for personal mention".
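For reference, that tag goes in the page's HTML head:

  <meta name="robots" content="noindex">

The catch is that a robot can only see the tag on pages it is allowed to fetch; a page blocked by Disallow never gets read, so a noindex tag on it goes unseen.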
User-agent: *" section applies to all other robots, i.e. all those not specifically named. User-agent: * section should be applying to a specific robot, then that rule needs to be duplicated into the section for that specific robot. User-agent: * section be read.
"urlresolve" may fall into that gray area of link-checker, which has always claimed exemption from following robots.txt directives since no actual indexing was done.
Check out what Matt Cutts and Vanessa Fox said in 2006.
{link}
That stuff is still true today. Read jdMorgan's comments too.
The URLresolver reads the anchors file and converts relative URLs into absolute URLs and in turn into docIDs. It puts the anchor text into the forward index, associated with the docID that the anchor points to. It also generates a database of links which are pairs of docIDs. The links database is used to compute PageRanks for all the documents.
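That database of docID pairs is exactly the input a PageRank computation consumes. As a toy illustration only (the function, damping value, and docIDs are invented; this is nothing like Google's production code), in Python:

  from collections import defaultdict

  def pagerank(links, damping=0.85, iterations=20):
      """links: iterable of (from_docID, to_docID) pairs."""
      out_links = defaultdict(list)
      docs = set()
      for src, dst in links:
          out_links[src].append(dst)
          docs.add(src)
          docs.add(dst)
      n = len(docs)
      rank = {d: 1.0 / n for d in docs}
      for _ in range(iterations):
          # rank held by docs with no out-links is spread evenly
          dangling = sum(rank[d] for d in docs if d not in out_links)
          new_rank = {d: (1.0 - damping) / n + damping * dangling / n
                      for d in docs}
          for src, targets in out_links.items():
              share = damping * rank[src] / len(targets)
              for dst in targets:
                  new_rank[dst] += share
          rank = new_rank
      return rank

  # three documents linking to each other
  print(pagerank([(1, 2), (2, 3), (3, 1), (1, 3)]))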
All we know is that Google's "urlresolver" is not a real person visiting in real time; it's just another robot executing an automated, unknown task.
Simple, repeatedly observed fact: a change in robots.txt does not result in an immediate change of behavior toward roboted-out directories by robots that have already read the file. It can take up to a week, sometimes far more.