Msg#: 4306092 posted 2:53 am on May 1, 2011 (gmt 0)
How can I tell my "nofollows" are working, if I recently added new ones to particular links?
Just wait for the next new robot to show up. If you've got a decently sized site, that should be, oh, within the next half-hour or so.
And does "nofollowing" a link eventually remove it from Google if it was previously cached?
But I'm expecting the reply is going to say I need to block them in robots.txt to pull them from the cache.
Far as I can tell, once Google knows a page exists, they'll keep crawling it forever. And once they know a page doesn't exist, they'll still keep crawling it forever just to make sure it doesn't rematerialize. (Analogy to the common cold presents itself.)
You'd think, wouldn't you, that after trying a page eight or ten times and not finding it, the computer would go back one step and check whether your internal links to the page are still there, or whether it's still on your sitemap, or...
Msg#: 4306092 posted 1:02 am on May 3, 2011 (gmt 0)
blocking a url with robots.txt is supposed to exclude the robot from crawling that url. it doesn't prevent that url from appearing in the index, either as a url-only snippet or possibly with information obtained from sources other than the document at that url, such as the anchor text of an inbound link. it is only about crawling, so there is no implied instruction for removal from the index.
also note that a robots.txt-excluded url that has a noindex meta robots tag or X-Robots-Tag header may still end up indexed, because the robots.txt exclusion prevents the noindex instruction from ever being fetched.
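to make that conflict concrete, here's a minimal sketch (the path is hypothetical):

```
# robots.txt -- blocks crawling of the url, nothing more
User-agent: *
Disallow: /private/page.html
```

if `/private/page.html` carries `<meta name="robots" content="noindex">`, the crawler never fetches the page, so it never sees the noindex, and the url can still appear in results as a url-only listing built from inbound anchor text.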
the rel="nofollow" [google.com] anchor attribute is intended to prevent the transfer of PR and anchor text to the target url and will also drop the target url from the link graph unless the url is otherwise discovered such as through an xml sitemap or another anchor that is not nofollowed.
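a minimal sketch of the attribute, with a hypothetical url and anchor text:

```html
<!-- rel="nofollow" stops PR and anchor-text transfer to the target url -->
<a href="http://example.com/some-page" rel="nofollow">example anchor text</a>
```

note that the target can still be discovered and crawled via an xml sitemap or any other non-nofollowed link pointing at it.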
Msg#: 4306092 posted 11:38 pm on May 4, 2011 (gmt 0)
the best way to remove a url from the index is to allow crawling of that url and provide a noindex meta robots tag or an X-Robots-Tag HTTP header. if the content for a url no longer exists, the best way to remove that url from the index is to allow crawling of that url and provide a 410 Gone status code response.
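a sketch of the three options above, with hypothetical responses:

```
# option 1: noindex meta robots tag in the page's <head>
<meta name="robots" content="noindex">

# option 2: X-Robots-Tag HTTP response header
# (equivalent to the meta tag, and also works for non-html files like pdfs)
HTTP/1.1 200 OK
X-Robots-Tag: noindex

# option 3: if the content is permanently gone, respond with
HTTP/1.1 410 Gone
```

in every case the url must remain crawlable -- the robot has to be able to fetch the response to see the instruction.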