Forum Moderators: Robert Charlton & goodroi
User-agent: Googlebot
Disallow: /widgets/
[google.com...]
states
We generally download robots.txt files about once a day. You can see the last time we downloaded your file using the robots.txt analysis tool in Google Sitemaps and checking the Last downloaded date and time.
Google might have everflux installed, but they still don't seem to be able to work through their index very fast. :\
It's not really urgent. I'm just interested in how long it takes.
As with the search servers, there are many machines running the Googlebot application. Each one of them (or perhaps a 'representative' member of their clusters) will have to fetch and process your new robots.txt before it takes effect. I always allow a minimum of 24 hours after making any changes to robots.txt or to UA-based access controls, to give the spiders time to re-fetch and re-analyze robots.txt.
After this time, the Googlebots should all stop fetching resources in your /widgets/ subdirectory. However, if you are asking how long it will take for those listings to be removed from search results, the answer ranges from roughly 90 days for main results, to a year for Supplemental results with full title/description, to never -- any pages with links to them from any page anywhere on the Web may live on in Supplemental results forever as URL-only listings in place of title and description.
If this latter issue is your concern, then consider *allowing* Googlebot to spider your /widgets/ subdirectory, but redirecting all /widgets/ pages to replacement pages or returning 410-Gone responses at the server level for all /widgets/ pages.
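For the server-level part, here's one way to do it (a sketch, assuming an Apache server with mod_alias enabled and that /widgets is the real URL path; adjust for your setup):

```apache
# Send "410 Gone" for /widgets and everything beneath it,
# so spiders that re-fetch these URLs learn they are permanently removed.
Redirect gone /widgets
```

If you need a tighter match, `RedirectMatch gone ^/widgets/` does the same thing with a regex.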
Jim
If "somewhere" a link exists on another site pointing to your directory, G is going to try and follow that link to your disallowed directory....over...and over...and over...
Unless you can get all the pointers to your directory removed (probably impossible), you'll have to leave that Disallow in place, probably forever.
thanks!
Tera
Disallow: /widgets
(note there's no trailing slash)
I did this with my forum directory, and then pages started showing up in Webmaster Tools as forbidden by robots.txt. Prior to that, they weren't. With the trailing slash, Google is still free to fetch example.com/widgets itself (the bare path, no slash), and if there are links from there to other pages in the directory, they can get spidered anyway.
I don't know if this applies in your case or not, but it might not hurt to drop that last slash.
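You can check the slash difference locally with Python's standard-library robots.txt parser (a sketch; example.com and the /widgets paths are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# Parse two hypothetical robots.txt files in memory (no network fetch needed).
with_slash = RobotFileParser()
with_slash.parse([
    "User-agent: Googlebot",
    "Disallow: /widgets/",
])

without_slash = RobotFileParser()
without_slash.parse([
    "User-agent: Googlebot",
    "Disallow: /widgets",
])

# "Disallow: /widgets/" blocks URLs inside the directory...
print(with_slash.can_fetch("Googlebot", "http://example.com/widgets/page.html"))  # False
# ...but not the bare path itself:
print(with_slash.can_fetch("Googlebot", "http://example.com/widgets"))  # True

# "Disallow: /widgets" (no slash) is a prefix match, so it blocks both:
print(without_slash.can_fetch("Googlebot", "http://example.com/widgets"))  # False
print(without_slash.can_fetch("Googlebot", "http://example.com/widgets/page.html"))  # False
```

So dropping the trailing slash casts a wider net, which is exactly what you want when the goal is to keep the whole directory out.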
[edited by: AndyA at 6:40 pm (utc) on Dec. 29, 2006]