| 8:49 pm on Feb 28, 2010 (gmt 0)|
Disallow means not allowing, blocking
* means everybody
so basically you are telling every bot NOT to access pages on your site.
If you want every bot to crawl your site just leave your robots.txt file empty.
| 3:35 am on Mar 1, 2010 (gmt 0)|
@ Staffa. Thanks for your quick response.
Suppose I change my robots.txt file according to your tips. When will Google and the other search engines notice that I have changed the robots.txt file (my site is new and just 1 page has been indexed)?
| 10:24 am on Mar 1, 2010 (gmt 0)|
|so basically you are telling every bot NOT to access pages on your site. |
the robots exclusion protocol uses the most specific rule and matches the pattern from left to right.
a blank pattern for Disallow means "disallow NOTHING".
from "The Web Robots Pages" [robotstxt.org] of the "official" REP site:
|To allow all robots complete access |
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use one at all)
you will find similar information on SE help pages regarding robots.txt syntax.
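For anyone who wants to check the blank-Disallow behaviour locally, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name allowed(), the bot name ExampleBot and the example.com URL are made up for illustration, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent="ExampleBot", url="http://example.com/page.html"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Three variants of a robots.txt file (contents are illustrative).
BLANK_DISALLOW = "User-agent: *\nDisallow:\n"  # blank pattern: disallow nothing
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path
EMPTY_FILE = ""                                # no rules at all

if __name__ == "__main__":
    for name, text in [("blank Disallow", BLANK_DISALLOW),
                       ("Disallow: /", DISALLOW_ALL),
                       ("empty file", EMPTY_FILE)]:
        print(name, "->", "allowed" if allowed(text) else "blocked")
```

With this parser, only the "Disallow: /" variant blocks the page. The blank pattern and the empty file both leave everything crawlable.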
| 12:26 pm on Mar 1, 2010 (gmt 0)|
I would go with a whitelist of allowed bots, disallowing all others... Along the lines of:
# Whitelisted user-agents are allowed
# Disallow all others
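As a rough illustration of the whitelist pattern, the sketch below parses a hypothetical robots.txt with Python's standard urllib.robotparser and checks two user-agents against it. The bot names (Googlebot as the single whitelisted entry, SomeOtherBot as a stand-in for everything else), the helper may_crawl() and the example.com URL are assumptions for the example, not the actual list from this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist: the user-agent named here is a placeholder,
# not the list from the post above.
WHITELIST_ROBOTS_TXT = """\
# Whitelisted user-agents are allowed
User-agent: Googlebot
Disallow:

# Disallow all others
User-agent: *
Disallow: /
"""


def may_crawl(agent, url="http://example.com/"):
    """Return True if `agent` may fetch `url` under the whitelist above."""
    parser = RobotFileParser()
    parser.parse(WHITELIST_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "SomeOtherBot"):
        print(bot, "->", "allowed" if may_crawl(bot) else "blocked")
```

Keep in mind that robots.txt is only advisory: well-behaved bots follow it, but nothing in the file itself stops a bot that chooses to ignore it.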
| 9:33 am on Mar 3, 2010 (gmt 0)|
|I would go with a whitelist of allowed bots, disallowing all others... |
So you allow very few robots? What about all the other 'good bots'? Or do they not count for anything towards site promotion?
| 10:00 am on Mar 3, 2010 (gmt 0)|
That is correct. I allow very few bots as most of the others do little toward site promotion. It also allows me to keep my last 13 hairs since I am not tempted to pull them out when things do not go well. :)
(and teoma is on a short list of things to get rid of...)
But that's just me!
Obviously one can whitelist what they like. The above is my list... and one of them is for an onsite search box that keeps the visitor on my site...
| 7:49 am on Mar 18, 2010 (gmt 0)|
If you want Google to start indexing your site again, then add it into:
It'll normally start crawling it within the next few days and, as always, it can take up to a month for the site to show up in its search results.
| 9:12 am on Mar 18, 2010 (gmt 0)|
phranque is correct. Re-read the original post. The OP states they changed the robots.txt to the version that allows bots to access the site after initially posting the wrong version. But that's not what is under discussion, despite what the title of this discussion says, which is the source of the confusion. Here is what the discussion is about:
|So my question is that how long will the Search Engines take time to acquainted of my changed robots.txt? |
This is why it's important to accurately describe what the discussion is about. ;)
As for the answer, keep building links. Bots revisit a site according to how many links point to it and how worth crawling those links are. Robots.txt will not slow down how often a bot visits your site. So once a bot follows a link to your site, it will see the correct robots.txt and proceed to index your site.
| 12:11 pm on Mar 18, 2010 (gmt 0)|
Google and the other search engines strongly prefer to crawl based on links and not URL submissions. I would not spend any time submitting a URL to a search engine. Even if a mistake has been made, you will be crawled automatically by the search engines as long as you have links pointing to the content.