Forum Moderators: goodroi
User-agent: *
disallow: /
But, I see that it is still going through my pages. Googlebot is on my server 24 hours a day. Does it need to leave and come back before it starts checking the robots.txt file, or should it check it with each query?
If it has to leave and come back, what if it never does?
looks like Jim beat me to it, I want to collect more points :)
But, I see that it is still going through my pages. Googlebot is on my server 24 hours a day. Does it need to leave and come back before it starts checking the robots.txt file, or should it check it with each query?
Like many questions about Google, the answer to this one is found on (surprise!) the Google Information for Webmasters section of the Google site right on the FAQ page where it belongs :)
In answer to the question Why isn't Googlebot obeying my robots.txt file? [google.com] you will find:
To save bandwidth, Googlebot only downloads the robots.txt file once a day or whenever we have fetched many pages from the server. So, it may take a while for Googlebot to learn of any changes that might have been made to your robots.txt file. Also, Googlebot is distributed on several machines. Each of these keeps its own record of your robots.txt file. Also, check that your syntax is correct against the standard at: [robotstxt.org...] If there still seems to be a problem, please let us know and we'll correct it.
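The once-a-day caching behaviour Google describes can be sketched with Python's standard urllib.robotparser. The rules are fed in directly so the sketch runs offline; a real crawler would download the live robots.txt instead, and the day-long cache window is just the figure from Google's FAQ:

```python
import time
import urllib.robotparser

MAX_AGE = 24 * 60 * 60  # re-check robots.txt once a day, as Google describes

RULES = ["User-agent: *", "Disallow: /"]  # the rules from the post above

rp = urllib.robotparser.RobotFileParser()
rp.parse(RULES)
rp.modified()  # record when the rules were last fetched

def can_fetch(url, agent="*"):
    # If the cached copy is older than a day, refresh it; a real crawler
    # would call rp.read() against the live robots.txt URL here.
    if time.time() - rp.mtime() > MAX_AGE:
        rp.parse(RULES)
        rp.modified()
    return rp.can_fetch(agent, url)
```

Until the cache expires, every request is answered from the stored copy, which is why a freshly edited robots.txt can take a while to have any effect.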
When I change it back, I want to add the following:
User-agent: *
disallow: /*.gif$
disallow: /*.jpg$
disallow: /*.jpeg$
disallow: /*.bmp$
Am I right in assuming that this will effectively allow the bots to go through the forum, yet not download graphic files, saving me lots of bandwidth?
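One way to check is to sketch Google's documented wildcard semantics: '*' matches any run of characters, a trailing '$' anchors the pattern to the end of the URL path, and otherwise the pattern matches as a prefix. The sample paths below are hypothetical, not from the thread:

```python
import re

def google_rule_matches(pattern, path):
    """Sketch of Google's wildcard extension: '*' matches any characters,
    a trailing '$' anchors the pattern to the end of the path, and
    otherwise the pattern matches as a prefix."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.match(regex + ("$" if anchored else ""), path) is not None

print(google_rule_matches("/*.gif$", "/forum/icons/smile.gif"))  # True - blocked
print(google_rule_matches("/*.gif$", "/forum/viewtopic.php"))    # False - crawlable
```

Under those semantics, yes: any URL ending in .gif, .jpg, .jpeg, or .bmp would be excluded while regular forum pages stay crawlable.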
Thanks for making me do some reading and learning a lot more than I knew when I started out on this thread!
The syntax you suggest, taken from Google's Webmaster info, is not part of the general standard; it is an extension that, it appears, only Google supports. Check out Robots.txt - Am I Missing Somthing? [webmasterworld.com].
In message #2 DaveAtIFG points out that:
Wild cards are only acceptable in the User-Agent field.
while in message #9 jdMorgan adds:
AFAIK, the only "big" search engine that supports extensions to the Standard for Robots Exclusion is Google, as documented in their Webmaster Help section.
So your code should keep Googlebot out of your images etc., but will be ignored by all other robots.
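You can see the "ignored by all other robots" part with a parser that follows only the original standard: it treats "Disallow: /*.gif$" as a literal path prefix, which never matches a real image URL. Python's urllib.robotparser behaves this way, so it serves as a stand-in for a standards-only robot here:

```python
import urllib.robotparser

# Feed a standards-only parser the wildcard rule from the post above.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /*.gif$",
])

# The wildcard is not expanded, so the image URL stays allowed.
print(rp.can_fetch("*", "/forum/photo.gif"))  # True - rule has no effect
```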
I suspect that the correct way would be to put these files in separate directories; the robots exclusion protocol deals specifically with keeping robots out of directories.
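A standard-compliant version of that approach might look like the following, assuming the images were all moved into one directory (the /forum-images/ name is my invention, not something from the thread):

User-agent: *
Disallow: /forum-images/

Every robot that honors the original standard would then skip that directory, no wildcards required.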
Take what I have written with a grain of salt until confirmed by someone more knowledgeable, I have just learned this a few minutes ago :)
For example, I want to stop the [good] bots from fetching the URLs:
[my-domain.com...]
[my-domain.com...]
Will the following work?
User-agent: *
disallow: /*showtopic.php
disallow: /*search
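Under Google's extension (and, per the discussion above, only there), a pattern with '*' and no trailing '$' matches any path it fits as a prefix. These regexes are the direct equivalents of the two suggested rules; the example forum paths are hypothetical stand-ins for the truncated URLs above:

```python
import re

# Regex equivalents of "Disallow: /*showtopic.php" and "Disallow: /*search"
# under Google's wildcard semantics ('*' -> '.*', match from the start).
showtopic = re.compile(r"/.*showtopic\.php")
search = re.compile(r"/.*search")

print(bool(showtopic.match("/forum/showtopic.php?t=1")))  # True - blocked by Google
print(bool(search.match("/forum/search?q=robots")))       # True - blocked by Google
print(bool(search.match("/forum/topic.php")))             # False - still crawlable
```

So the rules should work for Googlebot, but other standards-only robots would ignore them.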