# Disallow Google ad bots from the entire site
User-agent: Mediapartners-Google*
Disallow: /
# Disallow Google Adsense from the entire site
User-agent: Google AdSense
Disallow: /
# Disallow Googlebot, MSN and Yahoo from these directories and all files within them
User-agent: Googlebot
User-agent: MSNBot
User-agent: Slurp
Disallow: /css
Disallow: /cgi-bin
Disallow: /error_docs
Disallow: /images
Disallow: /js
Disallow: /somedir
Disallow: /somedir
# Disallow ALL bots from these directories and all their child objects
User-agent: *
Disallow: /css
Disallow: /cgi-bin
Disallow: /error_docs
Disallow: /images
Disallow: /js
Disallow: /somedir
Disallow: /somedir
# Disallow specific bots from indexing or crawling the site at all
# Most recent additions first.
User-agent: GigaBot
Disallow: /
User-agent: Voyager
Disallow: /
User-agent: BaiDuSpider
Disallow: /
User-agent: BackRub/*.*
Disallow: /
User-agent: Grub.org
Disallow: /
User-agent: BotRightHere
Disallow: /
User-agent: larbin
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Walhello appie
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: Googlebot-Image
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: Yandex bot
Disallow: /
# END OF FILE
TIA
Also for BackRub/*.*, are you trying to use wildcards for the UA? If so, I don't think that is allowed.
It might pay you to put the "User-agent: *" section at the very end. Some bots might stop at that record and never look further down for one that matches their specific UA.
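A minimal sketch of why the position of "User-agent: *" can matter. It compares a hypothetical naive checker (`naive_first_match`, written only for illustration) that stops at the first record naming "*" or the bot, with Python's standard-library `urllib.robotparser`, which sets the "*" record aside and falls back to it only when no named record matches. The sample rules are assumptions; this doesn't claim any particular crawler behaves like the naive version.

```python
from urllib.robotparser import RobotFileParser

# Sample file with the catch-all record placed BEFORE a named record.
ROBOTS = """\
User-agent: *
Disallow: /css

User-agent: GigaBot
Disallow: /
"""

# Same rules with the catch-all record moved to the end.
ROBOTS_STAR_LAST = """\
User-agent: GigaBot
Disallow: /

User-agent: *
Disallow: /css
"""


def naive_first_match(robots_txt, agent, path):
    """Hypothetical naive checker: the first record whose User-agent is '*'
    or appears in the agent name wins, regardless of specificity."""
    groups = []  # list of (agent_names, disallow_prefixes)
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Start a new record unless we are still in a run of User-agent lines.
            if not groups or groups[-1][1]:
                groups.append(([], []))
            groups[-1][0].append(value.lower())
        elif field == "disallow" and groups:
            groups[-1][1].append(value)
    for agents, disallows in groups:
        if "*" in agents or any(a in agent.lower() for a in agents):
            # An empty Disallow value blocks nothing.
            return not any(p and path.startswith(p) for p in disallows)
    return True


def stdlib_check(robots_txt, agent, path):
    """Ask Python's standard-library parser the same question."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(naive_first_match(ROBOTS, "GigaBot", "/index.html"))            # True: '*' record hit first
print(stdlib_check(ROBOTS, "GigaBot", "/index.html"))                 # False: named record wins
print(naive_first_match(ROBOTS_STAR_LAST, "GigaBot", "/index.html"))  # False once '*' is last
```

With "*" last, both checkers agree, which is the point of the advice: it costs nothing with a parser that follows the standard and protects against one that doesn't.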
Why is there a * after Mediapartners-Google?
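One way to see what a parser makes of "Mediapartners-Google*" or "BackRub/*.*" is to feed both to Python's `urllib.robotparser`, used here only as one example of a parser that follows the original standard. It lowercases the record's User-agent value and looks for it as a plain substring of the crawler's name token, so the "*" characters are taken literally and neither record applies. Other crawlers may parse these lines differently, and the user-agent strings passed in below are assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, path="/"):
    """Parse robots_txt with the stdlib parser and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


WITH_STARS = """\
User-agent: Mediapartners-Google*
Disallow: /

User-agent: BackRub/*.*
Disallow: /
"""

WITHOUT_STARS = """\
User-agent: Mediapartners-Google
Disallow: /

User-agent: BackRub
Disallow: /
"""

# '*' is compared literally, so neither record applies and access is granted.
print(allowed(WITH_STARS, "Mediapartners-Google"))     # True
print(allowed(WITH_STARS, "BackRub/2.1"))              # True
# Plain name tokens match as intended and the bans take effect.
print(allowed(WITHOUT_STARS, "Mediapartners-Google"))  # False
print(allowed(WITHOUT_STARS, "BackRub/2.1"))           # False
```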
Is the file otherwise OK? Funnily enough, Googlebot seems to be ignoring its exclusions: it was actively spidering two folders that we disallowed. What's that about? Any ideas?
I take it the file is OK, though? Nothing in there to prevent Google or any other bot from spidering the site?
# Disallow Google Adsense from the entire site
This is covered by the Mediapartners-Google ban, so this record is not required.
# Disallow Googlebot, MSN and Yahoo from these directories and all files within them
(..)
# Disallow ALL bots from these directories and all their child objects
The list of files and directories is the same for both, so the first is redundant.
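The redundancy can be checked mechanically: build the file with and without the Googlebot/MSNBot/Slurp record, parse both with Python's `urllib.robotparser`, and compare the decisions. This is a sketch under assumptions: only the directory rules are included, and the helper names, sample bots and sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Directory list from the file above (the repeated /somedir placeholder listed once).
DIRS = ["/css", "/cgi-bin", "/error_docs", "/images", "/js", "/somedir"]


def build(with_named_record):
    """Return robots.txt text, optionally including the named-bot record."""
    rules = "".join(f"Disallow: {d}\n" for d in DIRS)
    records = []
    if with_named_record:
        records.append("User-agent: Googlebot\nUser-agent: MSNBot\nUser-agent: Slurp\n" + rules)
    records.append("User-agent: *\n" + rules)
    return "\n".join(records)  # a blank line separates the records


def decisions(robots_txt, agents, paths):
    """Map (agent, path) to the stdlib parser's allow/deny decision."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(a, p): parser.can_fetch(a, p) for a in agents for p in paths}


AGENTS = ["Googlebot", "MSNBot", "Slurp", "SomeOtherBot"]
PATHS = ["/", "/index.html", "/css/site.css", "/images/logo.png", "/somedir/page.html"]

same = decisions(build(True), AGENTS, PATHS) == decisions(build(False), AGENTS, PATHS)
print(same)  # True: the named record says nothing the '*' record doesn't
```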
Finally, there shouldn't be an extra line feed in the "Disallow ALL bots" part.
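Why a stray blank line matters: in the original robots exclusion standard, blank lines separate records, so Disallow lines that follow a blank line no longer belong to any User-agent line. Python's `urllib.robotparser` follows that rule and silently drops the orphaned lines. In this sketch the position of the blank line is assumed for illustration.

```python
from urllib.robotparser import RobotFileParser

# A blank line in the middle of the '*' record (position assumed for illustration).
BROKEN = """\
User-agent: *
Disallow: /css
Disallow: /cgi-bin

Disallow: /images
Disallow: /js
"""

parser = RobotFileParser()
parser.parse(BROKEN.splitlines())

print(parser.can_fetch("AnyBot", "/css/site.css"))     # False: rule is still inside the record
print(parser.can_fetch("AnyBot", "/images/logo.png"))  # True: orphaned rule was dropped
```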
# Disallow Google ad bots from the entire site
User-agent: Mediapartners-Google*
Disallow: /
# Disallow ALL bots from these directories and all their child objects
User-agent: *
Disallow: /css
Disallow: /cgi-bin
Disallow: /error_docs
Disallow: /images
Disallow: /js
Disallow: /somedir
Disallow: /somedir
# Disallow specific bots from indexing or crawling the site at all
# Most recent additions first. (followed by your list of specific bots)
The file has to be called robots.txt.
robots.txt has to be in the root directory of the web site.
The exclusions have to be in the same case as the actual paths.
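The last two points can be sketched in Python: a crawler requests `/robots.txt` at the root of the host, whatever page it starts from, and Disallow values are compared as case-sensitive path prefixes. This uses `urllib.parse.urljoin` and `urllib.robotparser` from the standard library; the example.com URLs are placeholders.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Whatever page a crawler starts from, the file is looked up at the host root.
page = "http://www.example.com/somedir/page.html"
robots_url = urljoin(page, "/robots.txt")
print(robots_url)  # http://www.example.com/robots.txt

parser = RobotFileParser()
parser.parse("User-agent: *\nDisallow: /images\n".splitlines())

print(parser.can_fetch("AnyBot", "/images/logo.png"))  # False: same case, rule applies
print(parser.can_fetch("AnyBot", "/Images/logo.png"))  # True: different case, no match
print(parser.can_fetch("AnyBot", "/images2/a.png"))    # False: plain prefix match covers /images2 too
```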
Thanks for everyone's help. Much appreciated.