Googlebot now supports wildcard file-type patterns in robots.txt:
User-Agent: googlebot
Disallow: /*.cgi
It was in testing this month and appears to have worked. I guess we'll see once this crawl goes live.
It is very responsive of Google to address so many of our concerns about dynamic content. I wouldn't try the above on standard (.htm, .html) file extensions, though.
I also block them from any URL containing a "?".
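Assuming Google's extended wildcard syntax (where "*" matches any sequence of characters), blocking query strings could look something like this; the exact rules are a sketch, not a copy of anyone's live file:

```
User-agent: Googlebot
Disallow: /*.cgi
Disallow: /*?
```

Note that other crawlers that follow only the original robots.txt standard would ignore or misread these wildcard lines.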
On my site, any domain name appended to a URL produces a whois record, so it is important to keep spiders out of those pages. I block *.com, *.net, *.org, *.info, *.biz, and *.us. However, other search engines don't handle these wildcard rules well, so I check the User-Agent whenever robots.txt is requested: if it is Googlebot, I serve the wildcard block commands.
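The approach above, serving one robots.txt to Googlebot and a plainer one to everything else, can be sketched as a small server-side handler. The rule text, the `robots_txt_for` helper, and the substring check on "googlebot" are all assumptions for illustration, not the poster's actual setup:

```python
# Wildcard rules served only to Googlebot, which understands the
# extended "*" syntax. (Illustrative rule set, not a real config.)
WILDCARD_RULES = """User-agent: Googlebot
Disallow: /*?
Disallow: /*.com
Disallow: /*.net
Disallow: /*.org
Disallow: /*.info
Disallow: /*.biz
Disallow: /*.us
"""

# Plain rules for crawlers that only follow the original standard.
DEFAULT_RULES = """User-agent: *
Disallow: /cgi-bin/
"""

def robots_txt_for(user_agent: str) -> str:
    """Return wildcard rules for Googlebot, plain rules for everyone else."""
    if "googlebot" in user_agent.lower():
        return WILDCARD_RULES
    return DEFAULT_RULES
```

In practice you would wire this up however your server allows, for example by rewriting requests for /robots.txt to a script and dispatching on the User-Agent header there.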