Googlebot now supports wildcard file-type patterns in robots.txt:
User-Agent: googlebot
Disallow: /*.cgi
It was in testing this month and appears to have worked. I guess we'll see once this crawl goes live.
It is very responsive of Google to address so many of our concerns about dynamic content. I wouldn't try the above on standard (.htm, .html) file extensions, though.
I also block them from any URL containing a "?".
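Assuming Google's extended wildcard syntax (where "*" matches any sequence of characters), blocking query strings could look something like this; the exact rules are a sketch, not a copy of anyone's live file:

```
User-agent: Googlebot
Disallow: /*.cgi
Disallow: /*?
```

Note that other crawlers that follow only the original robots.txt standard would ignore or misread these wildcard lines.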
On my site, any domain name appended to a URL produces a whois record, so it is important to keep spiders out of those pages. I block *.com, *.net, *.org, *.info, *.biz, and *.us. However, other search engines don't handle these wildcard rules well, so I check the User-Agent whenever robots.txt is requested: if it is Googlebot, I serve the wildcard block commands.
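The approach above, serving one robots.txt to Googlebot and a plainer one to everything else, can be sketched as a small server-side handler. The rule text, the `robots_txt_for` helper, and the substring check on "googlebot" are all assumptions for illustration, not the poster's actual setup:

```python
# Wildcard rules served only to Googlebot, which understands the
# extended "*" syntax. (Illustrative rule set, not a real config.)
WILDCARD_RULES = """User-agent: Googlebot
Disallow: /*?
Disallow: /*.com
Disallow: /*.net
Disallow: /*.org
Disallow: /*.info
Disallow: /*.biz
Disallow: /*.us
"""

# Plain rules for crawlers that only follow the original standard.
DEFAULT_RULES = """User-agent: *
Disallow: /cgi-bin/
"""

def robots_txt_for(user_agent: str) -> str:
    """Return wildcard rules for Googlebot, plain rules for everyone else."""
    if "googlebot" in user_agent.lower():
        return WILDCARD_RULES
    return DEFAULT_RULES
```

In practice you would wire this up however your server allows, for example by rewriting requests for /robots.txt to a script and dispatching on the User-Agent header there.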