Wildcard blocking of dynamic URLs - robots.txt

To: GoogleGuy

     
3:44 am on Jul 31, 2002 (gmt 0)

Senior Member

joined:Mar 6, 2002
posts:1092
votes: 0


I need some verification that I can stop Googlebot from crawling ALL dynamic content.

User-agent: Googlebot
Disallow: /*?

Will this stop Googlebot from requesting all dynamic data? The only way to test this is on a site with a high PR, and I don't feel like testing it in a production environment when I have that high PR because, who knows, it could drop the site from the index for one or two months.
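
One way to sanity-check a pattern like this without risking a live site is to simulate the matching offline. The sketch below is only an approximation, assuming Googlebot treats * as "any sequence of characters" and a trailing $ as an end-of-URL anchor, as its documentation describes; the helper name and sample paths are made up for illustration.

import re

def disallow_to_regex(pattern):
    # Hypothetical helper: translate a Googlebot-style Disallow pattern into a
    # regex. '*' matches any sequence of characters, a trailing '$' anchors the
    # pattern to the end of the URL; otherwise the rule is a prefix match.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + ("$" if anchored else ""))

rule = disallow_to_regex("/*?")  # the rule proposed above

# Made-up sample paths to see what the rule would cover
for path in ["/index.html", "/products.asp?id=7", "/cgi-bin/search.cgi?q=widgets"]:
    print(path, "-> blocked" if rule.search(path) else "-> crawlable")

Under those assumptions, /index.html stays crawlable while both query-string URLs are blocked, which is the behavior the rule is after.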

4:00 am on July 31, 2002 (gmt 0)

Senior Member

joined:Mar 10, 2001
posts:748
votes: 0


Ick. If I were a spider I would assume you're disallowing everything.

I can't find the original spec on webcrawler, but I think the glob (*) there is only allowed in the User-agent line, not in Disallow paths.

4:09 am on July 31, 2002 (gmt 0)

Senior Member from US (pageoneresults)

joined:Apr 27, 2001
posts:12166
votes: 51


Hi Lisa, wasn't there another topic on this yesterday? I'm having Senior Moments these days.

User-Agent: Googlebot
Disallow: /*.asp$
Disallow: /*.cgi$
Disallow: /*.php$

If I'm reading this statement correctly from Google's website, then the above method will prevent Google from indexing dynamic content.

> In addition, Googlebot understands some extensions to the robots.txt standard: Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate that the $ must match the end of a name. For example, to prevent Googlebot from crawling files that end in gif, you may use the following robots.txt entry:

User-Agent: Googlebot
Disallow: /*.gif$
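
To make that $ behavior concrete, here is a small sketch (sample paths invented) of how the documented /*.gif$ entry would behave if the pattern is applied as a regular expression with the trailing $ pinned to the end of the URL:

import re

# Disallow: /*.gif$  ~  "^/.*\.gif$"  (the trailing $ pins the match to the end of the URL)
gif_rule = re.compile(r"^/.*\.gif$")

for path in ["/images/banner.gif", "/images/banner.jpg", "/banner.gif.html"]:
    print(path, "-> blocked" if gif_rule.search(path) else "-> crawlable")

Only the path that actually ends in .gif gets caught; a path that merely contains .gif somewhere in the middle does not.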

<edit> Ah, never mind. Now that I review this again, I see what you are trying to do...

User-agent: Googlebot
Disallow: /*?$

I wonder?

5:49 am on July 31, 2002 (gmt 0)

Preferred Member

joined:Jan 25, 2002
posts:378
votes: 0


User-agent: Googlebot
Disallow: /*?$

That would only disallow URLs that end with a question mark.
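
The difference is easy to see side by side. Another small sketch (paths invented for illustration), under the same assumptions about * and $:

import re

any_query  = re.compile(r"^/.*\?")    # Disallow: /*?   (any URL containing a '?')
ends_query = re.compile(r"^/.*\?$")   # Disallow: /*?$  (URL must end with a '?')

for path in ["/search?", "/page.php?id=1"]:
    print(path,
          "| /*?  ->", "blocked" if any_query.search(path) else "crawlable",
          "| /*?$ ->", "blocked" if ends_query.search(path) else "crawlable")

Only /*? (without the trailing $) catches a typical dynamic URL like /page.php?id=1; /*?$ skips it because that URL does not end with the question mark.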

6:13 am on July 31, 2002 (gmt 0)

Senior Member

joined:Mar 6, 2002
posts:1092
votes: 0


I am looking to block Google from any URL that contains a "?". When your PR is high, Google will crawl anything. Ick.