
Sitemaps, Meta Data, and robots.txt Forum

Wildcard blocking of dynamic - robots.txt
To: GoogleGuy

 3:44 am on Jul 31, 2002 (gmt 0)

I need some verification that I can stop Googlebot from crawling ALL dynamic content.

User-agent: Googlebot
Disallow: /*?

Will this stop Googlebot from requesting all dynamic URLs? The only way to test this is on a site with a high PR, and I don't feel like testing it in a production environment when I have the high PR. Who knows, it could drop the site from the index for a month or two.



 4:00 am on Jul 31, 2002 (gmt 0)

Ick. If I were a spider I would assume you're disallowing everything.

I can't find the original spec on webcrawler, but I think the glob there only applies to the User-agent line.


 4:09 am on Jul 31, 2002 (gmt 0)

Hi Lisa, wasn't there another topic on this yesterday? I'm having Senior Moments these days.

User-Agent: Googlebot
Disallow: /*.asp$
Disallow: /*.cgi$
Disallow: /*.php$

If I'm reading this statement correctly from Google's website, then the above method will prevent Google from indexing dynamic content.

> In addition, Googlebot understands some extensions to the robots.txt standard: Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate that the $ must match the end of a name. For example, to prevent Googlebot from crawling files that end in gif, you may use the following robots.txt entry:

User-Agent: Googlebot
Disallow: /*.gif$

<edit> Ah, never mind. Now that I review this again, I see what you are trying to do...

User-agent: Googlebot
Disallow: /*?$

I wonder?


 5:49 am on Jul 31, 2002 (gmt 0)

User-agent: Googlebot
Disallow: /*?$

That would only disallow URLs that end with a question mark.
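To make the difference concrete, here is a rough Python sketch of the matching rules as described in the Google text quoted earlier in this thread (* matches any sequence of characters, a trailing $ anchors the end). It is not Googlebot's actual code; the function names are made up for illustration, and it assumes a Disallow pattern without $ matches as a prefix.

```python
import re

def pattern_to_regex(pattern):
    """Turn a Disallow pattern into a regex (hypothetical helper).

    Assumes the extensions quoted from Google: '*' matches any sequence
    of characters and a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece (so '?' and '.' are literal), then rejoin with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(pattern, url_path):
    """True if url_path would be blocked by the pattern under these assumptions."""
    return pattern_to_regex(pattern).match(url_path) is not None

for pattern in ("/*?", "/*?$", "/*.gif$"):
    for url in ("/page.php?id=1", "/page.php?", "/page.html", "/img/a.gif"):
        print(pattern, url, is_disallowed(pattern, url))
```

Under these assumptions, /*? blocks every URL that contains a question mark, while /*?$ only blocks URLs whose last character is a question mark.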


 6:13 am on Jul 31, 2002 (gmt 0)

I am looking to block Google from any URL that contains a "?". When your PR is high, Google will crawl anything. Ick.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved