Sitemaps, Meta Data, and robots.txt Forum

    
Wildcard blocking of dynamic URLs - robots.txt
To: GoogleGuy
Lisa

WebmasterWorld Senior Member 10+ Year Member

Msg#: 106 posted 3:44 am on Jul 31, 2002 (gmt 0)

I need some verification that I can stop Googlebot from crawling ALL dynamic content.

User-agent: Googlebot
Disallow: /*?

Will this stop Googlebot from requesting all dynamic data? The only way to test this is on a site with high PR, and I don't feel like testing it in a production environment where I have the high PR because, who knows, it could drop the site from the index for one or two months.
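For reference, Google's documented extension treats `*` as matching any sequence of characters, and rules without a trailing `$` are prefix matches against the URL path (including any query string). A minimal sketch of that matching logic (my own helper, not Google's actual code) suggests `/*?` would indeed catch any URL containing a `?`:

```python
import re

def googlebot_match(pattern, url_path):
    """Test a URL path against a Googlebot-style Disallow pattern:
    * matches any sequence; a trailing $ anchors to the end of the URL;
    otherwise the rule is a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn the escaped * into "any sequence".
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.search(regex, url_path) is not None

# "/*?" matches any URL with a "?" anywhere after the leading slash:
print(googlebot_match("/*?", "/page.php?id=7"))     # True  -> blocked
print(googlebot_match("/*?", "/static/page.html"))  # False -> crawled
```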

 

bobriggs

WebmasterWorld Senior Member 10+ Year Member

Msg#: 106 posted 4:00 am on Jul 31, 2002 (gmt 0)

Ick. If I were a spider I would assume you're disallowing everything.

I can't find the original on webcrawler, but I think the glob there only refers to the UA.

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member

Msg#: 106 posted 4:09 am on Jul 31, 2002 (gmt 0)

Hi Lisa, wasn't there another topic on this yesterday? I'm having Senior Moments these days.

User-Agent: Googlebot
Disallow: /*.asp$
Disallow: /*.cgi$
Disallow: /*.php$

If I'm reading this statement from Google's website correctly, then the method above will prevent Google from indexing dynamic content.

> In addition, Googlebot understands some extensions to the robots.txt standard: Disallow patterns may include * to match any sequence of characters, and patterns may end in $ to indicate the end of a name. For example, to prevent Googlebot from crawling files that end in gif, you may use the following robots.txt entry:

User-Agent: Googlebot
Disallow: /*.gif$
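One caveat with those `$`-anchored extension rules: since the rule is matched against the path plus any query string, a `$` anchor stops matching as soon as a query string follows the extension. A hedged sketch (my own helper, not Google code) of how `/*.gif$` evaluates:

```python
import re

def disallowed(rule, path):
    """Googlebot-style rule test: * = any sequence of characters;
    a trailing $ anchors the rule to the end of the URL."""
    end = rule.endswith("$")
    rx = "^" + re.escape(rule[:-1] if end else rule).replace(r"\*", ".*")
    if end:
        rx += "$"
    return re.search(rx, path) is not None

print(disallowed("/*.gif$", "/images/photo.gif"))      # True: blocked
print(disallowed("/*.gif$", "/images/photo.gif?v=2"))  # False: the query string defeats the anchor
```

So rules like `/*.asp$` or `/*.php$` would not catch `/page.php?id=1`, which matters if the goal is blocking dynamic URLs.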

<edit> Ah, never mind. Now that I review this again, I see what you are trying to do...

User-agent: Googlebot
Disallow: /*?$

I wonder?

mbauser2

10+ Year Member

Msg#: 106 posted 5:49 am on Jul 31, 2002 (gmt 0)

User-agent: Googlebot
Disallow: /*?$

That would only disallow URLs that end with a question mark.
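Right: the trailing `$` anchors the rule to the end of the URL, so `/*?$` and `/*?` behave very differently. A self-contained sketch of the contrast (again my own helper, assuming Google's documented `*`/`$` semantics):

```python
import re

def matches(rule, path):
    # Googlebot-style: * = any sequence; trailing $ anchors the end;
    # without $, the rule is a prefix match on the path + query string.
    tail = rule.endswith("$")
    body = "^" + re.escape(rule[:-1] if tail else rule).replace(r"\*", ".*")
    if tail:
        body += "$"
    return re.search(body, path) is not None

print(matches("/*?$", "/page.php?"))      # True:  URL ends in "?"
print(matches("/*?$", "/page.php?id=7"))  # False: still crawled!
print(matches("/*?",  "/page.php?id=7"))  # True:  blocks any "?" URL
```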

Lisa

WebmasterWorld Senior Member 10+ Year Member

Msg#: 106 posted 6:13 am on Jul 31, 2002 (gmt 0)

I am looking to block Google from any URL that contains a "?". When your PR is high, Google will crawl anything. Ick.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved