
Sitemaps, Meta Data, and robots.txt Forum

    
Will this robots.txt do it?
rover

10+ Year Member



 
Msg#: 350 posted 2:51 am on Mar 30, 2004 (gmt 0)

I've never used robots.txt before, and I just need to make sure that search engines won't index pages that are generated by our search.cgi script. For example, I don't want the search engines to follow URLs on our site like:

domain.com/dir/search.cgi?color=blue&size=small

Could I just use the following robots.txt in the root directory for the site:

User-agent: *
Disallow: /dir/search.cgi

Would that still let the spiders crawl all over except anything with search.cgi? This wouldn't keep it out of the /dir directory would it?

 

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 350 posted 4:44 am on Mar 30, 2004 (gmt 0)

> Could I just use the following robots.txt in the root directory for the site:

User-agent: *
Disallow: /dir/search.cgi

Yes.

> Would that still let the spiders crawl all over except anything with search.cgi?

That would still let the spiders crawl everything except URLs whose path starts with "/dir/search.cgi".

> This wouldn't keep it out of the /dir directory would it?

No.

The technical term for what robots do is "prefix-matching." The Disallow directive applies to any resource whose URL-path begins with the given string. So your Disallow applies only to resources whose paths start with /dir/search.cgi, possibly followed by more characters (a query string, for example), but never fewer. Everything else under /dir/ stays crawlable.
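To make the prefix-matching idea concrete, here is a minimal Python sketch. It is only an illustration, not how any particular spider is implemented: the helper name `is_disallowed` and the sample paths other than the search.cgi URL from this thread are made up for the example, and it ignores details real robots handle (per-robot User-agent records, Allow lines, URL encoding).

```python
# Hypothetical illustration of robots.txt prefix matching.
# The Disallow value is the one from this thread; everything else is assumed.
DISALLOW_RULES = ["/dir/search.cgi"]


def is_disallowed(url_path, rules=DISALLOW_RULES):
    """Return True if url_path begins with any non-empty Disallow prefix."""
    return any(bool(rule) and url_path.startswith(rule) for rule in rules)


sample_paths = [
    "/dir/search.cgi",                        # exact match: blocked
    "/dir/search.cgi?color=blue&size=small",  # same prefix plus more: blocked
    "/dir/index.html",                        # different prefix: allowed
    "/dir/",                                  # shorter than the rule: allowed
    "/dir/search",                            # fewer characters: allowed
]

for path in sample_paths:
    status = "blocked" if is_disallowed(path) else "allowed"
    print(f"{path} -> {status}")
```

The comparison is a plain "starts with" test on the path (including any query string), which is why the rule catches every search.cgi URL without touching the rest of /dir/.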

Jim

rover

10+ Year Member



 
Msg#: 350 posted 5:10 am on Mar 30, 2004 (gmt 0)

Thanks very much for the help. I'll go ahead and use that then.

