Will this robots.txt do it?

2:51 am on Mar 30, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Jan 5, 2004
posts:202
votes: 0


I've never used robots.txt before, and I just need to make sure that search engines won't index pages that are generated by our search.cgi script. For example, I don't want the search engines to follow URLs on our site like:

domain.com/dir/search.cgi?color=blue&size=small

Could I just use the following robots.txt in the root directory for the site:

User-agent: *
Disallow: /dir/search.cgi

Would that still let the spiders crawl all over except anything with search.cgi? This wouldn't keep it out of the /dir directory, would it?

4:44 am on Mar 30, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> Could I just use the following robots.txt in the root directory for the site:

User-agent: *
Disallow: /dir/search.cgi

Yes.

> Would that still let the spiders crawl all over except anything with search.cgi?

That would still let the spiders crawl all over except any URL starting with "/dir/search.cgi".

> This wouldn't keep it out of the /dir directory, would it?

No.
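If you want to check a robots.txt before uploading it, Python's standard-library urllib.robotparser module can parse the file and report which URLs a generic crawler may fetch. This is a minimal sketch, assuming Python 3; the user-agent name "AnyBot" and the /dir/page.html and /index.html paths are hypothetical, and individual search engines may interpret robots.txt slightly differently from this parser:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt proposed in this thread, split into lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dir/search.cgi
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)  # parse lines directly instead of fetching the file

# Sample URLs; only the first comes from the question, the rest are hypothetical.
urls = [
    "http://domain.com/dir/search.cgi?color=blue&size=small",
    "http://domain.com/dir/search.cgi",
    "http://domain.com/dir/",
    "http://domain.com/dir/page.html",
    "http://domain.com/index.html",
]

for url in urls:
    # "AnyBot" is a made-up crawler name; "User-agent: *" applies to it.
    allowed = parser.can_fetch("AnyBot", url)
    print(f"{'allowed' if allowed else 'blocked'}: {url}")
```

Running it should report the two search.cgi URLs as blocked and everything else, including /dir/ itself, as allowed.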

The technical term for what robots do is "prefix-matching." The Disallow directive applies to any URL whose beginning (prefix) matches the given string. So your Disallow applies only to URLs which start with /dir/search.cgi, possibly followed by more characters (such as ?color=blue&size=small), but never to anything shorter.
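To make the prefix rule concrete, here is a simplified hand-rolled matcher. It is an illustration only, not how any particular search engine implements it: it handles plain Disallow values, ignores Allow lines, wildcards and percent-encoding, and treats the query string as part of the string being matched (as Python's urllib.robotparser does). The function name is_disallowed and the sample paths other than the search.cgi URL are hypothetical:

```python
from typing import List


def is_disallowed(path: str, disallow_rules: List[str]) -> bool:
    """Return True if `path` starts with any non-empty Disallow value.

    Simplified prefix-matching only: no Allow lines, no wildcards,
    no percent-encoding normalisation.
    """
    # An empty "Disallow:" value disallows nothing, so it is skipped.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


rules = ["/dir/search.cgi"]

for path in [
    "/dir/search.cgi?color=blue&size=small",  # longer than the rule: blocked
    "/dir/search.cgi.bak",                     # also longer: blocked too
    "/dir/search",                             # shorter than the rule: allowed
    "/dir/",                                   # allowed
]:
    print(path, "->", "blocked" if is_disallowed(path, rules) else "allowed")
```

The /dir/search.cgi.bak case shows the flip side of "more characters, but no less": any file whose name merely begins with the Disallow string is caught as well.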

Jim

5:10 am on Mar 30, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Jan 5, 2004
posts:202
votes: 0


Thanks very much for the help. I'll go ahead and use that then.