
Forum Moderators: goodroi


Will this robots.txt do it?

     
2:51 am on Mar 30, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Jan 5, 2004
posts:202
votes: 0


I've never used robots.txt before, and I just need to make sure that search engines won't index pages generated by our search.cgi script. For example, I don't want the search engines to follow URLs on our site like:

domain.com/dir/search.cgi?color=blue&size=small

Could I just use the following robots.txt in the root directory for the site:

User-agent: *
Disallow: /dir/search.cgi

Would that still let the spiders crawl all over except anything with search.cgi? This wouldn't keep it out of the /dir directory would it?

4:44 am on Mar 30, 2004 (gmt 0)

Senior Member

jdmorgan, WebmasterWorld Top Contributor of All Time, 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> Could I just use the following robots.txt in the root directory for the site:

User-agent: *
Disallow: /dir/search.cgi

Yes.

> Would that still let the spiders crawl all over except anything with search.cgi?

That would still let the spiders crawl everything except URLs whose path starts with "/dir/search.cgi".

> This wouldn't keep it out of the /dir directory would it?

No.

The technical term for what robots do is "prefix-matching." The Disallow directive applies to any resource whose URL path begins with the given string. So your Disallow applies only to resources whose paths start with /dir/search.cgi, optionally followed by more characters, but never fewer. Everything else in /dir stays crawlable.
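To make the prefix-matching idea concrete, here is a minimal sketch in Python. It is not how any particular spider is implemented; the helper name is_disallowed and the sample paths (other than the search.cgi URL from the question) are made up for illustration, and it only models the plain string-prefix comparison described above.

```python
def is_disallowed(path, disallow_rules):
    """Return True if path begins with any non-empty Disallow prefix.

    Hypothetical helper: models plain prefix-matching only,
    with no wildcards and no per-robot sections.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# The single rule from the robots.txt in this thread.
rules = ["/dir/search.cgi"]

# Sample paths; only the second one comes from the original question.
examples = [
    "/dir/search.cgi",
    "/dir/search.cgi?color=blue&size=small",
    "/dir/search.cgi-old",   # extra characters after the prefix still match
    "/dir/",                 # shorter than the prefix, so not matched
    "/dir/index.html",
    "/dir/search",           # fewer characters than the prefix, not matched
]

for path in examples:
    verdict = "blocked" if is_disallowed(path, rules) else "allowed"
    print(path, "->", verdict)
```

Note the third path: because the comparison is purely on the leading characters, anything that happens to begin with /dir/search.cgi is covered, whether or not it is a query against the script.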

Jim

5:10 am on Mar 30, 2004 (gmt 0)



Thanks very much for the help. I'll go ahead and use that then.