Will this robots.txt do it?

rover

2:51 am on Mar 30, 2004 (gmt 0)

I've never used robots.txt before, and I just need to make sure that search engines won't index pages generated by our search.cgi script. For example, I don't want search engines to follow URLs on our site like:

domain.com/dir/search.cgi?color=blue&size=small

Could I just use the following robots.txt in the root directory for the site:

User-agent: *
Disallow: /dir/search.cgi

Would that still let the spiders crawl all over except anything with search.cgi? This wouldn't keep it out of the /dir directory would it?

jdMorgan

4:44 am on Mar 30, 2004 (gmt 0)

> Could I just use the following robots.txt in the root directory for the site:


User-agent: *
Disallow: /dir/search.cgi

Yes.

> Would that still let the spiders crawl all over except anything with search.cgi?

That would still let the spiders crawl everything except URLs whose path starts with "/dir/search.cgi".

> This wouldn't keep it out of the /dir directory would it?

No.

The technical term for what robots do is "prefix matching." A Disallow directive applies to any resource whose URL path begins with the given string. So your Disallow applies only to resources whose paths start with /dir/search.cgi, possibly followed by more characters (such as your query string), but no fewer. Everything else under /dir remains crawlable.
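As a rough illustration of that rule, here is a minimal Python sketch. The function name is_disallowed and the sample paths are made up for this example, and real crawlers also handle things like per-robot sections and URL escaping that are left out here:

```python
def is_disallowed(path, disallow_prefixes):
    """Return True if `path` starts with any non-empty Disallow prefix.

    Hypothetical helper for illustration only: it models plain prefix
    matching and nothing else a real robot might do.
    """
    return any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)


# The single rule from the robots.txt in this thread.
rules = ["/dir/search.cgi"]

# Sample paths (illustrative); only the first two begin with the rule string.
samples = [
    "/dir/search.cgi",
    "/dir/search.cgi?color=blue&size=small",
    "/dir/",
    "/dir/index.html",
    "/dir/search",
]

for path in samples:
    status = "blocked" if is_disallowed(path, rules) else "allowed"
    print(f"{path}: {status}")
```

Note that "/dir/search" is shorter than the rule string, so it does not match: the path must contain the whole prefix.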

Jim

rover

5:10 am on Mar 30, 2004 (gmt 0)

Thanks very much for the help. I'll go ahead and use that then.