
Forum Moderators: goodroi


Robots.txt exclusion for dynamic page

8:42 pm on Jun 17, 2003 (gmt 0)


WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 2, 2000
votes: 0

For some reason, some of my shopping cart pages are getting indexed despite my attempts to stop this. The pages take the form of:

My robots.txt file contains the following:

User-agent: *
Disallow: /cgi-bin/
Disallow: /store/cart.asp

The file validates using Brett's checker. Do I need to change the syntax of the URL to make this work?
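As a quick sanity check of the rules above, the standard-library `urllib.robotparser` module applies the same prefix-matching logic most crawlers use, so you can verify which paths your robots.txt actually blocks (the example URLs here are illustrative, not from the thread):

```python
from urllib import robotparser

# The rules exactly as posted above.
rules = """User-agent: *
Disallow: /cgi-bin/
Disallow: /store/cart.asp
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Disallow lines are prefix matches, so the bare path and any
# query-string variant of it are both blocked.
print(rp.can_fetch("Googlebot", "/store/cart.asp"))         # blocked
print(rp.can_fetch("Googlebot", "/store/cart.asp?id=123"))  # blocked
print(rp.can_fetch("Googlebot", "/store/index.asp"))        # allowed
```

If the dynamic cart URLs all begin with `/store/cart.asp`, the syntax as written should cover them; a compliant spider will not fetch any of those pages.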

9:22 pm on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
votes: 0


Your robots.txt should prevent spiders from fetching your shopping cart pages as-is. However, Google will list any page it finds a link to, even without crawling that page. So, the answer depends entirely on what you mean by "cart pages getting indexed". Are these pages listed with a title and description, or is it just the URL that is showing in the SERPs?

In order to prevent the "list just the URL" scenario, the solution is counter-intuitive: you must allow Google to fetch the page, and then use a <meta name="robots" content="noindex"> tag on each page. On a large site with dynamic URLs, this might be done most easily with a "light cloak" that redirects search engine spiders to a "noindex" version of the page. Since there is no attempt to mislead a searcher, there should be little risk of a penalty.
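A minimal sketch of that "serve noindex to spiders" idea, in Python for illustration (the bot substrings and page markup are my own assumptions, not from this thread; a real site would do this in its server-side templates):

```python
# Hypothetical helper: return a noindex stub to known crawlers,
# and the normal cart page to everyone else.
BOT_SUBSTRINGS = ("googlebot", "teoma", "jeeves")  # illustrative list

NOINDEX_PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex">'
    "<title>Shopping Cart</title>"
    "</head><body>Cart page - not for indexing.</body></html>"
)

def render_cart_page(user_agent: str, normal_page: str) -> str:
    """Pick the response body based on the requesting User-Agent."""
    ua = user_agent.lower()
    if any(bot in ua for bot in BOT_SUBSTRINGS):
        return NOINDEX_PAGE
    return normal_page
```

Because the spider receives the same "this is a cart page" content, just flagged noindex, there is no deception involved; the cloak only exists to attach the meta tag without touching every dynamic URL by hand.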

The "list just the URL" problem exists with Ask Jeeves/Teoma, as well as Google.


