Forum Moderators: goodroi


Matching patterns in robots.txt

         

kiransarv

9:08 am on Nov 4, 2008 (gmt 0)

10+ Year Member



Hi all,

I have two dynamic URL patterns:

1. http://mydomain.com/index?id=(.*)&query=(.*)
2. http://mydomain.com/index?id=(.*)&query=(.*)&start=10&pager.offset=(.*)

I want to allow robots to crawl the first pattern, but I don't want them to crawl the pages with "&start". How can I do this?

If I use "Disallow: /index?id", it will block both URL patterns. So how can I be specific?

In my robots.txt:

I have added,

User-agent: *
Disallow: /index

User-agent: Googlebot
Disallow: /index*start*

Is this correct?

Please help me.

regards
kiran

goodroi

12:16 pm on Nov 7, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld kiransarv!

I would not include index in the Googlebot robots.txt line. I would just use Disallow: /*start*. That will exclude all URLs with "start" in them.
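Googlebot treats * in a Disallow rule as "match any run of characters", and rules otherwise match as prefixes of the URL path-plus-query. A minimal sketch of that matching logic in Python (my own illustration, not Google's actual implementation; the function name is made up):

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Googlebot-style rule matching (simplified sketch):
    '*' matches any run of characters, a trailing '$' anchors
    the match to the end of the URL, otherwise it is a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so an un-anchored rule is a prefix match.
    return re.match(regex, path) is not None

# '/*start*' blocks any URL containing "start":
print(robots_pattern_matches("/*start*", "/index?id=1&query=foo&start=10"))  # True
print(robots_pattern_matches("/*start*", "/index?id=1&query=foo"))           # False
```

One caveat with /*start*: it matches "start" anywhere in the URL, so a page like /startup-guide would be blocked as well.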

[google.com...]

g1smd

2:39 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you want to disallow all URLs that include "start", or just those that also include "index" or "id"?

In any case, the trailing * is not required.

I might use:

Disallow: /index*start
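Putting that together, the Googlebot section might look like this (a sketch; whether to keep the blanket Disallow: /index for other user-agents depends on whether those bots should crawl the first pattern at all):

```
User-agent: Googlebot
Disallow: /index*start
```

Note that the original robots.txt specification has no wildcard support; * and $ are extensions honored by Googlebot and some other major crawlers, so a bot that only implements the original spec will read /index*start as a literal path prefix and the rule will match nothing.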