Forum Moderators: goodroi


Matching patterns in robots.txt

         

kiransarv

9:08 am on Nov 4, 2008 (gmt 0)

10+ Year Member



Hi all,

I have two dynamic URL patterns:

1. http://mydomain.com/index?id=(.*)&query=(.*)
2. http://mydomain.com/index?id=(.*)&query=(.*)&start=10&pager.offset=(.*)

I want to allow robots to crawl the first pattern, but I don't want them to crawl the pages with "&start". How can I do this?

If I use "Disallow: /index?id", it will block both URL patterns. So how can I be specific?

In my robots.txt:

I have added,

User-agent: *
Disallow: /index

User-agent: Googlebot
Disallow: /index*start*

Is this correct?

Please help me.

regards
kiran

goodroi

12:16 pm on Nov 7, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld kiransarv!

I would not include index in the Googlebot robots.txt line. I would just use Disallow: /*start*. That will exclude all URLs with "start" in them.
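Googlebot treats * in a Disallow rule as "match any run of characters", and rules otherwise match as prefixes of the URL path-plus-query. A minimal sketch of that matching logic in Python (my own illustration, not Google's actual implementation; the function name is made up):

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Googlebot-style rule matching (simplified sketch):
    '*' matches any run of characters, a trailing '$' anchors
    the match to the end of the URL, otherwise it is a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so an un-anchored rule is a prefix match.
    return re.match(regex, path) is not None

# '/*start*' blocks any URL containing "start":
print(robots_pattern_matches("/*start*", "/index?id=1&query=foo&start=10"))  # True
print(robots_pattern_matches("/*start*", "/index?id=1&query=foo"))           # False
```

One caveat with /*start*: it matches "start" anywhere in the URL, so a page like /startup-guide would be blocked as well.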

[google.com...]

g1smd

2:39 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you want to disallow all URLs that include "start", or just those that also include "index" or "id"?

In any case, the trailing * is not required.

I might use:

Disallow: /index*start
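Putting that together, the Googlebot section might look like this (a sketch; whether to keep the blanket Disallow: /index for other user-agents depends on whether those bots should crawl the first pattern at all):

```
User-agent: Googlebot
Disallow: /index*start
```

Note that the original robots.txt specification has no wildcard support; * and $ are extensions honored by Googlebot and some other major crawlers, so a bot that only implements the original spec will read /index*start as a literal path prefix and the rule will match nothing.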