Matching patterns in robots.txt

kiransarv

9:08 am on Nov 4, 2008 (gmt 0)

Hi all,

I have two dynamic URL pages:

1. http://mydomain.com/index?id=(.*)&query=(.*)
2. http://mydomain.com/index?id=(.*)&query=(.*)&start=10&pager.offset=(.*)

I want to allow robots to crawl the first page, but I don't want them to crawl the pages with "&start". How can I do this?

If I use "Disallow: /index?id", it will block both URL patterns. So how can I be more specific?

In my robots.txt I have added:

User-agent: *
Disallow: /index

User-agent: Googlebot
Disallow: /index*start*

Is this correct?

Please help me.

regards
kiran

goodroi

12:16 pm on Nov 7, 2008 (gmt 0)

Welcome to WebmasterWorld kiransarv!

I would not include index in the Googlebot line of your robots.txt. I would just have Disallow: /*start*. That will exclude all URLs containing start.

[google.com...]

g1smd

2:39 pm on Nov 9, 2008 (gmt 0)

Do you want to disallow all URLs that include start, or only those that also include index or id?

In any case, the trailing * is not required.

I might use:

Disallow: /index*start
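To see how these patterns behave, here is a minimal sketch of Googlebot-style wildcard matching, where * matches any run of characters and a trailing $ anchors the pattern to the end of the URL. The helper name robots_rule_matches is hypothetical, not part of any library; this is an illustration of the matching rules, not Googlebot's actual implementation.

```python
import re

def robots_rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches a URL path
    (including its query string), using the wildcard extensions major
    crawlers support: '*' matches any run of characters, and a trailing
    '$' anchors the pattern to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*'; the rule is
    # matched from the start of the path, like a prefix match.
    regex = "^" + ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.search(regex, path) is not None

# The two URLs from the question, as path + query:
allowed = "/index?id=123&query=abc"
blocked = "/index?id=123&query=abc&start=10&pager.offset=20"

print(robots_rule_matches("/index*start", allowed))   # False
print(robots_rule_matches("/index*start", blocked))   # True
print(robots_rule_matches("/*start", blocked))        # True
```

Note that the broader /*start pattern matches start anywhere in the URL, so a page such as /getting-started would also be blocked; /index*start is the narrower choice if that is a concern.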