Forum Moderators: goodroi

Matching patterns in robots.txt

     
9:08 am on Nov 4, 2008 (gmt 0)

New User

5+ Year Member

joined:Sept 19, 2008
posts: 5
votes: 0


Hi all,

I have two dynamic URL patterns:

1. http://mydomain.com/index?id=(.*)&query=(.*)
2. http://mydomain.com/index?id=(.*)&query=(.*)&start=10&pager.offset=(.*)

I want to allow robots to crawl the first pattern, but I don't want robots to crawl the pages with "&start". How can I do this?

If I use "Disallow: /index?id", it will block both URL patterns. So how can I be more specific?

In my robots.txt I have added:

User-agent: *
Disallow: /index

User-agent: Googlebot
Disallow: /index*start*

Is this correct?

Please help me.

regards
kiran

12:16 pm on Nov 7, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3120
votes: 111


Welcome to WebmasterWorld kiransarv!

I would not include index in the Googlebot line of robots.txt. I would just have Disallow: /*start*. That will exclude all URLs with "start" in them.

[google.com...]

2:39 pm on Nov 9, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Do you want to disallow all URLs that include start, or just those with index or with id in them?

In any case, the trailing * is not required.

I might use:

Disallow: /index*start
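
To compare the patterns suggested in this thread, here is a minimal Python sketch of how wildcard matching could be emulated. It assumes Google-style semantics (a pattern matches from the start of the path, `*` matches any run of characters, a trailing `$` anchors the end). The function names and sample paths are made up for illustration, and this is not how any particular crawler is actually implemented.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumption: Google-style matching, where '*' matches any run of
    characters and a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    # re.match anchors at the start only, which gives prefix semantics.
    return robots_pattern_to_regex(pattern).match(path) is not None


# Hypothetical sample paths modelled on the two URL patterns in the question.
first_page = "/index?id=1&query=foo"
paged = "/index?id=1&query=foo&start=10&pager.offset=10"
unrelated = "/getting-started.html"

for pattern in ("/index?id", "/index*start*", "/index*start", "/*start*"):
    for path in (first_page, paged, unrelated):
        print(f"{pattern:15} {path:50} blocked={is_blocked(pattern, path)}")
```

Under these assumptions, `/index*start` and `/index*start*` block the same paths, so the trailing `*` adds nothing. `/*start*` is broader and would also catch an unrelated path such as `/getting-started.html`, while `/index?id` blocks both of the original URL patterns.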
 
