
Sitemaps, Meta Data, and robots.txt Forum

    
Matching patterns in robots.txt
kiransarv

5+ Year Member



 
Msg#: 3779778 posted 9:08 am on Nov 4, 2008 (gmt 0)

Hi all,

I have two dynamic URL patterns:

1. http://mydomain.com/index?id=(.*)&query=(.*)
2. http://mydomain.com/index?id=(.*)&query=(.*)&start=10&pager.offset=(.*)
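Both patterns share the prefix /index?id. robots.txt rules are matched against the path plus query string, and a rule without a trailing $ matches any URL that begins with it, so a rule like Disallow: /index?id cannot tell the two apart. The sketch below assumes Google's documented wildcard semantics (other crawlers may differ). The id and query values are hypothetical stand-ins for the (.*) placeholders.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Sketch of Google-style semantics (an assumption, not every crawler's):
    '*' matches any run of characters, a trailing '$' anchors the end,
    and otherwise the rule matches any path that starts with it.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text, then rejoin the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors only at the start of the string: prefix matching.
    return re.match(regex, path) is not None

# Hypothetical values standing in for the (.*) placeholders.
first = "/index?id=42&query=widgets"
second = "/index?id=42&query=widgets&start=10&pager.offset=10"

print(rule_matches("/index?id", first))   # True
print(rule_matches("/index?id", second))  # True
```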

I want to allow robots to crawl the first URL, but I don't want them to crawl the one containing "&start". How can I do this?

If I use "Disallow: /index?id", it blocks both URL patterns. So how can I be more specific?

In my robots.txt I have added:

User-agent: *
Disallow: /index

User-agent: Googlebot
Disallow: /index*start*

Is this correct?

Please help.

regards
kiran
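A crawler follows only one group: the one whose User-agent line most specifically matches it. Under that assumption (Google documents this behaviour), Googlebot ignores the User-agent: * group entirely and sees only Disallow: /index*start*, while other robots see Disallow: /index and skip every /index URL. The sketch below simplifies group selection to an exact, case-insensitive name lookup with a fallback to '*'; "SomeOtherBot" and the query values are hypothetical.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    # Sketch of Google-style matching (an assumption): '*' is a wildcard,
    # a trailing '$' anchors the end, otherwise it is a prefix match.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

# The two groups from the robots.txt above, keyed by lowercase user-agent.
GROUPS = {
    "*": ["/index"],
    "googlebot": ["/index*start*"],
}

def is_blocked(agent: str, path: str) -> bool:
    # Simplification: use the group named after the crawler if present,
    # otherwise the '*' group; the two groups are never combined.
    rules = GROUPS.get(agent.lower(), GROUPS["*"])
    return any(rule_matches(rule, path) for rule in rules)

# Hypothetical values standing in for the (.*) placeholders.
first = "/index?id=42&query=widgets"
second = "/index?id=42&query=widgets&start=10&pager.offset=10"

print(is_blocked("Googlebot", first))     # False
print(is_blocked("Googlebot", second))    # True
print(is_blocked("SomeOtherBot", first))  # True (falls back to '*')
```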

 

goodroi

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3779778 posted 12:16 pm on Nov 7, 2008 (gmt 0)

Welcome to WebmasterWorld kiransarv!

I would not include index in the Googlebot line. I would just use Disallow: /*start*. That will exclude all URLs with "start" anywhere in them.

[google.com...]
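Because /*start* matches "start" anywhere in the path or query string, it blocks the paged URLs but also any unrelated URL that happens to contain those letters. A minimal check, assuming Google-style wildcard semantics; "/products/restart-kit" and the query values are hypothetical examples.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    # Sketch of Google-style matching (an assumption): '*' is a wildcard,
    # a trailing '$' anchors the end, otherwise it is a prefix match.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

# Hypothetical values standing in for the (.*) placeholders.
first = "/index?id=42&query=widgets"
second = "/index?id=42&query=widgets&start=10&pager.offset=10"

print(rule_matches("/*start*", first))                    # False
print(rule_matches("/*start*", second))                   # True
print(rule_matches("/*start*", "/products/restart-kit"))  # True ("restart")
```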

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3779778 posted 2:39 pm on Nov 9, 2008 (gmt 0)

Do you want to disallow all URLs that include start, or just those with index or with id in them?

In any case, the trailing * is not required, because a rule already matches any URL that begins with the pattern.

I might use:

Disallow: /index*start
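To check that the trailing * changes nothing, the sketch below compares /index*start* and /index*start on a few paths, assuming Google-style prefix and wildcard semantics. It also shows that, unlike /*start*, the /index*start rule leaves URLs outside /index alone. All sample paths are hypothetical.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    # Sketch of Google-style matching (an assumption): '*' is a wildcard,
    # a trailing '$' anchors the end, otherwise it is a prefix match.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

# Hypothetical sample paths.
first = "/index?id=42&query=widgets"
second = "/index?id=42&query=widgets&start=10&pager.offset=10"
samples = [first, second, "/products/restart-kit"]

for path in samples:
    with_star = rule_matches("/index*start*", path)
    without_star = rule_matches("/index*start", path)
    broad = rule_matches("/*start*", path)
    print(f"{path}: /index*start*={with_star} /index*start={without_star} /*start*={broad}")
```

The first two columns always agree; only the broader /*start* rule catches the restart-kit path.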

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved