Forum Moderators: goodroi


Robots.txt (/index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a1)

Pattern on parameters

         

Surendra

4:21 pm on Jan 28, 2008 (gmt 0)

10+ Year Member



Hi,

I have a requirement wherein I need to block around 300 pages from being crawled by any external search engine such as Google or Yahoo:

User-agent: *
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a1
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a2
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a3
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a4
.........................
.........................
.........................
.........................
up to
.........................
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a300

So is there any pattern I can use to avoid having 300 entries in robots.txt? I have gone through a lot of sites but I'm not sure how to handle the second parameter.

Thanks
Surendra

phranque

12:20 am on Jan 29, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], Surendra!

the google web exclusion protocol [google.com] and the yahoo web exclusion protocol [help.yahoo.com] include extensions to the robots.txt [robotstxt.org] standard which allow wildcards.

you could do something like:
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a*

but that would also disallow ...=a301, ...=a302, etc.
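
For anyone curious why the a* rule behaves that way, here is a minimal Python sketch of the matching semantics, assuming the Google/Yahoo-style extensions where * matches any run of characters, a trailing $ anchors the rule to the end of the URL, and a Disallow value is otherwise treated as a URL prefix. The helper name wildcard_rule_to_regex is just illustrative, not part of any crawler's API.

import re

def wildcard_rule_to_regex(rule):
    # '*' matches any sequence of characters, a trailing '$' anchors the
    # rule to the end of the URL, everything else is matched literally
    # as a prefix of the path + query string.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(pattern + (r"\Z" if anchored else ""))

rule = wildcard_rule_to_regex("/index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a*")
for n in (1, 300, 301, 302):
    url = "/index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a%d" % n
    # re.match anchors at the start of the string, which mirrors the
    # prefix-matching behaviour of Disallow rules.
    print(url, "->", "disallowed" if rule.match(url) else "allowed")

Running this prints all four URLs as disallowed, which is exactly why the site=a* rule also catches a301, a302 and anything else that begins with "a".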