I have a requirement wherein I need to block some 300 pages from being crawled by external search engines such as Google or Yahoo. My robots.txt currently looks like this:
User-agent: *
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a1
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a2
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a3
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a4
... (and so on, up to)
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a300
Is there any pattern I can use to avoid having 300 entries in robots.txt? I have gone through a lot of sites, but I'm not sure how to handle the second parameter.
Thanks
Surendra
The Google robots exclusion documentation [google.com] and the Yahoo robots exclusion documentation [help.yahoo.com] describe extensions to the robots.txt [robotstxt.org] standard that allow wildcards.
You could do something like:
Disallow: /index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a*
But that would also disallow ...=a301, ...=a302, etc.
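If over-matching a301 and beyond is a concern, or if you also need to cover crawlers that don't support the wildcard extensions, the straightforward fallback is to generate the 300 lines rather than type them by hand. A minimal sketch in Python, assuming the URL pattern from your post (adjust the base string and the range to your actual parameters):

import sys

# Hypothetical base path, taken from the URLs shown above.
BASE = "/index.jsp?pg_name=ccpm/lob/sublob/pgname&site=a"

with open("robots.txt", "w") as f:
    f.write("User-agent: *\n")
    # Emit exactly a1 through a300; nothing beyond a300 is listed.
    for n in range(1, 301):
        f.write(f"Disallow: {BASE}{n}\n")

One caveat either way: standard Disallow rules are prefix matches, so an explicit entry ending in site=a1 also blocks site=a10, site=a11, and so on. For the crawlers that support the wildcard extensions, appending $ anchors a rule at the end of the URL if you need an exact match.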