How to block duplicate pages via robots.txt?

Hi,

I'm handling an ecommerce website & while checking in WMT under Index Status tab, my pages in "Not Index" are MORE than in "Index". While reading the solution I came to know that Google is not indexing because some URLs are redirecting, some pages are duplicates & so.

I found out the URLs which are redirecting & removed them but I want to block those duplicate pages via robots.txt but I don't understand how to provide the pattern because there is some session ids & so. Like for example -

http://www.example.com/widget-red?ordernumber=12
http://www.example.com/widget-9922?pagenumber=3

So how do I suggest Google bot to not to index these pages...should I add the below line to block the above pages

Disallow: /?ordernumber=
Disallow: /?pagenumber=

OR this -> Disallow: /*?

Also, when people are searching on my website for any products & when doing the same via site search the following URL comes.

http://www.example.com/search?categories=0&q=widget+red

When I checked the same URL on google to know whether it has been indexed or not via operator "site:" I found this URL to be on google

http://www.example.com/search?q=

So, how do I block the above pages...Is the below way correct?

Disallow: /search?q=

OR this Disallow: /*search?q=

Sorry for the long post but I'd appreciate if you can answer my query because lately I see many duplicate pages been indexed on google.

Thanks.

[edited by: goodroi at 2:58 pm (utc) on Oct 3, 2012]
[edit reason] Examplified [/edit]

How to block duplicate pages via robots.txt?

hyderali

g1smd

hyderali

lucy24

iapsingh

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week