Forum Moderators: goodroi

How to block duplicate pages via robots.txt?

7:53 am on Oct 1, 2012 (gmt 0)

New User

joined:Oct 1, 2012
posts: 3
votes: 0


I'm handling an ecommerce website, and while checking the Index Status tab in WMT, I see more pages under "Not indexed" than under "Indexed". While researching a solution I learned that Google is not indexing some pages because some URLs redirect, some pages are duplicates, and so on.

I found the URLs that redirect and removed them, but I also want to block the duplicate pages via robots.txt. I don't understand how to write the pattern, because the URLs contain session IDs and the like. For example -


So how do I tell Googlebot not to index these pages? Should I add the lines below to block the URLs above?

Disallow: /?ordernumber=
Disallow: /?pagenumber=

OR this -> Disallow: /*?

Also, when people search my website for any product via the site search, the following URL comes up -


When I checked on Google via the "site:" operator whether that URL has been indexed, I found it there -


So, how do I block the above pages? Is the way below correct?

Disallow: /search?q=

OR this Disallow: /*search?q=

Sorry for the long post, but I'd appreciate it if you could answer my query, because lately I see many duplicate pages being indexed on Google.


[edited by: goodroi at 2:58 pm (utc) on Oct 3, 2012]
[edit reason] Examplified [/edit]

8:15 am on Oct 1, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
votes: 0

Once blocked in robots.txt, Google will continue to show the URLs as URL-only entries in the SERPs.

URLs that redirect should not be blocked. Google needs to see the redirect.

Other duplicate pages could be handled with the rel="canonical" tag, and should not be blocked.

It looks like Google has not indexed individual search results pages from your site, merely the search page with no search parameters.

The Disallow pattern matches from the left, and a * can be used as a wildcard to stand in for the characters that vary when you want to match something specific further to the right.

Disallow: /*?

blocks "slash" "something, anything" "question mark" "anything or nothing"
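
To see why matching from the left matters, here is a rough Python sketch of that matching behaviour (the function name and logic are illustrative only, not Googlebot's actual matcher, which also handles Allow/Disallow precedence and the $ anchor):

```python
import re

def blocked_by(pattern: str, path: str) -> bool:
    """Roughly emulate a Google-style robots.txt Disallow rule:
    the pattern is anchored at the left of the URL path and '*'
    matches any run of characters. Illustration only."""
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(regex, path) is not None

# Patterns match from the left:
print(blocked_by("/*?", "/nokia-asha?ordernumber=12"))             # True
print(blocked_by("/*?", "/nokia-asha"))                            # False
# '/?ordernumber=' only matches a query string at the site root:
print(blocked_by("/?ordernumber=", "/nokia-asha?ordernumber=12"))  # False
print(blocked_by("/?ordernumber=", "/?ordernumber=12"))            # True
```

Note how "Disallow: /?ordernumber=" would not block /nokia-asha?ordernumber=12, because without the * wildcard the pattern only matches that parameter directly after the site root.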
9:10 am on Oct 1, 2012 (gmt 0)

New User

joined:Oct 1, 2012
posts: 3
votes: 0

Thanks g1smd,

So you mean to say I should not block those redirecting pages, but should add a rel="canonical" tag instead.

So I should put the canonical tag on http://www.example.com/nokia-asha?ordernumber=12 to tell Google it is the same page. Like below:

<link rel="canonical" href="http://www.example.com/nokia-asha" />

Is the above tag correct?

[edited by: engine at 4:49 pm (utc) on Oct 3, 2012]
[edit reason] examplified [/edit]

4:04 pm on Oct 1, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
votes: 403

You can also go into GWT and tell them to ignore certain parameters.
6:53 am on Oct 3, 2012 (gmt 0)

New User

joined:Oct 3, 2012
posts: 3
votes: 0

As lucy24 said, going into Webmaster Tools and telling Google to ignore certain parameters can be one way.

But you could also use robots.txt to do this:
Disallow: /*?

And why not disallow the whole search section?
Disallow: /search
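
Putting those two rules together, a minimal robots.txt along these lines would cover both the parameterised duplicates and the whole search section (illustrative sketch only; confirm against your own URL structure before deploying, since /*? blocks every URL with a query string):

```
User-agent: *
Disallow: /*?
Disallow: /search
```

The second rule is mostly belt-and-braces: /*? already catches /search?q=..., but /search also blocks the bare search page itself.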
