
How to block duplicate pages via robots.txt?

   
7:53 am on Oct 1, 2012 (gmt 0)



Hi,

I'm handling an ecommerce website, and while checking the Index Status tab in WMT I noticed that the pages listed as "Not indexed" outnumber those listed as "Indexed". From what I've read, Google is not indexing them because some URLs are redirecting, some pages are duplicates, and so on.

I found the URLs that are redirecting and removed them, but I also want to block the duplicate pages via robots.txt. I don't understand how to write the pattern, because the URLs contain session IDs and the like. For example:


http://www.example.com/widget-red?ordernumber=12
http://www.example.com/widget-9922?pagenumber=3


So how do I tell Googlebot not to index these pages? Should I add the lines below to block them?


Disallow: /?ordernumber=
Disallow: /?pagenumber=

OR this -> Disallow: /*?

Also, when people search for products on my website, the site search produces URLs like the following:


http://www.example.com/search?categories=0&q=widget+red


When I checked with the "site:" operator whether that URL has been indexed, I found this URL in Google:


http://www.example.com/search?q=


So how do I block those pages? Is the way below correct?

Disallow: /search?q=

OR this Disallow: /*search?q=

Sorry for the long post, but I'd appreciate it if you could answer my query, because lately I've seen many duplicate pages being indexed in Google.

Thanks.

[edited by: goodroi at 2:58 pm (utc) on Oct 3, 2012]
[edit reason] Examplified [/edit]

8:15 am on Oct 1, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd



Once blocked in robots.txt, Google will continue to show the URLs as URL-only entries in the SERPs.

URLs that redirect should not be blocked. Google needs to see the redirect.

Other duplicate pages could be handled with the rel="canonical" tag, and should not be blocked.
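For example (just a sketch, reusing your example URL), the duplicate URL http://www.example.com/widget-red?ordernumber=12 would carry this in its <head>, pointing at the preferred version of the page:

<link rel="canonical" href="http://www.example.com/widget-red" />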

It looks like Google has not indexed individual search results pages from your site, merely the search page with no search parameters.

The Disallow pattern matches from the left, and a * can be used as a wildcard to stand in for the characters that change when you want to match something specific further to the right.

Disallow: /*?

blocks "slash" "something, anything" "question mark" "anything or nothing"
9:10 am on Oct 1, 2012 (gmt 0)



Thanks g1smd,

So you mean to say I should not block those redirecting pages, but should add the rel="canonical" tag instead.

So I should put that rel code on http://www.example.com/widget-red?ordernumber=12 to tell Google that it is the same page, like below:

<link rel="canonical" href="http://www.example.com/nokia-asha" />

Is the above tag correct?

[edited by: engine at 4:49 pm (utc) on Oct 3, 2012]
[edit reason] examplified [/edit]

4:04 pm on Oct 1, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24



You can also go into GWT and tell them to ignore certain parameters.
6:53 am on Oct 3, 2012 (gmt 0)



As lucy24 said, going into Webmaster Tools and telling Google to ignore certain parameters is one way to do it.

But you can also do this with robots.txt:
Disallow: /*?

And why not disallow the whole search section?
Disallow: /search
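
For instance, a minimal robots.txt along these lines (illustrative only, and bearing in mind the earlier advice that redirecting and canonicalised URLs are usually better left unblocked) would cover both:

User-agent: *
# keep crawlers out of the internal site search entirely
Disallow: /search
# block any URL that carries a query string
Disallow: /*?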
 
