How to block duplicate pages via robots.txt?
hyderali msg:4502361 7:53 am on Oct 1, 2012 (gmt 0)
Hi,

I'm handling an ecommerce website, and while checking the Index Status tab in WMT I noticed that more of my pages are in "Not Index" than in "Index". Reading up on it, I learned that Google is not indexing them because some URLs redirect, some pages are duplicates, and so on.

I found the URLs that were redirecting and removed them. Now I want to block the duplicate pages via robots.txt, but I don't understand how to write the pattern because the URLs carry session IDs and similar parameters. For example:

http://www.example.com/widget-red?ordernumber=12
http://www.example.com/widget-9922?pagenumber=3

How do I tell Googlebot not to index these pages? Should I add the lines below to block them?

Disallow: /?ordernumber=
Disallow: /?pagenumber=

Or this?

Disallow: /*?

Also, when people search for products using my site search, they get a URL like this:

http://www.example.com/search?categories=0&q=widget+red

When I checked with the "site:" operator whether it had been indexed, I found this URL on Google:

http://www.example.com/search?q=

So how do I block those pages? Is this correct?

Disallow: /search?q=

Or this?

Disallow: /*search?q=

Sorry for the long post, but I'd appreciate an answer because lately I see many duplicate pages being indexed on Google.

Thanks.

[ edited by: goodroi at 2:58 pm (utc) on Oct 3, 2012]
[edit reason] Examplified [/edit]
g1smd msg:4502367 8:15 am on Oct 1, 2012 (gmt 0)
Once URLs are blocked in robots.txt, Google will continue to show them as URL-only entries in the SERPs.

URLs that redirect should not be blocked. Google needs to see the redirect.

Other duplicate pages could be handled with the rel="canonical" tag, and should not be blocked.

It looks like Google has not indexed individual search results pages from your site, merely the search page with no search parameters.

The pattern for Disallow matches from the left, and a * can be used as a wildcard to stand in for characters that change when you want to match something specific further to the right.
For example, Disallow: /*? blocks "slash" "something, anything" "question mark" "anything or nothing".
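To make the left-anchored matching concrete, here is a rough Python sketch that tests the patterns from the first post against the example URLs. It assumes Google-style rules (a * matches any run of characters, a trailing $ anchors the end, everything else is a prefix match); the helper names rule_to_regex and is_blocked are made up for illustration and are not part of any library.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (assumed Google-style rules)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' where a '*' stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, url: str) -> bool:
    """Return True if the Disallow pattern matches the URL's path plus query string."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, which mirrors "matches from the left".
    return rule_to_regex(pattern).match(target) is not None


if __name__ == "__main__":
    url = "http://www.example.com/widget-red?ordernumber=12"
    for rule in ("/?ordernumber=", "/*?"):
        print(rule, is_blocked(rule, url))
```

Under these assumptions, /?ordernumber= does not match /widget-red?ordernumber=12, because the character right after the first slash would have to be the question mark, while /*? does match. Likewise /search?q= matches /search?q= but not /search?categories=0&q=widget+red.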
hyderali msg:4502392 9:10 am on Oct 1, 2012 (gmt 0)
Thanks g1smd,

So you mean I should not block those redirecting pages but should provide a rel="canonical" tag instead? That is, I should put the tag on http://www.example.com/nokia-asha?ordernumber=12 to tell Google that it is the same page, like below:

<link rel="canonical" href="http://www.example.com/nokia-asha" />

Is the above tag correct?

[ edited by: engine at 4:49 pm (utc) on Oct 3, 2012]
[edit reason] examplified [/edit]
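For anyone generating the tag in templates, the canonical URL can be derived by dropping the parameters that do not change the page content. The following is only a sketch: it assumes ordernumber is such a parameter, and the names IGNORED_PARAMS, canonical_url and canonical_link are hypothetical, not taken from any shop software.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change what the page shows.
IGNORED_PARAMS = {"ordernumber"}


def canonical_url(url: str, ignored=IGNORED_PARAMS) -> str:
    """Return the URL with the ignored query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link(url: str) -> str:
    """Build the <link> element to place in the <head> of the duplicate page."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_link("http://www.example.com/nokia-asha?ordernumber=12"))
```

Parameters that do change the content (a page number might be one) should stay out of the ignored set, otherwise different pages would all point to the same canonical URL.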
lucy24 msg:4502538 4:04 pm on Oct 1, 2012 (gmt 0)
You can also go into GWT and tell them to ignore certain parameters.
iapsingh msg:4503304 6:53 am on Oct 3, 2012 (gmt 0)
As lucy24 said, going to Webmaster Tools and telling Google to ignore certain parameters can be one way. But you should use robots.txt for this. Use:

Disallow: /*?

And why not disallow the whole search section?

Disallow: /search
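A quick way to sanity-check the plain prefix rule before deploying it is Python's standard urllib.robotparser. This is a sketch only: the robots.txt content below is an assumed minimal file, the /searchlight-widget URL is a made-up example, and as far as I know this parser does simple prefix matching and does not understand the * wildcard, so it is only suitable for checking the Disallow: /search line.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal robots.txt containing only the prefix rule discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may fetch the URL under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/search?q=",
        "http://www.example.com/search?categories=0&q=widget+red",
        "http://www.example.com/searchlight-widget",  # hypothetical product URL
        "http://www.example.com/widget-red",
    ):
        print(url, allowed(url))
```

Because the rule is a prefix match, it also catches any other path that merely begins with /search, such as the made-up /searchlight-widget above, so it is worth checking that no product URLs start that way.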