Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
How to block duplicate pages via robots.txt?
hyderali



 
Msg#: 4502359 posted 7:53 am on Oct 1, 2012 (gmt 0)

Hi,

I'm handling an ecommerce website, and while checking the Index Status tab in WMT I noticed that my pages in "Not Index" are MORE than in "Index". From reading up on it, I learned that Google is not indexing them because some URLs redirect, some pages are duplicates, and so on.

I found the URLs that were redirecting and removed them. I also want to block the duplicate pages via robots.txt, but I don't understand how to write the pattern because the URLs contain session IDs and similar parameters. For example:


http://www.example.com/widget-red?ordernumber=12
http://www.example.com/widget-9922?pagenumber=3


So how do I tell Googlebot not to index these pages? Should I add the lines below to block them?


Disallow: /?ordernumber=
Disallow: /?pagenumber=

OR this -> Disallow: /*?

Also, when people search my website for products using the site search, URLs like the following are generated:


http://www.example.com/search?categories=0&q=widget+red


When I checked on Google with the "site:" operator to see whether that URL had been indexed, I found this URL in the index:


http://www.example.com/search?q=


So, how do I block the above pages? Is the line below correct?

Disallow: /search?q=

OR this Disallow: /*search?q=

Sorry for the long post, but I'd appreciate an answer because lately I see many duplicate pages being indexed on Google.

Thanks.

[edited by: goodroi at 2:58 pm (utc) on Oct 3, 2012]
[edit reason] Examplified [/edit]

 

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4502359 posted 8:15 am on Oct 1, 2012 (gmt 0)

Once blocked in robots.txt, Google will continue to show the URLs as URL-only entries in the SERPs.

URLs that redirect should not be blocked. Google needs to see the redirect.

Other duplicate pages could be handled with the rel="canonical" tag, and should not be blocked.

It looks like Google has not indexed individual search results pages from your site, merely the search page with no search parameters.

The Disallow pattern is matched from the left of the URL path, and a * can be used as a wildcard standing in for characters that vary when you want to match something specific further to the right.

Disallow: /*?
blocks "slash" "something, anything" "question mark" "anything or nothing"
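
To make the left-anchored matching concrete, here is a minimal Python sketch of wildcard matching as described above. It is only an illustration, not Google's actual implementation: the helper names `rule_to_regex` and `is_blocked` are made up for this example, and it assumes `*` means "any run of characters" and a trailing `$` means "end of URL".

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a Disallow pattern into a regex matched from the start of the path."""
    anchored = rule.endswith("$")  # assumed: trailing '$' pins the end of the URL
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule: str, url: str) -> bool:
    """Return True if the rule matches the path-plus-query part of the URL."""
    without_scheme = url.split("://", 1)[-1]
    target = "/" + without_scheme.partition("/")[2]  # drop the host name
    # re.match anchors at the left, like the Disallow matching described above.
    return rule_to_regex(rule).match(target) is not None

# The URLs and rules from the original question:
print(is_blocked("/?ordernumber=", "http://www.example.com/widget-red?ordernumber=12"))
print(is_blocked("/*?", "http://www.example.com/widget-red?ordernumber=12"))
print(is_blocked("/*?", "http://www.example.com/widget-red"))
```

Under these assumptions, `/?ordernumber=` does not match the first URL because the path continues with `widget-red` before the question mark, while `/*?` matches any URL that contains a query string and leaves the clean URL alone.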

hyderali



 
Msg#: 4502359 posted 9:10 am on Oct 1, 2012 (gmt 0)

Thanks g1smd,

So you mean I should not block those redirecting pages but should provide a rel="canonical" tag instead.

So I should put that tag on http://www.example.com/nokia-asha?ordernumber=12 to tell Google that it is the same page, like below:

<link rel="canonical" href="http://www.example.com/nokia-asha" />

Is the above tag correct?

[edited by: engine at 4:49 pm (utc) on Oct 3, 2012]
[edit reason] examplified [/edit]
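
For anyone generating these tags from a template, a rough Python sketch follows. It assumes that every query parameter on the page is non-essential, which fits `ordernumber` in the example but probably not `pagenumber`, where page 3 shows different content from page 1. The function name `canonical_link` is hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_link(url: str) -> str:
    """Build a rel="canonical" tag pointing at the URL without query or fragment."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    # Assumption: the query string never changes the page content.
    clean = urlunsplit((scheme, netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}" />'

print(canonical_link("http://www.example.com/nokia-asha?ordernumber=12"))
# prints: <link rel="canonical" href="http://www.example.com/nokia-asha" />
```

The output has the same form as the tag quoted in the post above; the tag goes in the head of the parameterised page and points at the clean URL.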

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4502359 posted 4:04 pm on Oct 1, 2012 (gmt 0)

You can also go into GWT and tell them to ignore certain parameters.

iapsingh



 
Msg#: 4502359 posted 6:53 am on Oct 3, 2012 (gmt 0)

As lucy24 said, going to Webmaster Tools and telling it to ignore certain parameters can be one way.

But you should use robots.txt to do this:
use Disallow: /*?

And why not disallow the whole search section?
Disallow: /search
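
A quick way to sanity-check the plain prefix rule before uploading it is Python's standard `urllib.robotparser`. This is only a sketch: as far as I know the stdlib parser does simple prefix matching and does not understand the `*` wildcard, so it is used here for `Disallow: /search` only and says nothing about how `/*?` behaves. The `allowed` helper is a made-up name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only the prefix rule suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url: str) -> bool:
    """Return True if the rules above let a crawler fetch the URL."""
    return parser.can_fetch("Googlebot", url)

for url in (
    "http://www.example.com/search?categories=0&q=widget+red",
    "http://www.example.com/search?q=",
    "http://www.example.com/widget-red",
):
    print(url, "->", "allowed" if allowed(url) else "blocked")
```

Because the rule is a prefix, it covers both the bare search page and every search URL with parameters, without needing a wildcard. Note g1smd's point earlier in the thread still applies: blocked URLs can continue to appear as URL-only entries.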

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved