Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
How to block duplicate pages via robots.txt?
hyderali (msg:4502361)
7:53 am on Oct 1, 2012 (gmt 0)

Hi,

I'm handling an ecommerce website, and while checking the Index Status tab in WMT, I found that more of my pages are listed as "Not Indexed" than as "Indexed". While researching the cause, I learned that Google is not indexing some pages because some URLs redirect, some pages are duplicates, and so on.

I found the URLs that were redirecting and removed them. Now I want to block the duplicate pages via robots.txt, but I don't understand how to write the pattern, because the URLs contain session IDs and similar parameters. For example:


http://www.example.com/widget-red?ordernumber=12
http://www.example.com/widget-9922?pagenumber=3


So how do I tell Googlebot not to index these pages? Should I add the lines below to block the URLs above?


Disallow: /?ordernumber=
Disallow: /?pagenumber=

OR this -> Disallow: /*?

Also, when people search my website for a product via the site search, URLs like the following are generated:


http://www.example.com/search?categories=0&q=widget+red


When I checked with the "site:" operator whether that URL had been indexed, I found this URL in Google:


http://www.example.com/search?q=


So, how do I block the pages above? Is either of the rules below correct?

Disallow: /search?q=

OR this Disallow: /*search?q=

Sorry for the long post, but I'd appreciate it if you could answer my query, because lately I've seen many duplicate pages being indexed in Google.

Thanks.

[edited by: goodroi at 2:58 pm (utc) on Oct 3, 2012]
[edit reason] Examplified [/edit]

 

g1smd (msg:4502367)
8:15 am on Oct 1, 2012 (gmt 0)

Once blocked in robots.txt, Google will continue to show the URLs as URL-only entries in the SERPs.

URLs that redirect should not be blocked. Google needs to see the redirect.

Other duplicate pages could be handled with the rel="canonical" tag, and should not be blocked.

It looks like Google has not indexed individual search results pages from your site, merely the search page with no search parameters.

The Disallow pattern matches from the left, and a * can be used as a wildcard for any run of characters, so you can match something specific further to the right.

Disallow: /*?
blocks "slash" then "anything or nothing" then "question mark" then "anything or nothing" - in other words, any URL whose path contains a question mark.
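That left-anchored, *-wildcard matching can be sketched in a few lines of Python. This is an illustrative re-implementation of the matching rules described above, not code from any robots.txt library, and the URLs are the examples from this thread:

```python
import re

def robots_rule_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt matching: anchored at the left,
    '*' matches any run of characters (including none),
    and a trailing '$' anchors the end of the URL."""
    anchor_end = pattern.endswith("$")
    if anchor_end:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*'
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchor_end:
        regex += "$"
    return re.match(regex, path) is not None

# Left-anchored: this rule only matches the parameter on the root URL,
# so it does NOT block /widget-red?ordernumber=12
print(robots_rule_matches("/?ordernumber=", "/widget-red?ordernumber=12"))  # False

# The wildcard rule blocks any path containing a '?'
print(robots_rule_matches("/*?", "/widget-red?ordernumber=12"))  # True
print(robots_rule_matches("/*?", "/search?q="))                  # True
```

This is why `Disallow: /?ordernumber=` alone would not block the example URLs: it only matches the parameter when it appears directly after the domain root.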

hyderali (msg:4502392)
9:10 am on Oct 1, 2012 (gmt 0)

Thanks g1smd,

So you mean to say I should not block those redirecting pages, but should instead add a rel="canonical" tag.

So I should put the canonical tag on http://www.example.com/nokia-asha?ordernumber=12 to tell Google that it is the same page as the original, like below:

<link rel="canonical" href="http://www.example.com/nokia-asha" />

Is the above tag correct?

[edited by: engine at 4:49 pm (utc) on Oct 3, 2012]
[edit reason] examplified [/edit]

lucy24 (msg:4502538)
4:04 pm on Oct 1, 2012 (gmt 0)

You can also go into GWT (Google Webmaster Tools) and tell them to ignore certain parameters.

iapsingh (msg:4503304)
6:53 am on Oct 3, 2012 (gmt 0)

As lucy24 said, going into Webmaster Tools and telling Google to ignore certain parameters can be one way.

But you could also use robots.txt to do this:
Disallow: /*?

And why not disallow the whole search section?
Disallow: /search
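Putting the thread's suggestions together, a minimal robots.txt along these lines could work. The parameter names are just the examples from this thread; adjust them to your own URLs, and remember g1smd's caveats that redirecting URLs and pages handled by rel="canonical" should not be blocked:

```
User-agent: *
Disallow: /search
Disallow: /*?ordernumber=
Disallow: /*?pagenumber=
```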

© Webmaster World 1996-2014 all rights reserved