How to stop Google from indexing WordPress blog paginated archive

Forum Moderators: goodroi

Abhinav

2:18 pm on Sep 14, 2012 (gmt 0)

Hello,

I want to remove my WordPress blog's paginated archive pages (like http://example.com/page/401) from Google's search index, but not the content that appears on these pages, i.e. the posts.

I am seeing a lot of visitors arriving from Google image search on these paginated archive pages, and by then the specific post may well have moved to a different archive page because new content has been added to the blog.

I have tried to block them by putting Disallow: */page* in my robots.txt file, but it doesn't seem to be working.

Please help.

Thanks
Abhinav

pankajvirgo28

5:59 pm on Sep 14, 2012 (gmt 0)

Try this: Disallow: */page/*/*

Abhinav

9:47 am on Sep 15, 2012 (gmt 0)

Thanks Pankaj, I will try this.

Thanks
Abhinav

g1smd

4:09 pm on Sep 15, 2012 (gmt 0)

Disallow: /page/
will disallow anything beginning example.com/page/

The trailing * is redundant.
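
A minimal sketch of the prefix matching described above, using Python's standard urllib.robotparser. The "User-agent: *" line and the post URL are assumptions added for illustration. As far as I know this parser does plain prefix matching and does not implement the * wildcard extension, which makes it a convenient way to see that the bare prefix is enough.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: only the Disallow line comes from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /page/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() returns False when a Disallow prefix matches the URL path.
archive_ok = parser.can_fetch("*", "http://example.com/page/401")
# Hypothetical post URL: its path does not begin with /page/.
post_ok = parser.can_fetch("*", "http://example.com/2012/09/some-post/")

print(archive_ok, post_ok)  # prints: False True
```

Note that this rule only matches paths that begin with /page/; a paginated category archive under some other prefix would need its own rule.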

phranque

8:50 pm on Sep 15, 2012 (gmt 0)

the Disallow directive in robots.txt is about crawling, not indexing.
you probably want to implement a meta robots noindex element or the X-Robots-Tag: noindex HTTP response header.

mslina2002

9:13 pm on Sep 15, 2012 (gmt 0)

you probably want to implement a meta robots noindex element or the X-Robots-Tag: noindex HTTP response header.

I would go this route as well. Since your pages are already in the index, adding the Disallow to robots.txt at this point will stop the bots from crawling the pages, and as a result the pages will stay in the index, but with a blank description.

phranque

9:40 pm on Sep 15, 2012 (gmt 0)

I forgot to mention (and it was implied by mslina2002) that if you implement a noindex you must remove the crawling exclusion from robots.txt. Otherwise the crawler never fetches the page and so never sees the noindex.
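
A rough way to check this point before relying on a noindex: the sketch below asks whether a crawler is allowed to fetch the page at all, since a page that cannot be crawled cannot show its noindex. The helper name noindex_can_be_seen, the "Googlebot" agent string, and both sample robots.txt bodies are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def noindex_can_be_seen(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets the crawler fetch the URL.

    Hypothetical helper: a noindex on the page only has a chance of being
    read when this returns True.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Crawling exclusion still in place: the page is never fetched.
with_exclusion = "User-agent: *\nDisallow: /page/\n"
# Exclusion removed (an empty Disallow allows everything).
without_exclusion = "User-agent: *\nDisallow:\n"

url = "http://example.com/page/401"
print(noindex_can_be_seen(with_exclusion, url))
print(noindex_can_be_seen(without_exclusion, url))
```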

Abhinav

3:57 am on Sep 16, 2012 (gmt 0)

Thanks g1smd, phranque & mslina2002 for your replies. I would like to do as suggested above, but I am not sure how or where to do it.

phranque

9:58 am on Sep 16, 2012 (gmt 0)

the <meta> element is part of the Robots Exclusion Protocol.

About the Robots <META> tag:
http://www.robotstxt.org/meta.html

one disadvantage to using the <meta> tag is it only helps you control indexing of HTML documents, since you can't put such a meta element in a pdf file, for example.

the X-Robots-Tag header is a Google extension that has the advantage of providing the meta robots functionality for web resources that are not HTML documents.

Robots Exclusion Protocol: now with even more flexibility:
http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html
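
To make the two options concrete, here is a small sketch that decides whether a URL path looks like a paginated archive and, if so, produces either the meta element or the X-Robots-Tag header. The regular expression and the function names are assumptions for illustration, not anything WordPress provides, and the pattern assumes the default /page/N permalink style from the original question.

```python
import re

# Assumed pattern: /page/401, /page/401/ and e.g. /category/news/page/2/
PAGINATED_ARCHIVE = re.compile(r"^/(?:.+/)?page/\d+/?$")


def is_paginated_archive(path: str) -> bool:
    """True when the path ends in /page/<number>, with or without a slash."""
    return PAGINATED_ARCHIVE.match(path) is not None


def robots_meta_element(path: str) -> str:
    """Meta element to place in the HTML <head>, or '' for normal pages."""
    if is_paginated_archive(path):
        return '<meta name="robots" content="noindex">'
    return ""


def robots_response_headers(path: str) -> list:
    """Extra HTTP response headers as (name, value) pairs."""
    if is_paginated_archive(path):
        return [("X-Robots-Tag", "noindex")]
    return []


print(robots_meta_element("/page/401"))
print(robots_response_headers("/2012/09/some-post/"))
```

Either form is enough on its own; the header variant is the one that also works for non-HTML resources.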

phranque

10:06 am on Sep 16, 2012 (gmt 0)

by the way, i forgot to welcome you to WebmasterWorld, Abhinav!

Abhinav

2:35 pm on Sep 16, 2012 (gmt 0)

Thanks phranque for the tutorial links

mslina2002

3:53 pm on Sep 16, 2012 (gmt 0)

Another option:

If you are using WordPress you can add the meta tags either:

(1) manually, in your archive files. I would only recommend this option if you are familiar with PHP and WordPress.

OR

(2) with a plugin. I most likely can't mention it here, but you can google "wordpress seo". There are several plugins that can help with this. With the plugin, all you have to do is go to the 'titles and meta' section and click the 'Noindex subpages of archives' button.

Abhinav

2:03 pm on Sep 18, 2012 (gmt 0)

Thanks mslina2002 for other alternatives

iapsingh

6:57 am on Oct 3, 2012 (gmt 0)

Simply use Disallow: */page

You can use the Yoast SEO plugin for WordPress to tackle SEO tasks on a WordPress site/blog very easily.