Forum Moderators: goodroi

How to stop Google from indexing a WordPress blog's paginated archive

     
2:18 pm on Sep 14, 2012 (gmt 0)

New User

joined:Sept 14, 2012
posts: 5
votes: 0


Hello,

I want to remove my WordPress blog's paginated archive pages (like http://example.com/page/401) from Google's search index, but not the content that appears on those pages, i.e. the posts.

I am seeing a lot of visitors arriving from Google image search on these paginated archive pages, and by then the specific post may well have moved to a different archive page because new content has been added to the blog.

I tried to block them by putting Disallow: */page* in my robots.txt file, but it doesn't seem to be working.

Please help.

Thanks
Abhinav
5:59 pm on Sept 14, 2012 (gmt 0)

New User

5+ Year Member

joined:May 26, 2010
posts: 1
votes: 0


Try this: Disallow: */page/*/*
9:47 am on Sept 15, 2012 (gmt 0)

New User

joined:Sept 14, 2012
posts: 5
votes: 0


Thanks Pankaj, I will try this.

Thanks
Abhinav
4:09 pm on Sept 15, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow: /page/
will disallow anything beginning example.com/page/

The trailing * is redundant.
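
To compare the patterns suggested so far, here is a minimal Python sketch of robots.txt path matching. It assumes the wildcard convention Google documents (a rule is a prefix match from the start of the path, * matches any run of characters, a trailing $ anchors the end). The function name rule_matches is made up for illustration, and this is not Google's actual matcher.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    Assumed convention: prefix match, '*' = any run of characters,
    trailing '$' = end of path. Hypothetical helper, for illustration only.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, which gives prefix matching.
    return re.match(regex, path) is not None


print(rule_matches("/page/", "/page/401"))               # True
print(rule_matches("/page/*", "/page/401"))              # True, same as without the *
print(rule_matches("*/page/*/*", "/page/401"))           # False, needs a second slash
print(rule_matches("*/page*", "/category/page-speed/"))  # True, broader than intended
```

Under these assumptions, /page/ and /page/* match exactly the same paths, */page/*/* misses /page/401 because it requires another slash after the page number, and */page* also catches unrelated paths that merely contain /page.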
8:50 pm on Sept 15, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10595
votes: 22


the Disallow directive in robots.txt is about crawling, not indexing.
you probably want to implement a meta robots noindex element or the X-Robots-Tag: noindex HTTP response header.
9:13 pm on Sept 15, 2012 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 7, 2003
posts:358
votes: 0


you probably want to implement a meta robots noindex element or the X-Robots-Tag: noindex HTTP response header.


I would go this route as well. Since your pages are already in the index, taking the robots.txt route at this point won't remove them: adding the Disallow stops the bots from crawling the pages, and as a result the pages will still stay in the index, but with a blank description.
9:40 pm on Sept 15, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10595
votes: 22


I forgot to mention (and it was implied by mslina2002) that if you implement a noindex you must remove the crawling exclusion from robots.txt, otherwise the crawler never fetches the page and never sees the noindex.
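
Why the exclusion has to go can be sketched with Python's standard urllib.robotparser module: a URL that robots.txt blocks is never fetched, so a noindex on that page is never read. The helper name crawler_can_see_noindex and the sample robots.txt contents are assumptions for illustration. Note that the standard-library parser does plain prefix matching and does not implement the * wildcard.

```python
from urllib.robotparser import RobotFileParser


def crawler_can_see_noindex(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the crawler may fetch the URL, and so can read its noindex.

    Hypothetical helper: it only answers the crawl question. Whether the
    page actually carries a noindex is a separate check.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed robots.txt contents, before and after removing the exclusion.
blocked = "User-agent: *\nDisallow: /page/\n"
unblocked = "User-agent: *\nDisallow:\n"

print(crawler_can_see_noindex(blocked, "http://example.com/page/401"))
print(crawler_can_see_noindex(unblocked, "http://example.com/page/401"))
```

With the first file the paginated URL is off limits to the crawler, so any noindex on it goes unread; with the second the page can be crawled and the noindex can take effect.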
3:57 am on Sept 16, 2012 (gmt 0)

New User

joined:Sept 14, 2012
posts: 5
votes: 0


Thanks g1smd, phranque & mslina2002 for your replies. I would like to do as suggested above, but I am not sure how and where to do it.
9:58 am on Sept 16, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10595
votes: 22


the <meta> element is part of the Robots Exclusion Protocol.

About the Robots <META> tag:
http://www.robotstxt.org/meta.html [robotstxt.org]

one disadvantage to using the <meta> tag is it only helps you control indexing of HTML documents, since you can't put such a meta element in a pdf file, for example.

the X-Robots-Tag header is a Google extension that has the advantage of providing the meta robots functionality for web resources that are not HTML documents.

Robots Exclusion Protocol: now with even more flexibility:
http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html [googleblog.blogspot.com]
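
As a rough illustration of the two mechanisms, the Python sketch below checks a response for a noindex directive in either place: the X-Robots-Tag header or a meta robots element in the HTML. The function name is_noindexed and the sample inputs are hypothetical, and it deliberately ignores bot-specific forms such as a meta element named googlebot or a header value prefixed with a bot name.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def is_noindexed(headers: dict, html: str = "") -> bool:
    """Return True if the header or the HTML carries a noindex directive."""
    header_directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives.update(t.strip().lower() for t in value.split(","))
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


# An HTML archive page using the meta element.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed({"Content-Type": "text/html"}, page))

# A PDF, which can only be covered by the header.
print(is_noindexed({"Content-Type": "application/pdf", "X-Robots-Tag": "noindex"}))
```

The second call shows the case the header exists for: there is no HTML to carry a meta element, so the directive has to travel in the HTTP response.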
10:06 am on Sept 16, 2012 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10595
votes: 22


by the way, i forgot to welcome you to WebmasterWorld, Abhinav!
2:35 pm on Sept 16, 2012 (gmt 0)

New User

joined:Sept 14, 2012
posts: 5
votes: 0


Thanks phranque for the tutorial links
3:53 pm on Sept 16, 2012 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 7, 2003
posts:358
votes: 0


Another option:

If you are using WordPress you can add the meta tags in one of two ways:

(1) manually, in your archive files. I would only recommend this option if you are familiar with PHP and WordPress.

OR

(2) use a plugin. I most likely can't mention it by name here, but you can google "wordpress seo"; there are several plugins that can help with this. With the plugin, all you have to do is go to the 'titles and meta' section and click the 'Noindex subpages of archives' button.
2:03 pm on Sept 18, 2012 (gmt 0)

New User

joined:Sept 14, 2012
posts: 5
votes: 0


Thanks mslina2002 for the other alternatives
6:57 am on Oct 3, 2012 (gmt 0)

New User

joined:Oct 3, 2012
posts: 3
votes: 0


Simply use Disallow: */page

You can use the Yoast SEO plugin for WordPress to tackle SEO tasks on a WordPress site/blog very easily.