Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
How to stop Google from indexing WordPress blog paginated archives
Abhinav



 
Msg#: 4495192 posted 2:18 pm on Sep 14, 2012 (gmt 0)

Hello,

I want to remove my WordPress blog's paginated archive pages (like http://example.com/page/401) from Google's search index, but not the content that appears on these pages, i.e. the posts.

I am seeing lots of visitors arriving from Google image search on these paginated archive pages, and by then the specific post may well have moved to a different archive page because new content has been added to the blog.

I have tried to block these pages by putting Disallow: */page* in my robots.txt file, but it doesn't seem to be working.

Please help.

Thanks
Abhinav

 

pankajvirgo28



 
Msg#: 4495192 posted 5:59 pm on Sep 14, 2012 (gmt 0)

Try this: Disallow: */page/*/*

Abhinav



 
Msg#: 4495192 posted 9:47 am on Sep 15, 2012 (gmt 0)

Thanks Pankaj, I will try this.

Thanks
Abhinav

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4495192 posted 4:09 pm on Sep 15, 2012 (gmt 0)

Disallow: /page/
will disallow anything beginning example.com/page/

The trailing * is redundant.
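The prefix matching described above can be checked with a small sketch. It uses Python's standard `urllib.robotparser`, which, as far as I know, does plain prefix matching and does not implement the `*` wildcard extension that Google supports. The robots.txt content and the example URLs are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content using the rule suggested above.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /page/",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)


def is_crawlable(url: str) -> bool:
    """Return True if the rules above allow a generic crawler to fetch the URL."""
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in (
        "http://example.com/page/401",
        "http://example.com/2012/09/some-post/",
        "http://example.com/category/news/page/2",
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

Because the rule is a plain prefix, it only covers URLs whose path starts with /page/. A paginated archive nested under another path, such as the hypothetical /category/news/page/2, is not covered by it.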

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4495192 posted 8:50 pm on Sep 15, 2012 (gmt 0)

the Disallow directive in robots.txt is about crawling, not indexing.
you probably want to implement a meta robots noindex element or the X-Robots-Tag: noindex HTTP response header.
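To see what a crawler would look for, here is a minimal sketch that checks a response for either signal. The function name `has_noindex` and the sample inputs are hypothetical, and the header check is simplified: it ignores the optional user-agent prefix that the header value may carry.

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the header or the HTML asks robots not to index the page."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            if "noindex" in [part.strip().lower() for part in value.split(",")]:
                return True
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex({}, page))
    print(has_noindex({"X-Robots-Tag": "noindex"}, ""))
```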

mslina2002

10+ Year Member



 
Msg#: 4495192 posted 9:13 pm on Sep 15, 2012 (gmt 0)

you probably want to implement a meta robots noindex element or the X-Robots-Tag: noindex HTTP response header.


I would go this route as well. Since your pages are already in the index, adding the Disallow to robots.txt at this point will stop the bots from crawling the pages, and as a result the pages will stay in the index but with a blank description.

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4495192 posted 9:40 pm on Sep 15, 2012 (gmt 0)

I forgot to mention (and it was implied by mslina2002) that if you implement a noindex you must remove the crawling exclusion from robots.txt. otherwise the crawler is never allowed to fetch the pages and so never sees the noindex.
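The interaction between the two mechanisms can be sketched as a single condition: a noindex only takes effect if the crawler is allowed to fetch the page that carries it. The function name and the robots.txt rule sets below are hypothetical, and the standard-library parser stands in for a real crawler's rule matching.

```python
from urllib.robotparser import RobotFileParser


def crawler_can_see_noindex(robots_lines, url, page_has_noindex):
    """Return True only if the page carries a noindex AND may be crawled."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return page_has_noindex and parser.can_fetch("*", url)


if __name__ == "__main__":
    url = "http://example.com/page/401"
    blocked = ["User-agent: *", "Disallow: /page/"]
    open_rules = ["User-agent: *", "Disallow:"]
    # With the crawl exclusion in place, the noindex is never fetched.
    print(crawler_can_see_noindex(blocked, url, True))
    # With the exclusion removed, the crawler can read the noindex.
    print(crawler_can_see_noindex(open_rules, url, True))
```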

Abhinav



 
Msg#: 4495192 posted 3:57 am on Sep 16, 2012 (gmt 0)

Thanks g1smd, phranque & mslina2002 for your replies. I would like to do as suggested above, but I am not sure how and where to do it.

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4495192 posted 9:58 am on Sep 16, 2012 (gmt 0)

the <meta> element is part of the Robots Exclusion Protocol.

About the Robots <META> tag:
http://www.robotstxt.org/meta.html [robotstxt.org]

one disadvantage to using the <meta> tag is it only helps you control indexing of HTML documents, since you can't put such a meta element in a pdf file, for example.

the X-Robots-Tag header is a Google extension that has the advantage of providing the meta robots functionality for web resources that are not HTML documents.

Robots Exclusion Protocol: now with even more flexibility:
http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html [googleblog.blogspot.com]
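As an illustration of the header approach, here is a minimal sketch of a WSGI middleware that adds X-Robots-Tag: noindex to paginated archive URLs. A WordPress site would normally set this header in PHP or in the web server configuration instead, so treat the Python as a stand-in for the logic only. The URL pattern, `noindex_paginated`, `demo_app` and `response_headers` are all hypothetical names and assumptions.

```python
import re

# Assumption: paginated archives end in /page/<number>, either at the site
# root or under another archive path such as /category/news/page/2.
PAGINATED = re.compile(r"(?:^|/)page/\d+/?$")


def noindex_paginated(app):
    """Wrap a WSGI app and add X-Robots-Tag: noindex on paginated archive URLs."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if PAGINATED.search(path):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in application that returns the same HTML for every path."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>archive</body></html>"]


def response_headers(app, path):
    """Call a WSGI app for one path and return its response headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    list(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["headers"]


if __name__ == "__main__":
    wrapped = noindex_paginated(demo_app)
    print(response_headers(wrapped, "/page/401"))
    print(response_headers(wrapped, "/2012/09/some-post/"))
```

Because the header travels with the HTTP response rather than inside the document, the same predicate could be widened to cover non-HTML resources such as PDF files.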

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4495192 posted 10:06 am on Sep 16, 2012 (gmt 0)

by the way, i forgot to welcome you to WebmasterWorld, Abhinav!

Abhinav



 
Msg#: 4495192 posted 2:35 pm on Sep 16, 2012 (gmt 0)

Thanks phranque for the tutorial links

mslina2002

10+ Year Member



 
Msg#: 4495192 posted 3:53 pm on Sep 16, 2012 (gmt 0)

Another option:

If you are using WordPress you can add the meta tags in one of two ways:

(1) Manually, in your archive template files. I would only recommend this option if you are familiar with PHP and WordPress.

OR

(2) Use a plugin. I most likely can't mention it here, but you can google "wordpress seo". There are several plugins that can help with this. With the plugin, all you have to do is go to the 'titles and meta' section and click the 'Noindex subpages of archives' button.
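The decision a template or plugin has to make can be sketched as follows: emit a noindex meta tag on page 2 and beyond of an archive, and nothing on the first page or on single posts. This is written in Python purely to show the logic; a real WordPress theme would do it in PHP. The URL pattern, the function name, and the "noindex, follow" value are assumptions, not something taken from any particular plugin.

```python
import re

# Assumption: archive subpages end in /page/<number>, optionally with a slash.
_PAGE_NUMBER = re.compile(r"(?:^|/)page/(\d+)/?$")

NOINDEX_TAG = '<meta name="robots" content="noindex, follow">'


def robots_meta_for(path: str) -> str:
    """Return a noindex meta tag for archive subpages (page 2+), else ''."""
    match = _PAGE_NUMBER.search(path)
    if match and int(match.group(1)) >= 2:
        return NOINDEX_TAG
    return ""


if __name__ == "__main__":
    for path in ("/page/401", "/category/news/page/2/", "/page/1", "/2012/09/some-post/"):
        print(path, "->", robots_meta_for(path) or "(no tag)")
```

Using "follow" alongside "noindex" is a design choice that lets crawlers keep following the links to the individual posts while leaving the archive page itself out of the index.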

Abhinav



 
Msg#: 4495192 posted 2:03 pm on Sep 18, 2012 (gmt 0)

Thanks mslina2002 for the other alternatives.

iapsingh



 
Msg#: 4495192 posted 6:57 am on Oct 3, 2012 (gmt 0)

Simply use Disallow: */page

You can use the Yoast SEO plugin for WordPress to tackle SEO tasks on a WordPress site/blog very easily.

