
Forum Moderators: goodroi

Blog Tag and Author page blocking with robots.txt

11:30 pm on Dec 2, 2016 (gmt 0)

New User: seojustin

joined: Dec 2, 2016
posts: 1
votes: 0


I have a blog on my site with author (/blog/author/name) and tag (/blog/tag/tagname) pages starting to add up. There are close to 200 of them out of about 1200 total pages. My opinion is that these add little value and take up crawl budget.

What do you think about blocking those subfolders? And to do that, would I just use Disallow: /blog/author/ and Disallow: /blog/tag/ in the robots.txt file?
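
That is, something like this, with /blog/ assumed as the blog root:

    User-agent: *
    Disallow: /blog/author/
    Disallow: /blog/tag/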

In addition to blocking them from the crawl, I would also noindex them in WordPress so they stop showing up in SERPs.
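
By noindex I mean having WordPress emit the standard robots meta tag in the head of those archive pages, roughly:

    <meta name="robots" content="noindex,follow">

(most SEO plugins have a setting for this)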

How do you all handle Author and Tag pages?

Any thoughts would be greatly appreciated. Thanks.

12:14 am on Dec 3, 2016 (gmt 0)

Moderator from US: keyplyr

joined: Sept 26, 2001
posts: 10658
votes: 631


Hi seojustin, and welcome to WebmasterWorld [webmasterworld.com]

"would I just use Disallow: /blog/author/ and Disallow: /blog/tag/ in the robots.txt file?"

Yes, that's correct; use whichever fits your needs.

What "crawl budget." are you referring to?

As for the WP noindex question, I'll yield to those more knowledgeable about WP.

2:37 am on Dec 3, 2016 (gmt 0)

Administrator from US: not2easy

joined: Dec 27, 2006
posts: 3566
votes: 197


Is this a WordPress blog? It's a good idea to limit the types of pages that get indexed, since WP offers so many ways to reach the same content. The problem with disallowing the pages you don't want indexed is that robots follow links from one part of your site to another. So you can tell Google not to crawl your /tag/ directories, but that does not prevent them from being indexed.
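
If you want to sanity-check what a given set of rules actually blocks, Python 3's standard urllib.robotparser gives a quick offline test; example.com and the paths below are just stand-ins from this thread:

    from urllib.robotparser import RobotFileParser

    # Parse the proposed rules directly; no network fetch needed
    rp = RobotFileParser()
    rp.parse([
        "User-agent: *",
        "Disallow: /blog/author/",
        "Disallow: /blog/tag/",
    ])

    # Blocked: compliant crawlers won't fetch these, so they would never
    # see an on-page noindex, yet the URLs can still be indexed from links
    print(rp.can_fetch("Googlebot", "https://example.com/blog/tag/widgets/"))  # False
    print(rp.can_fetch("Googlebot", "https://example.com/blog/author/jane/"))  # False

    # Ordinary posts remain crawlable
    print(rp.can_fetch("Googlebot", "https://example.com/blog/some-post/"))    # True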

Are you using a plugin to create your sitemaps, and if so, does it give you control over which taxonomies are submitted?

5:29 am on Dec 3, 2016 (gmt 0)

Administrator: phranque

joined: Aug 10, 2004
posts: 11165
votes: 116


"The problem with disallowing the pages you don't want indexed is that robots follow links from one part of your site to another. So you can tell Google not to crawl your /tag/ directories, but that does not prevent them from being indexed."

Another way of stating this: when the crawler is blocked by robots.txt, it never sees the noindex.
It does, however, know the URL, and perhaps some anchor text and context... which is how you end up with the familiar "A description for this result is not available because of this site's robots.txt" result in the SERPs.

6:27 am on Dec 3, 2016 (gmt 0)

Moderator from US: keyplyr

joined: Sept 26, 2001
posts: 10658
votes: 631


Which is why crawler access should always be allowed: don't block the crawler by name in robots.txt, only noindex the specific files/pages.
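
A minimal sketch of that combination, assuming the paths from this thread:

    # robots.txt: nothing disallowed, so crawlers can reach the archive pages
    User-agent: *
    Disallow:

    <!-- served in the head of each /blog/tag/... and /blog/author/... page -->
    <meta name="robots" content="noindex,follow">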