Moderators: lorax & rogerd

WordPress Forum

    
How to tell Google not to index old posts
Need advice on how to set a 301 redirect or noindex tag so old posts are not indexed
federico2005




msg:4438207
 1:25 am on Apr 7, 2012 (gmt 0)

I need to tell search engines not to index my old posts, meaning everything posted before a certain date.
I'd like to know if there is a way to set up either a 301 redirect or noindex/nofollow meta robots tags so that Google does not index posts older than that date.

If either way is viable, which of the two options, 301 redirect or noindex, would be better for my problem?
Thanks

P.S. I know there is a plugin for setting 301 redirects, but I don't want to install any plugins for that.

 

DeeCee




msg:4438210
 1:59 am on Apr 7, 2012 (gmt 0)

You can't tell GoogleBot directly to pick out posts by date on its own.

In general, you have to tell GoogleBot page by page, by adding a noindex meta tag to each page you do not want indexed. That should be a simple task: add a small piece of code, for example to the template code, that checks the post date and outputs the noindex meta tag if the post is too old. In WordPress that should take only a couple of lines of code.

Be careful with 301 Redirect. Where would you redirect to?
If you only redirect Google and not users, that can get you banned. And if you redirect all old posts somewhere else (for Google and for users) then you are essentially deleting the posts, so why not then just do that? :)
So, using a noindex is the way to go, rather than a redirect.


Curious why you want older content to be ignored?

federico2005




msg:4438216
 2:21 am on Apr 7, 2012 (gmt 0)

DeeCee, I want older posts to be ignored because most of them came from autoblogging, so I feel I am in danger of getting duplicate content penalties.

Deleting the posts would mean serving a ton of 404 pages; please correct me if I'm wrong.

Yes, I guess a little code added to the template would do the job for the noindex meta tag. The problem is how to get this code :) I know how to set it up on non-WordPress pages, but what about when you have a WP index.php file?

DeeCee




msg:4438217
 3:36 am on Apr 7, 2012 (gmt 0)

Yes, deleting the old posts would mean 404s. But if they are from auto-blogging, I would guess you want them to disappear anyway? Remember that if the "world" out there has links going to those posts, it could still cost you if Google considers it bad content. Whether or not they index it (make it available in search results), they will still be fetching and "reading" it unless blocked. Noindex merely stops search users from finding it.

Another option would be to move all old posts into a separate path, like putting it on a category path, and simply block Google in robots.txt from seeing them.

In fact, if your current permalink structure is date based like '/2010/11/15/postname', you can simply block those bad paths in robots.txt. If they are on a permalink structure like '/postname/', then you would have to move the old stuff into a separate path.

Do not touch the index.php file. Don't mess with original WP files. They will just get clobbered on an update.

Go to /wp-content/themes/YourThemeName, and add to the relevant post loop files in there. Or better, create a child theme (an otherwise empty theme that names your existing theme as its parent) and add the extra functionality there. That way your original theme files stay intact, in case the theme is suddenly updated.

federico2005




msg:4438239
 9:19 am on Apr 7, 2012 (gmt 0)

DeeCee, thanks for your help.
About this blog of mine: I feel it has already been hit by Google penalties for duplicate content, as it has slipped dramatically in the SERPs. Despite that, it keeps getting a lot of traffic (it has even increased in the last few weeks). My big problem is that my new original posts can't rank properly in Google's search results because of the old duplicate-content posts (and my new original posts are important for my business as well).
So, I don't really know what to do with it.

My link structure is '/2010/11/15/postname'.
Sorry for my dumbness, I didn't understand when you suggested the following:
"you can simply block those bad paths in robots.txt"
and
"Go to /wp-content/themes/YourThemeName, and add to the relevant post loop files in there"

Could you show me how to do that with a very basic example?

Thanks again for your willingness and for your patience.

DeeCee




msg:4438280
 2:29 pm on Apr 7, 2012 (gmt 0)

Since you are using date-based permalinks (making posts look like they are in a directory structure of year/month/day), you can simply block Google from reading them using your robots.txt file.

To block all bots, use something like:

User-Agent: *
Disallow: /2008/
Disallow: /2009/
Disallow: /2010/


Whatever period you want to block.

If you have a year where you want to be more specific (such as blocking the first 3 months of 2011), you simply add

Disallow: /2011/01/
Disallow: /2011/02/
Disallow: /2011/03/

instead.

If you only want to block them from Google, replace 'User-Agent: *' with 'User-Agent: GoogleBot'.
Check robotstxt.org for more description of robots.txt files. Or just check with a Google search.

Your theme directory contains standard files that do things like display post loops, sidebars, and so on, all depending on what the actual theme does and looks like. You would simply create a child theme that names the old theme as its parent, with one or more files replaced by copies from your original theme, modified with the new checks.

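But! Since you can block using robots.txt, you do not really need to change any theme. Robots.txt goes further than the noindex meta in one respect: the robots block prevents GoogleBot from even trying to load those old posts. Keep in mind that it stops crawling rather than clearing out what is already in the index, which is why the removal request mentioned below matters.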

If you still want to change your theme, there is a lot of documentation for theme creation and modification on WordPress.org.

After finishing the robots.txt changes, you should go to your Google webmaster account and issue a removal request for the now blocked paths. Otherwise it can take a long, long time before Google forgets.
Even with that, it will likely take a while.
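
One way to sanity-check rules like these before uploading them is a small script that tests sample URLs against the Disallow prefixes. This is only a sketch: the function name is_path_disallowed and the sample paths are made up for illustration, and it does plain prefix matching only, ignoring the Allow lines and wildcards that real crawlers may also understand.

```php
<?php
// Hypothetical helper: true if $path starts with any Disallow prefix.
// Simplified on purpose: no Allow rules, no wildcard (*) or end-anchor ($) support.
function is_path_disallowed(string $path, array $disallowPrefixes): bool
{
    foreach ($disallowPrefixes as $prefix) {
        // An empty Disallow value blocks nothing, so skip it
        if ($prefix !== '' && strncmp($path, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

// Prefixes taken from the robots.txt lines above
$rules = ['/2008/', '/2009/', '/2010/', '/2011/01/', '/2011/02/', '/2011/03/'];

// Sample paths are invented for the check
$samples = [
    '/2010/11/15/postname/',
    '/2011/02/01/old-post/',
    '/2011/04/01/newer-post/',
    '/2012/04/07/new-post/',
];

foreach ($samples as $path) {
    echo $path, ' => ', is_path_disallowed($path, $rules) ? 'blocked' : 'allowed', "\n";
}
```

Google's webmaster tools also offer a robots.txt tester, which is the better final check since it applies Google's own matching rules.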

federico2005




msg:4438341
 8:18 pm on Apr 7, 2012 (gmt 0)

Maybe I was not clear when talking about index.php. I meant to say the index.php file included into my theme directory (this file includes post loops etc...), not the index.php file included in the root.

Also, a couple more things.
1) This blog is a subdomain of my main site; its URL is: myBlog.myMainSite.com
2) All of my old posts were posted by the same author. That means I can post the new ones under another author ID, which gives me the chance to set checks based on the author.
But unfortunately the following code snippet, added above the <HEAD> tag in the index.php file (the one in my theme folder, of course) to test a 301 redirect, doesn't work.

<?php
if ( is_author('3') ) {
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://myBlog.example.com");
exit();
}
?>

Am I missing anything?
P.S. I tested it with a 301 redirect so I could see whether the check works. If it worked with a 301 redirect, it could work with NOINDEX too, with something like the following code:

<?php
if ( is_author('3') ) {
print '<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />';
}
?>

DeeCee




msg:4438369
 11:07 pm on Apr 7, 2012 (gmt 0)

If your 'header' call actually gets executed but fails, most of the time it is because you are trying to issue a header AFTER output has already been sent: some other output statements have already been executed. That will not work. Send the redirect and nothing else to make the page move.

If the PHP logging level is set up to record it, that error will show in your logs as a "headers already sent" warning.

Combining a redirect with pushing out meta tags has no meaning. The output sent after the 301 header is not used by Google. They just see the 301 redirect and move on. The ROBOTS meta tag (or anything else you push out after it) has no effect. So that is an either/or.

Either redirect all those pages to your home page (effectively eliminating them for everyone), or allow users to see them and push a NOINDEX to stop Google from indexing them. But in that case it is much better to block them in robots.txt as mentioned before, since that stops GoogleBot from even wasting time trying to load them at all. That saves time both for Google and for your server. Otherwise Google has to reload all those pages over time, just to find out from the NOINDEX that they are not to be used. Wasted effort.

With the robots block, users can see them, but GoogleBot cannot.
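
A likely second reason the author check in the snippet above never fired: is_author() only reports true on author archive pages, not on a single post written by that author. A minimal sketch of a check that looks at the post's own author and runs before any output is sent might look like this, assuming it lives in the theme's functions.php. The helper name is_autoblog_author is made up, the author ID 3 comes from the snippet earlier in the thread, and the WordPress part is guarded so the file also runs on its own.

```php
<?php
// Hypothetical helper: does this post belong to the autoblog author?
function is_autoblog_author(int $postAuthorId, int $autoblogAuthorId = 3): bool
{
    return $postAuthorId === $autoblogAuthorId;
}

// Only runs inside WordPress, where add_action() is defined
if (function_exists('add_action')) {
    // template_redirect fires before the theme template prints anything,
    // so headers can still be sent at this point
    add_action('template_redirect', function () {
        if (!is_single()) {
            return; // leave archives, pages and the home page alone
        }
        // Read the author ID of the post being viewed
        $authorId = (int) get_post_field('post_author', get_queried_object_id());
        if (is_autoblog_author($authorId) && !headers_sent()) {
            wp_redirect(home_url('/'), 301);
            exit;
        }
    });
}
```

The same author check could print the noindex tag from a wp_head callback instead of redirecting; as said above, it is one or the other, not both.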

federico2005




msg:4438490
 11:51 am on Apr 8, 2012 (gmt 0)

Thanks for your help.
I had tried another solution that seems to be working (hopefully) before reading your last post.

As all of my old posts from autoblogging have a common feature, a custom field with the same key, I added in the header (of the file included in my theme folder) the following code:

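<?php
// 'autoblog' is a custom field that exists only on the old autoblogged posts
$status = get_post_meta($post->ID, 'autoblog', true);
// is_single() keeps the tag off the home page and archives, where $post is just the first listed post
if (is_single() && $status != '') { ?>
<meta name="robots" content="noindex,nofollow" />
<?php } ?>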

where 'autoblog' is the key of the common custom field.

It seems to have fixed the problem. If I look into the page source, I see for old posts

<meta name="robots" content="noindex,nofollow" />

with new posts not affected by that.

What do you think? Is this solution OK, or should I add robots.txt too, to be sure Google doesn't index them? (And where do I add that? In my theme folder?)

Also, I'd like a piece of advice that is a bit off-topic: I get significant traffic from this autoblogged section, with a lot of users coming in through RSS subscriptions even if they don't find it through search engines. Do you think it's worth going on with this autoblogging practice, or could it mess up my blog as a whole (and my site's domain) in terms of Google duplicate content penalties, even after setting noindex?
Also, would you advise starting a new blog for my new original posts (very important for my business), or carrying on with this blog and taking advantage of its exposure (it's 6 years old) even though it has suffered duplicate content penalties lately?

DeeCee




msg:4438505
 1:31 pm on Apr 8, 2012 (gmt 0)

As I mentioned, you can read about robots.txt and its purpose on robotstxt.org
It goes in the root of your site. It is not related to WordPress at all.

What are you auto-blogging and where does that content come from?

federico2005




msg:4438544
 7:29 pm on Apr 8, 2012 (gmt 0)

It is about international business, with the content coming from established international newspapers publishing in English.

DeeCee




msg:4438605
 1:07 am on Apr 9, 2012 (gmt 0)

Which makes it duplicate content, and usually worthless, as on most of the millions of similar sites that "publish" or "auto-blog" by tracking these real sources' RSS feeds. It is also called content scraping: republishing other people's content.

That is what Google has vowed to eradicate. Users doing search should be sent to the original sources, not to the aggregator or content scraper site.

Anyone duplicating content like that would run the danger of being detected by search engines and dropped from search entirely, and rightfully so, as they add no value.

rocknbil




msg:4438780
 4:05 pm on Apr 9, 2012 (gmt 0)

Why don't you just add a custom function in functions.php of your theme to return a robots meta tag if the post is older than the date you want? Usage would be something like

<meta name="description" content="whatever"/>
<?php echo nofollow_old('2009-01-10'); ?>
</head> <!-- or whatever follows -->

then the function would be something like (a rough sketch, NOT TESTED CODE)

function nofollow_old($the_date) {
// only act on single post pages
if (!is_single()) {
return '';
}
// get_the_time('U') gives the post time as a Unix timestamp
if ((int) get_the_time('U') <= strtotime($the_date)) {
// noindex is what keeps the page out of search results; nofollow alone would not
return '<meta name="robots" content="noindex,nofollow"/>';
}
return '';
}

Even better, instead of passing a specific date, you can pass a "number of years", so that as time marches on, posts are flagged automatically as they age. It shouldn't be too hard to figure out.
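
A rough sketch of that "number of years" variant, under a few assumptions: the function name robots_meta_for_age is made up, the two-year limit is an arbitrary example, the tag uses noindex because that is what keeps a page out of the index, and the WordPress hook is guarded so the file also runs outside WordPress.

```php
<?php
// Hypothetical helper: returns a noindex tag when the post is at least $maxAgeYears old.
// $postDate and $now are date strings that DateTimeImmutable can parse.
function robots_meta_for_age(string $postDate, int $maxAgeYears, string $now = 'now'): string
{
    $posted = new DateTimeImmutable($postDate);
    // Anything posted on or before this moment counts as "old"
    $cutoff = (new DateTimeImmutable($now))->modify("-{$maxAgeYears} years");

    return $posted <= $cutoff
        ? '<meta name="robots" content="noindex,nofollow"/>'
        : '';
}

// Only runs inside WordPress, where add_action() is defined
if (function_exists('add_action')) {
    // wp_head fires inside <head> in themes that call wp_head()
    add_action('wp_head', function () {
        if (is_single()) {
            $postDate = get_post_field('post_date', get_queried_object_id());
            echo robots_meta_for_age($postDate, 2), "\n"; // 2 years is an arbitrary example
        }
    });
}
```

Passing the current time in as a parameter is only there to make the date math easy to check with fixed dates; in normal use the default of 'now' applies.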

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved