Forum Moderators: rogerd & travelin cat


Blocking Googlebots Only on a WP Site

index,google,googlebots,noindex,wp,wordpress

         

Vantelli

6:04 pm on Oct 16, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hey guys, I would like to deindex my website in Google. I know how to stop all crawlers from going through my site, but I would like to keep my site indexed in Bing and other search engines. I just don't want my site to be indexed by Google.

I know I should use <meta name="googlebot" content="noindex"> for particular posts/pages but just want to check with you what's the best way.

Should I place <meta name="googlebot" content="noindex"> somewhere in my theme's header, or should I block googlebot in the robots.txt or...?

Just please let me know the easiest and smartest way to stop Googlebot from crawling and indexing a WP site.

Thanks!

not2easy

7:03 pm on Oct 16, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The easiest thing is to block crawling in robots.txt, but that does not stop indexing. IF the pages are already indexed and you want Google to see any noindex tags, do not use robots.txt for this, because crawlers need to be allowed to crawl in order to see the noindex tags. Once the pages are no longer indexed you can use the disallow in robots.txt, but you may want to keep those metatags in place even then. If Google finds links on other sites to the content you are trying to noindex, and those links are not nofollow links, it will crawl them.

The easiest way I know of to add the metatags sitewide is to make that setting the default for your site using a plugin such as Yoast's SEO plugin.

Google offers more specific information on using this noindex method here: [developers.google.com...]
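As an illustration of what a crawler actually reads, here is a minimal Python sketch (the HTML string is a made-up example, not taken from any real page) that pulls the per-bot robots directives out of a page's <head>:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects per-bot robots meta directives, e.g. googlebot/noindex."""
    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        # A crawler only honors the "robots" tag or the tag bearing its own name.
        if name in ("robots", "googlebot", "bingbot"):
            content = a.get("content") or ""
            self.directives[name] = [d.strip() for d in content.split(",")]

# Hypothetical page head; on a live site you would check this with "View Source".
html = '<head><meta name="googlebot" content="noindex, nofollow"></head>'
p = RobotsMetaParser()
p.feed(html)
print(p.directives)  # {'googlebot': ['noindex', 'nofollow']}
```

Because the tag is named "googlebot" rather than "robots", only Google is told to noindex; Bing and the rest never look at it.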

Vantelli

7:23 pm on Oct 16, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks for the info and the link.

Content is now indexed by Google, so I can't just disallow it in robots.txt, just as you said. I need to somehow place <meta name="googlebot" content="noindex"> in the <head>.

Not sure if I can just edit my theme and paste it there?

The Yoast SEO plugin is installed, but not sure if and how I can use it for this purpose...

Vantelli

7:29 pm on Oct 16, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



This is what I have in header.php:

<head>
<meta charset="<?php bloginfo( 'charset' ); ?>">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="profile" href="http://gmpg.org/xfn/11">
<link rel="pingback" href="<?php bloginfo( 'pingback_url' ); ?>">

<?php wp_head(); ?>
<meta name="googlebot" content="noindex">
</head>

If I simply paste <meta name="googlebot" content="noindex"> there, will it work?

not2easy

8:30 pm on Oct 16, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You would find those settings in the Yoast interface. It might help you find them if you edit a page or post, and click to view the metadata settings. There it shows you what the current defaults are and I believe there is a link to change the defaults sitewide right there. Otherwise you can go to the Plugins folder and scroll to that plugin for the links to settings. The links are also in the Admin sidebar.

It doesn't help if the header.php is edited and the Yoast plugin is adding a conflicting metatag dynamically. Use "View Source" on a live page and you should see what that means.

Vantelli

8:39 pm on Oct 16, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



With the Yoast plugin I can only noindex posts/pages for all search engines, but I want to deindex my site in Google only. I would like it to still appear in Bing, Yahoo, etc.

not2easy

9:33 pm on Oct 16, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Well that is a dilemma. It looks like you can't use Yoast to manage it in that case. I'd never looked at Yoast's plugin with that objective in mind.

I haven't asked why you want to do this, but it seems you might be creating a problem somewhere down the line. You might have to try to get any inbound links made nofollow, or they could be followed when Google crawls their source; then, if the site is blocked in robots.txt, you might find those pages indexed again. I suppose you could always 403 all requests from Googlebot's IPs, but I don't think that would benefit you.
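For what it's worth, the 403-by-user-agent idea could be sketched in an Apache .htaccess roughly like this (a hypothetical mod_rewrite fragment, not a recommendation; user-agent strings are easily spoofed, and verifying real Googlebot would need a reverse-DNS check):

```apache
# Hypothetical sketch: return 403 to any request identifying as Googlebot.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]
```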

Robert Charlton

3:32 am on Oct 17, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



There are two current threads in Google SEO News that should explain why Vantelli is doing this. I'll let him decide whether he wants to link to those threads from here. I recommend that he does.

Vantelli... an observation that if you're doing what I think you're doing, you had best also use "nofollow" in your Googlebot meta robots tag, as in....

<meta name="googlebot" content="noindex, nofollow">

If you don't, you may inadvertently be sending the wrong signals to Google, and Google might further penalize the linking from this site/these sites to your other site(s)... as it would look like you're trying to hide your link sources from them.

Perceived intention is a big part of the Google algorithm... and you have to be very careful, in your current situation, not to send the wrong signals. Probably further discussion on the Google algo aspect of this should continue on the currently active thread... but I chanced to see this and thought you should know the potential for further problems.

Vantelli

9:06 pm on Oct 17, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks for the reply, Robert.

Yes, I can explain why I want to do this - my sites were hit by a manual action and I lost my rankings in Google. It's about duplicate content, and the first idea was to delete all sites but one, move all my content to that one site, and fix it. Then I checked my traffic and noticed a decent amount of traffic from Bing, referral sites, social sites... I said to myself, "Wait a minute. Why destroy all these sites and lose all that traffic just because one search engine penalized you?"

Now I want to noindex my sites, but for Google only. If they don't like my sites, that's okay. They will not have access to them anymore. The other search engines don't think my content is gibberish.

Once I de-index all my sites but one in Google, maybe I'll be able to fix the duplicate content issue. The first thing is to figure out how to do that.

Yes Robert, you're right. I should use <meta name="googlebot" content="noindex, nofollow">. So, I should just somehow put that in the <head> and it will work?

Thanks

Robert Charlton

9:44 pm on Oct 17, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Vantelli, yes... that's exactly what I thought was going on.

The algorithmic discussion belongs in the Google forum, though, so post background information in this same thread [webmasterworld.com...] ...and ultimately we can catch up on that. The Bing traffic, I think, is also a bit of a clue about what your Google problems are.

Vantelli

8:16 pm on Oct 19, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



There is a plugin "Add Meta Tags" which helps you to have control over all meta tags on a WP site. I've just added <meta name="googlebot" content="noindex, nofollow"> in the <head>. Let's see what happens.

not2easy

10:21 pm on Oct 19, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Be sure to disable the Yoast handling of metatags or you will have conflicting instructions. Use "View Source" to see the header's metatags on a live page or post.

I should add that this isn't likely to do what you would like it to do.

Vantelli

2:16 pm on Oct 27, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



It works for one site. It's deindexed in Google, but still indexed in Bing, Yahoo, etc. I have done the same thing with the other sites, but all posts on them are still indexed. Why does everything have to be so complicated? :S

Vantelli

10:46 am on Oct 29, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



My current robots.txt is:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

What should I change to stop Googlebot from indexing my pages? I want to de-index my sites with Search Console ("Remove URLs") and then block Googlebot in robots.txt so it can't index my posts again. Do you think this could work?

Vantelli

2:16 pm on Oct 29, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



Okay, I've managed to remove my sites from Google's index; now I just need to find out how to stop Google from indexing my content again. I've changed robots.txt to:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: Googlebot
Disallow: /

Is that okay? Or maybe I should use something like:

User-agent: Googlebot
Disallow: /

User-agent: bingbot
Disallow:

User-agent: YandexBot
Disallow:

Do I have to write a line for each search engine crawler if I want to permit all of them but Google?

not2easy

5:59 pm on Oct 29, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Have you made sure that all your content has currently been removed from Google's index? If so, you can use:

User-agent: *
Disallow:

User-agent: Googlebot
Disallow: /

to allow all robots except Google.

There will be side effects if any of the content is linked to from other sites, unless those links are nofollow. If Google finds links but is disallowed from crawling, those pages may be indexed with no description (with a Google disclaimer description). Remember that other sites may link to content they found on Bing or other SEs, and in that case Google will follow and crawl those links and possibly index those pages despite your robots.txt file.
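One way to sanity-check a robots.txt like this before deploying it is Python's stdlib urllib.robotparser; a small sketch (the example.com URL is just a placeholder):

```python
import urllib.robotparser

# The robots.txt discussed above: allow everyone, block only Googlebot.
robots_txt = """\
User-agent: *
Disallow:

User-agent: Googlebot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Googlebot matches its own group and is blocked everywhere...
print(rp.can_fetch("Googlebot", "https://example.com/some-post/"))  # False
# ...while every other bot falls through to the "*" group and is allowed.
print(rp.can_fetch("Bingbot", "https://example.com/some-post/"))    # True
```

Note that this only tells you what a compliant crawler will fetch; as mentioned above, Google can still index a disallowed URL (description-less) if other sites link to it.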

Vantelli

7:17 pm on Oct 29, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thank you very much for the reply. Right now I'm using:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: Googlebot
Disallow: /

That's what I have in robots.txt right now. Is this correct? Can I leave it just like that?

Yes, I de-indexed all posts and pages in Search Console. If you want to de-index the whole website quickly, just type / in the field for the page removal request and it will be removed from Google's index in a few hours.

dougwilson

4:21 pm on Nov 14, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Just curious as to what type of "duplicate content". I've gotten messages in GWT about duplicate descriptions, but never about content.

Vantelli

4:44 pm on Nov 14, 2016 (gmt 0)

5+ Year Member Top Contributors Of The Month



It's "thin content" actually, sorry for the misunderstanding.

dougwilson

7:01 pm on Nov 14, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Ah, got it