Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Any Whitehat ways to hide content from GoogleBot?

         

php_dave

3:36 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



I have Googled this to no avail. Maybe you good folk can help me...

I have a series of 25 websites which are location specific and each one has its own members. However, I have coded a basic forum which is viewable on all of the sites. If someone on one site posts in the forum then that post appears on the forum on all 25 sites.

Of course, I know about Google and repeated content. I imagine that it won't be good for me to have lots and lots of content which appears exactly the same on 25 different websites. Google will think that I am harvesting content or something, or that none of the content is relevant or good quality.

But as you know, forum posts are content rich so blocking the forum from Google altogether would be a real waste.

Therefore I would like to make it so that all of the forum's content is available on Google somewhere, but only once. But I don't want to favour any one of the 25 sites over the others.

So... my solution is to hide from Googlebot the posts on WebsiteLondon which were made by members on WebsiteManchester and vice-versa. This would mean that all of the posts get found by Googlebot at some point (leading to either WebsiteManchester or WebsiteLondon when people find it in search results), but only once, and people viewing the forum on any of the websites will see all of the content at once.

Is there a way to block specific areas of content from Googlebot without getting blacklisted? Kind of like how you can block links with nofollow.

Thanks in advance, fellow web chums.

tedster

7:46 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there a way to block specific areas of content from Googlebot without getting blacklisted? Kind of like how you can block links with nofollow.

No, there isn't. Lots of people have wished for this kind of organic-search feature over the years - something like what AdSense ad targeting offers, or what the Google Search Appliance does for internal site search. I assume that the potential for abuse in organic search is just too big for Google to offer a feature like this.

You want to be very careful with BOTH these situations: 1. showing googlebot something different than you show human visitors or 2. duplicating content on 25 different domains. The first is cloaking, and a violation of Google's guidelines. The second can also look like spam to an algorithm.

Assuming these forums require a log-in, you CAN show something different to logged-in members compared to what you show to visitors who aren't logged in. Maybe that way you can serve both your forum members and your search rankings on Google. I like the idea of a geographic filter of some kind.

php_dave

7:58 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



Thanks for your reply, Mr Ted.

The forums need to be able to be read by non-members too, so making it appear differently when people are logged in isn't an option, I'm afraid.

What about <!--googleoff: all--> ? Isn't that an official Google thing, to tell Googlebot to stop reading the content at that point and then <!--googleon: all--> to tell it to start again?

I've used that on my sites before and in the last couple of hours I've added it to my forum, to appear dynamically based on which site is being scanned and who the member is.

If a member posts something then Googlebot should only read it if GB is scanning the site that the member is part of. If it's scanning the forum of another site with the same content then it should ignore the post. This ensures that every post is read by GB (for maximum content to be entered into Google) but only on one site (whichever site the posting member belongs to), yet the forum can still appear in full to whoever is reading it, on whichever of the 25 sites they're on.

Have I made a big error somewhere or is that all just cRaZy enough to work?

deadsea

8:18 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We have used AJAX in situations like this. The content you want crawled should be put directly into the page source. The content that is duplicate from somewhere else should be loaded into the page via AJAX onload from a url that is restricted from crawling by robots.txt.

If a page doesn't have any content in it directly, it can look like a blank page to Googlebot so make sure to meta noindex these pages.
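To make that concrete, here's a minimal sketch of the approach. The `/ajax/` path and the element id are made up for illustration - the point is that the duplicated posts live at a crawl-blocked URL and only get pulled into the page after load:

```html
<!-- On a page whose ONLY content arrives via AJAX, tell Google not to index it: -->
<meta name="robots" content="noindex">

<!-- Empty placeholder that is filled in after page load: -->
<div id="syndicated-posts"></div>
<script>
  // Fetch the duplicated posts from a URL that robots.txt blocks,
  // so Googlebot never requests the duplicate content itself.
  fetch('/ajax/thread-posts?thread=123')
    .then(function (response) { return response.text(); })
    .then(function (html) {
      document.getElementById('syndicated-posts').innerHTML = html;
    });
</script>
```

And in each site's robots.txt: `Disallow: /ajax/` (again, that path is just an example).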

jimbeetle

8:26 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What about <!--googleoff: all--> ?

That only applies to the Google Search Appliance.
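For reference, the googleon/googleoff comments look like this, but a Google Search Appliance is the only crawler that honors them - regular Googlebot ignores them entirely:

```html
<p>A Search Appliance indexes this paragraph.</p>
<!--googleoff: all-->
<p>A Search Appliance skips this block; Googlebot does not.</p>
<!--googleon: all-->
```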

phranque

8:37 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



perhaps the common content should be put in an iframe.

php_dave

8:39 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



Jimbeetle, please explain more! By using that Google tag, what exactly am I doing?

I'm fick n dunt get it :(

phranque

8:42 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



the Google Search Appliance is a piece of hardware that essentially gives you google search technology for an enterprise's domain or for a corporate intranet.

it is not the Google Search that "The World" is using.

php_dave

8:47 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



Okay, Phranque. But what's that got to do with anything? :s

aakk9999

8:57 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why not use cross-domain canonical link element? Choose one domain to be indexed and automatically output canonical link element to all forum pages/posts to point to the equivalent (the same) page/post on that "master" domain.
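In practice that's one line in the `<head>` of the thread page on all 25 sites, each pointing at the chosen master copy (the domain and URL structure here are just placeholders):

```html
<link rel="canonical" href="http://www.example-master.com/forum/thread/123">
```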

netmeg

9:02 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you NOINDEX 24 of 'em?
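That would mean serving something like this in the `<head>` of the forum pages on the 24 secondary sites - noindex to keep the copy out of the index, follow so link equity still flows:

```html
<meta name="robots" content="noindex, follow">
```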

tedster

9:13 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, Phranque. But what's that got to do with anything?

The tagging you suggested is only functional for one use case - a Google Search Appliance. So that's not going to do anything for your organic ranking challenge. That's what it's got to do with this discussion.

php_dave

9:17 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



Got ya. I didn't know that before and now I do.

php_dave

9:23 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



Right, well I think that my best bet is to do as Netmeg said and just Noindex all the sites' forum except for one. I have an umbrella site which links out to the 24 others so any traffic that goes there from Google search results containing forum content will be in a good place to then go to the right site, depending on what they're looking for.

Thanks for your help, everyone! An e-high5 for each one of you.

tommytx

10:18 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



I just quickly read through the posts above, so hopefully I am not repeating a suggestion... but how about placing the comments in a frame on all the sites but the London site? We all know how Google ignores any text in a frame... and give the frame no borders so it looks to be part of the rest. Then in robots.txt you could tell Google to stay away from that file, just to make sure it does not accidentally come across it while wandering around...

php_dave

10:50 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



Good idea, Tommy. However, due to the way I want to block content from Google, I don't think it would work.

I'll try and explain it using a super-low-tech diagram :P

Member 1 is from the London site. Members 2 and 3 are from the Manchester site.

************************************
WebsiteLondon
Thread: Blah Blah
Member 1: I think this.
Member 2: Yes, I agree.
Member 3: I disagree.
Member 1: How dare you. If I see you around I will cut you up.

WebsiteManchester
Thread: Blah Blah
Member 1: I think this.
Member 2: Yes, I agree.
Member 3: I disagree.
Member 1: How dare you. If I see you around I will cut you up.
************************************

So there are two websites with the exact same forum thread on them. If Googlebot sees this it will think that content has been duplicated, which is never good.

So what I wanted to do before was to make it so that Googlebot only sees the posts which were written by the members from that specific site.

So when GB scans WebsiteLondon it sees...

WebsiteLondon
Thread: Blah Blah
Member 1: I think this.
Member 1: How dare you. If I see you around I will cut you up.

And when it scans Manchester it sees...

WebsiteManchester
Thread: Blah Blah
Member 2: Yes, I agree.
Member 3: I disagree.

The fact that the posts don't make sense without each other is irrelevant. Only Googlebot sees them like that and Googlebot doesn't care about the thread as a whole, just about the large chunks of content (posts). This way every post is entered into Google just once, and because it isn't too important which site gets the traffic, just as long as one of them does (as they all link to each other and it's clear to a viewer how to find their relevant site) then this would be ideal. When a viewer lands on a site they see the whole forum and the thread makes sense because all the posts are there.

However, it would seem that blocking specific posts on specific sites can't be done. And using AJAX to fetch individual posts from external files will start getting very complicated, as will using Iframes to hold posts.

So I'm going to use a canonical to point the thread "Blah Blah" on 24 of the sites to the one central site's thread "Blah Blah". That way Google will only count the central site as having the forum content on it.

When traffic arrives on that central site after finding relevant content in a Google search, visitors will be on my network and will be able to find the relevant site should they wish to.

Hoorah!

klark0

11:44 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



Cross domain rel=canonical to one domain!

One domain will be indexed... plus, as I understand it, all your link juice will be consolidated to one version of each thread.

deadsea

12:31 am on Sep 19, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Even if you did make it so that some of the content was hidden from Googlebot on each page, you would still have a duplicate content problem. The url structure would be the same across the sites and you would have duplicate title and meta descriptions at those same urls.

You should assign ownership of each THREAD to a specific site instead of each post. Whichever site started the thread gets to own all the replies, regardless of which sites the members that respond belong to.

That would allow you to have content on all your sites, avoid duplicate content, and use page-level meta tags for indexing that are easy to work with and well supported by Google.
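As a sketch of deadsea's scheme, the template logic could decide the indexing directive per thread something like this. Python here is just to illustrate the decision; the site names and the exact tag contents are assumptions, not anything Google mandates:

```python
def robots_meta(thread_owner_site: str, current_site: str) -> str:
    """Robots meta tag for a thread page: only the owning site indexes it.

    thread_owner_site: the site the thread was started on - per deadsea's
    suggestion, it "owns" every reply in the thread, regardless of which
    site each replying member belongs to.
    """
    if current_site == thread_owner_site:
        # The owning site serves the thread normally and lets it be indexed.
        return '<meta name="robots" content="index,follow">'
    # Every other site still shows the full thread to visitors,
    # but tells Google not to index its copy.
    return '<meta name="robots" content="noindex,follow">'

print(robots_meta("london", "london"))      # index,follow
print(robots_meta("london", "manchester"))  # noindex,follow
```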

php_dave

12:44 am on Sep 19, 2012 (gmt 0)

10+ Year Member



By Jove, that's a great idea!

Good thinking, DeadSea :)