Forum Moderators: Robert Charlton & goodroi
I know the proper way to tell Googlebot to remove pages or whole directories is to use robots.txt. But my URL structure is simply too complicated for that. I'm contemplating something more radical: what I call reverse cloaking. That is, I want to hide content from Googlebot.
What I'd do is mark in my database which pages I want Googlebot (and all the other bots) to see. If I detect a bot, I'll dynamically generate links only to pages that are "crawlable"; otherwise I'll cloak them out. And from now on I'll return a 410 status whenever a bot requests a non-crawlable page.
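The scheme described above can be sketched as a pair of helper functions. This is a hypothetical illustration, not the poster's actual code: `isBot`, `crawlablePaths`, and `statusFor` are invented names, and the in-memory `Set` stands in for the database flag the post mentions.

```javascript
// Rough user-agent check for the major crawlers. Real bot detection
// (e.g. reverse-DNS verification of Googlebot) is more involved.
const BOT_PATTERN = /googlebot|bingbot|slurp|duckduckbot/i;

function isBot(userAgent) {
  return BOT_PATTERN.test(userAgent || '');
}

// Stand-in for the database column marking pages bots may see.
const crawlablePaths = new Set(['/home', '/products']);

// Decide the HTTP status for a request: 410 Gone when a bot hits a
// page not marked crawlable, 200 for everything else.
function statusFor(path, userAgent) {
  if (isBot(userAgent) && !crawlablePaths.has(path)) {
    return 410; // tells the bot the page is permanently gone
  }
  return 200;
}
```

A server would call `statusFor` early in request handling and short-circuit with the 410 before rendering anything for the bot.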
Main question: does this scheme violate Google's TOS? I know that showing Googlebot more than I show regular visitors (classic cloaking) is a no-no, but what about the reverse?
I am pondering restructuring my URLs completely and starting over, to try to avoid all these duplicate URLs, but it's not clear yet how to do this, and even then I'm going to take a hit for a while until Googlebot catches up. Is there a Best-Known Method to do this properly?
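One commonly recommended approach to the duplicate-URL problem is to pick a single canonical form for each page and 301-redirect every variant to it, so the engines consolidate the duplicates instead of indexing them separately. A minimal sketch, assuming hypothetical function names and an invented list of junk query parameters:

```javascript
// Normalize a URL to one canonical form: drop session/tracking
// parameters, sort the remaining ones, lowercase the path.
function canonicalUrl(rawUrl) {
  const url = new URL(rawUrl);
  for (const junk of ['sessionid', 'ref', 'sort']) {  // illustrative list
    url.searchParams.delete(junk);
  }
  url.searchParams.sort();                // stable parameter order
  url.pathname = url.pathname.toLowerCase();
  return url.toString();
}

// Returns a 301 target when the requested URL is not already
// canonical, or null when no redirect is needed.
function redirectFor(rawUrl) {
  const canonical = canonicalUrl(rawUrl);
  return canonical === rawUrl ? null : { status: 301, location: canonical };
}
```

The hit while Googlebot catches up is real either way, but a permanent redirect at least passes the old URLs' standing along to the new ones.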
Many thanks for your insights.
Another thing to keep in mind if you're trying to get PR to flow is that Google will still know the link is there, even if it can't crawl it. Most of the PR-conserving schemes that I've seen use JavaScript or some other technique to hide the link from the bot.
Again, does anyone have any thoughts on whether Google objects to having links hidden from it? Does it violate the TOS somehow?
You can also use swap-image JavaScript in an image tag, without needing <a href="">:
<img src="img.gif" onClick="window.open('http://www.domain.com','_self');" style="cursor:pointer">
I assume
<span onClick="window.open('http://www.domain.com','_self');" style="cursor:pointer">Click Me</span>
would work similarly, without the bother of an image.
fishfinger: Interesting observation on document.write and analytics.
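For completeness, the document.write variant mentioned above can be sketched like this. The helper name `hiddenLinkMarkup` is invented for illustration; the point is simply that the `<a href>` is assembled at runtime, so it never appears in the static HTML a non-JS crawler sees.

```javascript
// Build the anchor markup at runtime rather than in the page source.
function hiddenLinkMarkup(href, text) {
  return '<a href="' + href + '">' + text + '</a>';
}

// In the browser, the link would then be emitted with:
//   document.write(hiddenLinkMarkup('http://www.domain.com', 'Click Me'));
```

As noted in the thread, this can interact badly with analytics scripts that also use document.write, so test before deploying.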