This will make it more difficult to block scrapers from getting access to your content, and it's one more example of Google's pattern of disintermediation.
all your content are belong to us...
OK, let's get practical: CAN THIS BE BLOCKED?
It says website owners can opt out, which means Google has, once again, opted sites in by default.
Sure, you can opt out - according to Google, by blocking Googlebot in robots.txt or adding noarchive to every last page on your site. Good grief.
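For the record, the robots.txt version of that opt-out is a blanket block of Googlebot, which also drops you from the search results entirely - a sketch:

```
# robots.txt - blocks Googlebot from the entire site
# (note: this removes you from Google's index too, not just this feature)
User-agent: Googlebot
Disallow: /
```

Hardly anyone is going to do that, which is the point.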
In other words, if you want Google to spider your site, you effectively must "opt-in" to this.
While normally a loyal Google supporter, I must say that I am disturbed by this new "feature."
Well... it saves me from creating an RSS feed. And since all pages are already in their index for everyone to see and read in Google, it's just another way of publishing pages and following changes on those pages.
Looks like the trick is to provide an RSS feed, and to limit how much content is displayed.
This is just another good example of Google trying to keep users on their own site instead of visiting our webpages.
...maybe the user visited our site once or twice a week to check up, but now they can just use Google Reader instead.
Google already allows AdSense ads to be placed inside feeds hosted by FeedBurner. I wonder whether ads will appear in these as well.
|Looks like the trick is to provide an RSS feed, and to limit how much content is displayed. |
Yeah, this looks like the best solution. At least for now, Google Reader only offers the "track this page" if it doesn't find a valid <link type="application/rss+xml" (or whatever) in the source code of that page.
So, if you have a blog anyway or any other RSS data stream, just add the <link> to that in the head of every page of your site.
Of course, eventually Google will just give users a different means of accessing this feature. :/
Easiest solution, add the below to every page of your website:
<link rel="alternate" type="application/rss+xml" href="http://www.yoursite.com/some.rss" title="Your News">
Google reader won't let you add a page that has that in the HTML, it will just grab the feed.
The opt-out page says you can opt out by blocking Googlebot in robots.txt (yeah, right - watch everybody rush to do that), or by adding a <meta name="googlebot" content="noarchive"> tag to pages you don't want included (i.e. all of them). I already have <meta name="robots" content="noarchive"> on all my pages, which appears to be honoured for the SERPs, so why does this need a Google-specific tag?
The robots tag may be sufficient. I tried adding the homepage of one of my sites and got
|Google was not able to access this page to check for updates. This page may be unavailable or have other restrictions that prevent Google from getting updates. |
Google doesn't fully index every page it visits on the first pass; it only scans for fresh material on subsequent passes, so it's no surprise that RSS isn't needed to fuel this service.
It does make their FeedBurner purchase a little redundant, however.
Can you opt-out with meta robots noindex? It doesn't specifically say.
|Sure you can opt out- according to google by blocking google in robots.txt or adding noarchive to every last page in your site. good grief. |
Adding NoArchive would be my very first and only choice. We've been using NoArchive for years now across all sites. You can use the NoArchive meta tag or serve an X-Robots-Tag in the server headers globally. :)
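For anyone wanting the server-header route site-wide, a minimal sketch, assuming Apache with mod_headers enabled (adjust for your own server):

```
# Apache config / .htaccess sketch (assumes mod_headers is enabled)
# Sends "X-Robots-Tag: noarchive" with every response, so you don't
# have to edit the <head> of each individual page.
Header set X-Robots-Tag "noarchive"
```

Handy for non-HTML files (PDFs, images) where a meta tag isn't possible.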
Gee, I wonder if it will pick up price changes on ecom pages?
|We've been using NoArchive for years now across all sites. |
Any negative impact on rankings?
Thanks for the heads up, NoArchive goes on all important pages ASAP.
I'm very new to this RSS stuff. What is this about NoArchive and how does one implement it in a web site?
The NoArchive meta tag prevents Google from showing the Cached link for a page. We've had several threads about NoArchive [bing.com]; you can pick up some good information there.
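For a quick start: the tag mentioned earlier in this thread goes in the <head> of each page you want kept out of the cache, like so:

```
<head>
  <!-- tells all spiders not to keep a cached copy of this page -->
  <meta name="robots" content="noarchive">
</head>
```

Use name="googlebot" instead of name="robots" if you only want it to apply to Google.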
Thanks. I'll check those out.