Forum Moderators: Robert Charlton & goodroi
Questioning the wisdom of using fat pings to deal with scrapers
Really? Wow! That's news to me ...
Google Inc. has temporarily shut down a search engine feature that allows users to find real-time updates from Twitter, Facebook, FriendFeed and other social networking sites.
A message posted early Monday on Twitter by the team behind Google Realtime says the search feature has been temporarily disabled while Google explores how to incorporate its recently launched Google+ project into the feature. The tweet tells readers to "stay tuned."
Turns out, Google's Twitter deal expired.
So, I'm [slowly] trying to wrap my head around the concept of PubSubHubbub, and the only reason I looked at a protocol with a name like this is because tedster recommended it.
Tedster has laid out approaches he's been trying in several posts that might be helpful to you. I'll link to a couple of threads here, with some excerpted comments from each....
How do we tell Google we were wrongly Pandalized?
http://www.webmasterworld.com/google/4387503.htm [webmasterworld.com]
From tedster's Nov 22, 2011 posts...

I want to emphasize that my ideas about correlation between widespread syndication and being wrongly Pandalyzed are my own conjecture, nothing proven and nothing officially communicated. It's just what seems to make the most sense for the cases that have me scratching my head....
...What I'm trying with one site is to ramp up every "we are the canonical source" signal I can muster, including authorship tagging, pubsubhubbub, delayed RSS, no more full RSS feeds, etc. I'll let the forum know if it works.

There's evidence that Google "wants to" credit the original source in the SERPs, but many times a more authoritative source who is quoting in full or syndicating (even with full acknowledgement) will still rank higher.
And, on Jan 15, 2012...
Article pages not ranking since Panda 1.0
http://www.webmasterworld.com/google/4406778.htm [webmasterworld.com]
>>my articles get picked up by other sites<<
I recently worked with a site that had a similar issue. We made a couple of changes that seemed to improve indexing and ranking immediately.
1. Inaugurated authorship mark-up
2. Used pubsubhubbub (PuSH) to send Google "fat pings" immediately at publication
3. Delayed the standard feed until the PuSH feed was received
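The "fat ping" step above boils down to a plain HTTP publish notification to a PuSH hub: the publisher tells the hub the feed changed, and the hub fetches the feed and pushes the full entries to subscribers. Here is a minimal sketch in Python; the hub and feed URLs are hypothetical placeholders, not the specific endpoints anyone in this thread used.

```python
# Sketch: send a PubSubHubbub (PuSH) publish notification to a hub.
# The hub then fetches the feed and delivers the full content ("fat pings")
# to its subscribers. HUB_URL and FEED_URL are hypothetical placeholders.
from urllib.parse import urlencode
from urllib.request import Request, urlopen

HUB_URL = "https://pubsubhubbub.example.com/"   # hypothetical hub endpoint
FEED_URL = "https://www.example.com/feed.rss"   # your feed's canonical URL

def build_publish_ping(hub_url, feed_url):
    """Build the form-encoded publish notification the PuSH spec defines."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(hub_url, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

# To actually send the ping at publication time:
#   with urlopen(build_publish_ping(HUB_URL, FEED_URL)) as resp:
#       ok = resp.status == 204  # hubs answer 204 No Content on acceptance
```

The point of doing this immediately at publication is timing: Google can receive the new content before any scraper has had a chance to crawl it.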
Please take a look at the threads and provide more relevant details about what you've done, what type of site is ranking in your place, the feed situation, and the timeline regarding when this happened.
if you ping the hub (any open hub) with a full RSS feed, then THAT (full RSS) is what gets pushed to your subscribers, and one of those can be Google but can also be your scraper.
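That discovery risk exists because subscribers find the hub through the feed itself: a PuSH-enabled feed advertises its hub in a link element, which a scraper can read just as easily as Google can. A typical declaration looks like this (URLs hypothetical):

```xml
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example Site</title>
    <link>https://www.example.com/</link>
    <!-- the hub that subscribers send their subscription requests to -->
    <atom:link rel="hub" href="https://pubsubhubbub.example.com/"/>
    <!-- the canonical ("self") URL of this feed -->
    <atom:link rel="self" href="https://www.example.com/feed.rss"/>
    <!-- feed items go here -->
  </channel>
</rss>
```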
I just add the full feed to Google Reader (for debugging) and Google Webmaster Tools (as a sitemap).
Why would you add the full RSS as a sitemap (and risk that it just might be discovered by scrapers, too) if Google only reads URLs from sitemaps?
<FilesMatch "\.rss$">
Header set X-Robots-Tag "noarchive, nosnippet"
# deny by default, then allow only Google IP ranges
Order deny,allow
Deny from all
Allow from 64.18.0.0/20 64.233.160.0/19 66.102.0.0/20 66.249.80.0/20
Allow from 72.14.192.0/18 74.125.0.0/16 173.194.0.0/16
Allow from 207.126.144.0/20 209.85.128.0/17 216.239.32.0/19
</FilesMatch>
Is there an equivalent for getting Googlebot IPs?
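Rather than maintaining a static IP list (which goes stale as Google adds ranges), Google's documented way to verify Googlebot is a reverse-then-forward DNS check: reverse-resolve the requesting IP, confirm the hostname is under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch:

```python
# Sketch: verify a visiting IP really is Googlebot via the
# reverse-then-forward DNS check, instead of a hardcoded IP list.
import socket

def is_google_hostname(hostname):
    """True if the reverse-DNS name falls under googlebot.com or google.com."""
    return hostname.rstrip(".").endswith((".googlebot.com", ".google.com"))

def verify_googlebot(ip):
    """Reverse-resolve the IP, check the domain, then forward-confirm."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)              # reverse DNS
        if not is_google_hostname(hostname):
            return False
        return ip in socket.gethostbyname_ex(hostname)[2]      # forward confirm
    except (socket.herror, socket.gaierror):
        return False                                           # no PTR record, etc.
```

Since this involves two DNS lookups per check, in practice you would cache the verdict per IP rather than run it on every request.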
You can then use logs to catch scrapers, since they'll be the ones hitting your pages fast and furious.
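That log-based approach can be sketched as a simple per-IP request count over an access-log window. This assumes the common Apache/Nginx layout where the client IP is the first field of each line, and the threshold here is an arbitrary illustration, not a tuned rule:

```python
# Sketch: flag IPs that hit pages "fast and furious" in an access log.
# Assumes the client IP is the first space-separated field of each line;
# the threshold is illustrative and should be tuned to your traffic.
from collections import Counter

def suspect_ips(log_lines, threshold=100):
    """Return {ip: hit_count} for IPs whose count exceeds the threshold."""
    hits = Counter(line.split(" ", 1)[0] for line in log_lines if line.strip())
    return {ip: n for ip, n in hits.items() if n > threshold}
```

Pair this with the Googlebot verification above so you don't accidentally block a legitimate crawler that also fetches quickly.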
As for RSS feeds, it's best not to have any for the majority of sites.
As in almost all pubsub systems, the publisher is unaware of the subscribers, if any.