
Google SEO News and Discussion Forum

Blogger Scraping My Content - Why?

 5:28 pm on Jun 14, 2012 (gmt 0)

So one of my clients noticed this blog scraping his content. The scraper blog is ranking higher than him when he searches for the title of the article.

I get what he's doing, but don't understand why. All of the outbound links go to forum profiles all over the web. The scraper doesn't seem to be selling any links or directing the link juice very efficiently.

Anyone seen this before? What's the endgame here?

[edited by: tedster at 1:38 am (utc) on Jun 15, 2012]



 2:08 am on Jun 15, 2012 (gmt 0)

You are not seeing anything near the endgame, in my experience. This sounds like a technique used to build a large backlink pyramid that can fly under Google's radar for at least a few months. The payoff can be much further down the line, and may not even be built yet.

However, for your client the big issue is that he is being outranked for his own content - he does not have enough trust or authority to maintain his rankings for content he created. Unfortunately, this can happen quite easily right now, especially with Google's emphasis on freshness.

I worked with one client who was in the same boat - and the move that fixed it involved using PubSubHubbub (PuSH) to send a "fat ping" to Google at the instant my client posted new content. This established HIS authorship rather soundly. In addition, my client had an RSS feed, which we put on a 1-hour delay just for insurance.

Reference: https://code.google.com/p/pubsubhubbub/
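
For anyone wondering what the 1-hour feed delay might look like in practice, here is a minimal sketch in Python. It is not the code my client's developer wrote. The `entries` structure and the `delayed_entries` name are assumptions made up for illustration; the only idea taken from above is that the public feed leaves out anything newer than an hour.

```python
from datetime import datetime, timedelta, timezone

FEED_DELAY = timedelta(hours=1)  # the one-hour delay mentioned above


def delayed_entries(entries, now=None, delay=FEED_DELAY):
    """Return only the entries published at least `delay` ago.

    `entries` is a hypothetical list of dicts, each with a timezone-aware
    `published` datetime; adapt it to however your CMS stores posts.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - delay
    return [entry for entry in entries if entry["published"] <= cutoff]
```

The feed template would then render `delayed_entries(all_posts)` instead of the full list, so scrapers polling the feed only see a post after the hub ping has already gone out.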


 3:51 am on Jun 15, 2012 (gmt 0)

Is this the same as what WordPress plugins like Google XML Sitemaps do?

When you post, it generates a sitemap and notifies Google. I am just not sure if it uses PubSubHubbub.


 4:11 am on Jun 15, 2012 (gmt 0)

It is something like that - but the regular Sitemap ping is not as effective. The details are in that reference link.


 11:06 am on Jun 24, 2012 (gmt 0)

I can't find any technical tutorial on fat ping or PubSubHubbub. Every article and tutorial explains in general terms what it is and links to the WordPress plugin, or says not to worry since Blogger, Feedburner etc. already use it.

I'm going to implement it using PHP, and already found the publisher client for it. I've also downloaded the Wordpress plugin to check the code.

Here is what I don't get: the PHP library and the WordPress plugin just send the url of the Atom feed (the WordPress plugin sends multiple feeds) to the hub. They don't send new urls or any other content to the hub. Then why is it called a fat ping?

Tedster, I also have a one-hour delay in my feed. Did you implement a separate Atom feed without a delay, and use that to ping the hub? And I guess it should be a full feed (instead of a partial feed with just the first couple of paragraphs)?


 2:27 pm on Jun 24, 2012 (gmt 0)

The phrase "fat ping" is another way of saying "full feed", yes.

WRT the rest of it, I'm not the actual developer. I just explained what I wanted and he did it. Unfortunately he's moved on to other things and I cannot reach him about further details.


 8:28 pm on Jun 24, 2012 (gmt 0)

Thank you tedster. Sending the url of the feed is actually a light ping. After searching for hours, the only clue I could find is in a help pop-up in Superfeedr:

The standard PubSubHubbub protocol specifies that the publisher (you) does light pings to the hub, because the origin of these pings cannot be verified. When we get a light ping, we will then poll your feed to identify what's new vs. what is old.

However, this polling can become pretty expensive if you have tens of thousands of feeds and more. In this case, you can start to perform fat pings.

You will use the same syntax as light pings, but add 2 additional parameters :

hub.content : the content of the feed, including only the new entry(ies). We will directly parse this content, rather than poll the feed.
hub.signature : this is an HMAC signature computed with the secret shown below, and the hub.content. This will allow us to know that this content is coming from you and wasn't forged by a 3rd party.
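
To make that concrete, here is a rough Python sketch of how the POST body for such a fat ping could be assembled. The parameter names come from the help text quoted above. The hash function is not named there, so HMAC-SHA1 with a hex digest is my assumption, and `build_fat_ping` is just a name I made up. Check the hub's own documentation before relying on it.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def build_fat_ping(feed_url, new_entries_xml, secret):
    """Build the form-encoded body for a fat ping.

    ASSUMPTION: the signature is an HMAC-SHA1 hex digest of the content,
    keyed with the shared secret. The quoted help text does not say which
    hash or encoding the hub expects.
    """
    signature = hmac.new(
        secret.encode("utf-8"),
        new_entries_xml.encode("utf-8"),
        hashlib.sha1,
    ).hexdigest()
    return urlencode({
        "hub.mode": "publish",            # same as a light ping
        "hub.url": feed_url,              # same as a light ping
        "hub.content": new_entries_xml,   # only the new entry or entries
        "hub.signature": signature,       # lets the hub verify the sender
    })
```

The returned string would be POSTed to the hub exactly like a light ping. The only difference is the two extra fields.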


 9:12 pm on Jun 24, 2012 (gmt 0)

Since it's a problem, you can also add author metadata to the site and, if you're so inclined, claim the content with Google. It will be hard to outrank you with identical pages if Google believes you are the author. Google authorship link - [google.com...]

Another possibility as to "why?" - to make your site look like it's taking part in shady link schemes and content scraping. Perhaps the scraper has a solidly ranking site in the same field and is attempting to undermine you?


 1:53 am on Jun 25, 2012 (gmt 0)

A quick tutorial for fellow webmasters:

The current implementation of PubSubHubbub (at least for webmasters) is just light pinging the hub server with the url of your Atom feed. If there is new content, the hub fat pings your subscribers with that content.

First step, enable realtime for your feed subscribers by adding the hub url to your feed.

<link rel='self' type='application/atom+xml' href='yourfeedurl'/>
<link rel='hub' href='http://pubsubhubbub.appspot.com/'/>
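
If you want to double check that your feed really declares both links before moving on, something like the following Python sketch should do. It only parses the feed XML you hand it. The `feed_links` name is made up for this example, and it assumes the links are direct children of the Atom `<feed>` element, as in the snippet above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_links(atom_xml):
    """Map each link `rel` at feed level to its `href` (first one wins)."""
    root = ET.fromstring(atom_xml)
    links = {}
    for link in root.findall(ATOM_NS + "link"):
        # Atom treats a missing rel attribute as "alternate".
        rel = link.get("rel", "alternate")
        links.setdefault(rel, link.get("href"))
    return links
```

A feed is ready for the next step when the result has both a `self` and a `hub` key.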

Second, go to [pubsubhubbub.appspot.com...] and publish your Atom feed. Don't try the diagnostics part yet; you need at least one realtime subscriber first.

Third, Google Reader supports PubSubHubbub and is a great way to do a quick test. Go to Google Reader and subscribe to your Atom feed. (If you're already subscribed, you may want to unsubscribe and resubscribe.)

Next, time to test it. Update your feed (add a new entry), then go to [pubsubhubbub.appspot.com...] and publish your feed again (ping). Go back to Google Reader; your feed should be updated within a couple of seconds. (The Google Reader page doesn't auto-update. Click the 'Home' link in the left column to check for updates.)

Now you can go back to [pubsubhubbub.appspot.com...] and run diagnostics.

Automating the pings only requires posting your Atom feed url to the hub server. You don't need an extra library; basic cURL will do. You just have to post 'hub.mode=publish' and 'hub.url=your-atom-feed-url' to 'http://pubsubhubbub.appspot.com/'.

// Light ping: tell the hub which feed changed; the hub then fetches the feed itself.
$params = "hub.mode=publish&hub.url=" . urlencode("http://www.example.com/atomfeed");
$ch = curl_init('http://pubsubhubbub.appspot.com/');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $params);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the hub's response out of the page output
curl_exec($ch);
// The hub answers 204 No Content when it accepts the ping.
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

Last but not least, don't forget to add your Atom feed to WMT Sitemaps. As far as I can tell, when you add it to WMT, Google or Googlebot subscribes to realtime updates through the pubsubhubbub server.


 6:19 am on Jun 25, 2012 (gmt 0)

Actually, Google frowns quite deeply on someone scraping your content without your permission. All you need to do is send the webmaster a takedown notice. I've done it a couple of times and it works quite well.

Give them 72 hours to remove it. If they do, fine. If not, go to the Google Bad SEO/Spam form and very simply turn them in for stealing content to promote their site.

I'm a writer. I had one client refuse to pay for 25 articles and not accept my emails. I explained to him that since it was not paid for, it was not his and he must remove it. Stealing content (something that you own and can prove you own, by virtue of having it online first or having the purchase order for it) is no different from stealing images or anything else. If you did not give him permission to use it, he has to remove it. It worked for me.

Cutts recommends a few steps to take in an old blog post located here:


Use a DMCA request to make whoever stole your content remove it.
Also submit a spam report.
And if they are using AdSense, you're going to get faster action if you report them for an AdSense violation.
Happy Monday!

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved