Republished (scraped) Pages Doing Better than My Originals

Forum Moderators: Robert Charlton & goodroi

j0sh

10:01 am on Sep 26, 2007 (gmt 0)

Hi all
Our site had 20,000 visitors daily until a few months ago.

When other sites started republishing our RSS feed, Google penalized our site, probably because it sees our articles as duplicates of the republished pages...

Now if I search for one of our article titles, our page doesn't appear in the SERPs, but the sites that republish our articles are in the top positions...

We're thinking of eliminating the feed...

what do you think?

tedster

4:37 pm on Sep 27, 2007 (gmt 0)

Many people here have noticed this problem. Google spokespeople said, at one time, that only sites already "weakened" by some kind of penalty would have this kind of problem - I'm not sure that's what we're seeing today.

Of course scrapers don't need an RSS feed to grab your content, but it does make things easy for them. If your feed has no essential value for your business, then dropping it (or suspending it for a period) might be an experiment worth trying.

But how sad to even think of this - dropping web technology just as it becomes mainstream. I would have hoped that the date of the feed would HELP Google acknowledge your site as the original.

iridiax

6:17 pm on Sep 27, 2007 (gmt 0)

Try switching to a partial feed (excerpts only). Scrapers find these much less attractive, and even if scraped, partial feeds are unlikely to hurt your Google rankings. Many readers prefer full feeds, but partial feeds are better than none at all.
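
As a rough illustration of what "excerpts only" can mean in practice, here is a minimal Python sketch that turns a full article body into a short plain-text excerpt before it goes into the feed. The function name `make_excerpt` and the 50-word default are assumptions made for this example, not something from the thread, and the regex-based tag stripping is deliberately crude.

```python
import html
import re


def make_excerpt(body_html: str, max_words: int = 50) -> str:
    """Return a plain-text excerpt of an article body (hypothetical helper)."""
    # Crude tag removal: good enough for a sketch, not a full HTML parser.
    text = re.sub(r"<[^>]+>", " ", body_html)
    # Turn entities such as &amp; back into plain characters.
    text = html.unescape(text)
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    # Mark the cut so readers know the rest is on the site.
    return " ".join(words[:max_words]) + " ..."


if __name__ == "__main__":
    print(make_excerpt("<p>one two three four</p>", max_words=2))
```

The excerpt would go into each item's description in place of the full text, so a scraped copy carries only a teaser.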

jk3210

11:28 pm on Sep 27, 2007 (gmt 0)

I believe you'll find that Google cares only about returning the "best results" for a particular search based on their usual algo criteria, NOT based on ownership of the content. Look at the way Google News handles articles from AP, Reuters, etc. Those sites aren't always number one for their own articles.

Another good example is fan sites. Some of them scrape the full content of every article they find, and since they are often high-PR authorities on a particular celeb, they can scrape at will, get it spidered before the originator, and get full credit for someone else's content.

And since the fan sites are usually running AdSense, from Google's point of view, there's no downside to returning a scraped page. Plus, if you think about it, how WOULD Google determine "ownership"? In the age of RSS, there is no such thing as "ownership" that can be effectively policed.

When people put an RSS feed on their sites, they're hanging a sign on it that says "Please Steal Our Stuff." So why are people surprised when it happens?

"But our TOS requires a link back to the originator," they often say. Well, it isn't Google's job to police the internet based on THEIR TOS.

"We're thinking of eliminating the feed"

That's what we did.<G>

JS_Harris

11:44 pm on Sep 27, 2007 (gmt 0)

I'd recommend going with a full feed through a major feed burning service. You need to poison the feed a little to deter would-be scrapers, however. That's easily accomplished by text linking each article to other relevant articles on your site.

Scrapers want 'autopilot'. If you link to other related articles from many different places inside each article, Google will love you and scrapers will loathe you, hopefully enough to ditch your content. Even if they don't, it's not hard to tell who scraped whom, so they risk their site.

Many feed burning services also offer tracking, charts, security etc. When combined with pingbacks (set up a blank page on your site to display those if you are so inclined), it's hard for scrapers to get any value from a feed for any length of time. If you use these suggestions, prepare a boilerplate 'cease and desist' email and add Google's copyright abuse reporting link to your favorites.
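
One way to check whether an article is interlinked in the way described above is sketched below in Python. It lists the links in an article body that point back to your own site, resolved to absolute URLs so they would still point home in a scraped copy. The names `LinkCollector` and `internal_links` and the `example.com` URLs are assumptions for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a fragment of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(body_html, site_url):
    """Return absolute URLs of links in body_html that stay on site_url's host."""
    collector = LinkCollector()
    collector.feed(body_html)
    site_host = urlparse(site_url).netloc
    # Resolve relative links against the site so they survive republishing.
    absolute = [urljoin(site_url, href) for href in collector.hrefs]
    return [url for url in absolute if urlparse(url).netloc == site_host]


if __name__ == "__main__":
    body = '<a href="/related-story">related</a> <a href="https://other.example/x">elsewhere</a>'
    print(internal_links(body, "https://www.example.com/"))
```

An article for which this list comes back empty or very short would be a candidate for a few more in-text links before it goes out in the feed.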

nippi

12:42 am on Sep 28, 2007 (gmt 0)

Yes, we are no longer feeders. It's not possible for us to police removal of our link back.

Of course scrapers are going to remove it.

johnblack

1:32 am on Sep 28, 2007 (gmt 0)

"When people put an RSS feed on their sites, they're hanging a sign on it that says 'Please Steal Our Stuff.' So why are people surprised when it happens?"

Not sure what you mean by this? You could equally say that about scraping content from web pages as well.

RSS feeds are part of a website and covered by any copyright mentioned on the site.

RandomDot

2:29 am on Sep 28, 2007 (gmt 0)

The following advice is just off the top of my head, based on the size of your user base. Other sites with different user bases follow other models, where RSS scrapers can be an advantage because they drive traffic to an otherwise overlooked site, but that's another story.

RSS is a way of presenting your more or less important news to your user base, not a way for other people to be smart with your content when you already have a user base of your own. Don't give them the opportunity unless you benefit from it. (The benefit can be money, an even larger user base, a wider audience, branding, whatever it is.)

- Keep it simple: just a link and a headline. Always include your domain name in the headline in the feed. Don't make the headline give away the content; make it intriguing, not descriptive or keyword rich. Drop the description: don't give people content before they're on your site.

- Keep the feed at five or ten items or so. It's not a sitemap, a reinvention of the Gutenberg press, or anything similar; it's a feed. It's there to draw attention to your site, keep people up to date, remind them you still exist and, most importantly, make them as addicted as possible to your brand, nobody else's.
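
The two points above can be sketched as a small Python function that builds an RSS 2.0 feed containing only a headline and a link per item, with the domain appended to each headline and the item count capped. This is only an illustration under stated assumptions: the function name `build_minimal_feed`, the `"headline | domain"` title format, and the `example.com` values are made up for the example, and `articles` is assumed to be a list of `(headline, url)` pairs with the newest first.

```python
import xml.etree.ElementTree as ET


def build_minimal_feed(domain, site_url, articles, max_items=10):
    """Build a headline-and-link-only RSS 2.0 feed as a string (hypothetical helper)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description at the channel level.
    ET.SubElement(channel, "title").text = domain
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Latest headlines from {domain}"
    # Keep only the newest few items; the feed is not a sitemap.
    for headline, url in articles[:max_items]:
        item = ET.SubElement(channel, "item")
        # The domain goes into the headline, so it travels with any republished copy.
        ET.SubElement(item, "title").text = f"{headline} | {domain}"
        ET.SubElement(item, "link").text = url
        # No item description on purpose: readers get the content on the site.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    stories = [(f"Story {i}", f"https://example.com/story-{i}") for i in range(12)]
    print(build_minimal_feed("example.com", "https://example.com/", stories, max_items=5))
```

Items without a description are still valid in RSS 2.0 as long as a title is present, which is what makes the headline-only approach workable.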

And to counter all of the above arguments about what you should do, always think a little ahead: if your articles were to drop in the search engines, would the scrapers still rank higher and, more importantly, would they still drive traffic to your site even though your content is at position #345676 in the results?