|Site scraped my RSS feed, which has hijacked my site. What to do?|
I wouldn't normally consider posting such a question, but I feel this is important enough to do so. This could happen to you, which is why I'm describing the situation.
I have a site which all but disappeared from Google. Organic traffic evaporated from Google, yet remained in Bing with no issues.
What happened is that my RSS feed is being broadcast elsewhere. That's nothing except for the fact that the site broadcasting my feed ate my site. That means Google considers them the authority, the author, etc. Essentially I've been removed completely from my own content and written articles. Think for a moment. Your website gets eaten. You can copy some of your text and search for it, and your site will not show, but theirs will. To see your own page you need to click on the "show duplicate content" link which Google displays in italics at the end of that particular search.
Every tool at my disposal (okay a couple might be left to try) has failed. I can't express here my frustration at this situation. Okay, so a WordPress RSS feed is a pretty common scrape situation. I get that. I've also since disabled my feed completely. I also realize that sites that are being scraped via RSS do not usually get eaten alive the way mine has. Perhaps because I didn't see the issue for months, Google has decided that the site broadcasting my RSS feed is the authority. I can't grasp this situation.
I'm baffled at this. Google rejected my DMCA complaints. Scraper reporting had no effect. Nothing. Because this is an RSS feed, and they may be using a cached version or whatever, this isn't stealing? They have AdSense running. How can I complain via AdSense about copyright infringement when Google said there is no problem?
I would say if I chose to include your site in this vortex, I might be able to get your content and site eaten alive and wiped away. This isn't "normal" and I've had scraped content etc. This is a situation that apparently has NO resolution. I don't know what to do, which is why I'm grasping for suggestions.
You can't get information on this site because searches just return articles that they are broadcasting via other sites' RSS. I see this as a real tool for messing with your competitors.
Any thoughts on this are appreciated (of course if this makes it to the forum). Thank you.
MrSavage - I took the liberty of changing your title, to reflect that this is an RSS feed issue. I think that there's a lot that can go wrong with syndication, and it's more than a PageRank/authority issue. There are a bunch of reasons, eg, why Google might misunderstand who published first, and IMO, the tools that can be used to claim first publication might be used against you if you don't use them first. Overall site quality issues might affect you as well.
|Every tool at my disposal (okay a couple might be left to try) has failed. |
You don't say when this happened or what tools you've tried. Tedster has laid out approaches he's been trying in several posts that might be helpful to you. I'll link to a couple of threads here, with some excerpted comments from each....
How do we tell Google we were wrongly Pandalized?
From tedster's Nov 22, 2011 posts...
|I want to emphasize that my ideas about correlation between widespread syndication and being wrongly Pandalyzed are my own conjecture, nothing proven and nothing officially communicated. It's just what seems to make the most sense for the cases that have me scratching my head.... |
...What I'm trying with one site is to ramp up every "we are the canonical source" signal I can muster, including authorship tagging, pubsubhubbub, delayed RSS, no more full RSS feeds, etc, etc. I'll let the forum know if it works.
|There's evidence that Google "wants to" credit the original source in the SERPs, but many times a more authoritative source who is quoting in full or syndicating (even with full acknowledgement) will still rank higher. |
And, on Jan 15, 2012...
Article pages not ranking since Panda 1.0
|>>my articles get picked up by other sites<< |
I recently worked with a site that had a similar issue. We made a couple of changes that seemed to improve indexing and ranking immediately.
1. Inaugurated authorship mark-up
2. Used pubsubhubbub (PuSH) to send Google "fat pings" immediately at publication
3. Delayed the standard feed until the PuSH feed was received
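For anyone wondering what the PuSH ping in step 2 looks like on the wire, here is a minimal Python sketch. It assumes a hub that accepts the PubSubHubbub 0.3-style publish notification (a form-encoded POST with `hub.mode=publish` and `hub.url`). Both `HUB_URL` and `FEED_URL` are hypothetical placeholders, not anything from tedster's setup. As far as I understand it, the publisher's ping itself is light; the hub then fetches the feed and pushes the full content (the "fat ping") to subscribers.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical values: substitute the hub your feed declares and your own feed URL.
HUB_URL = "https://hub.example.com/"
FEED_URL = "https://example.com/feed/"


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build the form-encoded POST that tells a hub the feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_publish_ping(hub_url: str, feed_url: str, timeout: float = 10.0) -> int:
    """Send the ping and return the HTTP status; a 2xx code means the hub accepted it."""
    with urlopen(build_publish_ping(hub_url, feed_url), timeout=timeout) as resp:
        return resp.status
```

Calling `send_publish_ping(HUB_URL, FEED_URL)` right after a post goes live, and only then letting the ordinary feed update, is roughly what steps 2 and 3 describe. Building the request separately from sending it makes it easy to inspect before pointing it at a real hub.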
Please take a look at the threads and provide more relevant details about what you've done, what type of site is ranking in your place, the feed situation, and the timeline regarding when this happened.
I gather that there's a wildcard mashup aspect to the scraping site, which apparently reacts to your searches and spits out pages in response to your searches. The sites I've seen create subdomains for specific searches. Haven't seen one in a while.
This is a black hole situation, or at least worst case is a black hole situation. How did I go from a site publishing my RSS feed to that site suddenly becoming "the site" for "my content"? Perhaps it sat there too long. I wonder how my feed ended up on their site in the first place. What if they added 10% of all WordPress feeds to their site?
If you build a site that has one purpose and that is to publish other sites' RSS feeds, what is that called? If that RSS provider site runs AdSense ads, then shows your content, then what is that?
What I wonder is whether, because WordPress comes with a feed by default, that gives people permission to publish without risk of DMCA. Should this be a word of warning to all people using a WordPress-type blog with embedded feeds?
I suppose part of this is, what if you file a DMCA complaint and it gets rejected. All I've said via a complaint is that it's my content, I don't want it published elsewhere, that I've tried contacting the site, and that I have no other recourse in dealing with this. They stole my site, gobbled it up, and it's in a black hole now and Google sees them and not me. This was my biggest money making blog last year. It's not Panda imo, it's just that in Google's eyes, I'm the scraper and they are the originator.
I think I mentioned the fact that if you look for this site in Google, like searches about getting your feed removed from their site or trying to search if other people have issues with this site, all you get is their rss feed content which is from other sites. You can't get info about the site. Basically I can't see anyone else complaining like I am! I can't find any articles, references, nothing.
Update: I'm a jackass. I used Bing just now and yes, I can actually find other complaints about this site. Interesting. This seems like a critical flaw in some aspect of the algo. To clarify, I mean in the sense that original content can fall victim to such a black hole as this without any apparent recourse from within Google to set the record straight. Life sentence without a trial.
[edited by: Robert_Charlton at 7:16 pm (utc) on Nov 10, 2012]
Wordpress has plugins that will publish any RSS feed as posts, given the URL, very customizable. WP also has "SEO" plugins that let you customize your feed to defeat sites that might be publishing your feed. Basic settings in WP let you limit your feed to snippets with links to the full post/page. You can set custom headers, footers and links in every item in your feed. At a minimum I would try to use some of these tools to see how they might help.
Also see Duplicate content exploit in Google [webmasterworld.com] and be sure to click through to the Moz article.
What happened is that my RSS feed is being broadcast elsewhere.
Let's get this straight - what you are saying is that:
1. You are broadcasting an RSS feed
2. Another site is taking your feed, making a copy and saving it as a .rss file on their server.
That is a copyright issue.
Or do you mean:
1. You have chosen to use some software that by default publishes what you write in an rss feed
2. You haven't changed the defaults
3. Another site is dynamically displaying the content of your .rss file
As long as the feed is being displayed dynamically that is legit. It's what RSS is there for.
MrSavage, again: You don't say when this happened or what tools you've tried.
I have filed multiple scraper reports via Google Docs. Nothing. I have recently filed DMCA complaints, and those were all rejected. With the DMCA being denied, doesn't that also rule out letters to a host and a complaint submitted via their AdSense ads? Correct? I can't find a resource on "What do you do if your DMCA complaint fails".
This is a framing issue. Your site (rss content) in their frame, and your site content is owned by them according to Google. I do not know at what point or how long my content was on their site before ownership was triggered to pass to them.
Although there are some posted solutions out there, I don't see those working. What I think I see is that once your content is fetched, it's in the black hole. You can block etc, but I'm not sure yet if the band aid solutions will resolve anything in terms of ranking or content ownership.
So, in summary, let me ask you. Your content is being broadcast by another site, who took ownership (complete rankings) of that content in Google's eyes, has all your rankings, and you get no traffic from Google anymore. I'm sure there is a penalty because they are essentially naming you as a scraper of their content. You've tried the "legal" and normal options for dealing with copyright infringement, yet those WERE ALL DENIED. Tell me where you go from there. Please do, I'm all ears.
I have a strong feeling I know of the specific "feed" site causing you problems. They place your site in a frame on their url if I'm not mistaken along with a toolbar across the top of your page.
If that's the case kill the feed and file another DMCA, you shouldn't lose to a site which frames your content.
Why does a failed DMCA complaint to Google mean you can't file one with the host too?
The reason I say that is because I think Google would be the first to acknowledge a request and if that can't work there, then I don't see anyone else listening. Further, if I take other people's experiences, they have said contacting the host resulted in no action being taken. So I suppose I'm taking into consideration other people's experience on this one regarding the host.
I'm a real newb at DMCA to be honest. My brain gets a bit fuzzy because it's strange how a frame references my site and how to incorporate that into the complaint. I didn't think it was complicated to file, but perhaps I need to look at this again.
Unfortunately this situation sucks the energy right out of me. What I can say is that many webmasters haven't exactly figured out where their traffic went and what role this site actually played in the demise. I'm currently trying to reach out to other people that I've seen posting about this. It does appear that there are a number of posts from September 2012 regarding this. I will say that with 80%-90% of people using Google to try and troubleshoot this, they won't actually get results that talk about this issue. Sounds bizarre, but you really won't have much luck finding articles/posts with people discussing this site. Instead you will find articles on this site that have content from other sites that talk about whatever it is you're searching for. You want posts about them, but instead you get them: an unrelated article hosted/ranked by THEM. Confusing? Yep, read that a couple times.
MrSavage - Have you tried a frame breaking script?
Have you tried any of the tools available to customize your feed to prevent the scraper feed from being seen by Google before it sees yours?
You are loud on complaints, but you don't seem to be responding to anything that suggests ways of fixing the problem.
I have implemented "frame busting" via header and via htaccess. As I'm sure everyone knows, changing code in WordPress opens the door to having to go in there and "redo" all those customizations after updates. Regardless of that, it's better than doing nothing in this instance. I personally don't think busting frames is the same as having Google acknowledge a wrong by removing them via the DMCA complaint process and reinstating me. Is frame busting going to reinstate me as the owner of my content and the past 8 months? No clue.
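For readers who haven't set this up, the "via header" part usually means sending response headers that ask browsers not to render your page inside someone else's frame. Below is a minimal Python sketch of that idea as a WSGI wrapper. It is an illustration only, not how WordPress or an htaccess rule does it; `forbid_framing` and `demo_app` are hypothetical names. Note that these headers are enforced by browsers, so they say nothing about how Google attributes the content.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]

# Headers that ask browsers not to render the page inside another site's frame.
ANTI_FRAMING_HEADERS: Headers = [
    ("X-Frame-Options", "SAMEORIGIN"),
    ("Content-Security-Policy", "frame-ancestors 'self'"),
]


def forbid_framing(app: Callable) -> Callable:
    """Wrap a WSGI app so every response also carries the anti-framing headers."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def patched_start(status: str, headers: Headers, exc_info=None):
            # Append rather than replace, so the app's own headers are left untouched.
            return start_response(status, list(headers) + ANTI_FRAMING_HEADERS, exc_info)

        return app(environ, patched_start)

    return wrapped


def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Tiny stand-in for the real site, used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>hello</p>"]
```

With header-based blocking a browser simply refuses to show the framed page. If a click on the framing page ends up on your own site instead, that is more likely the JavaScript style of frame busting at work.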
They don't have access to my feed now. These guys aren't idiots and it does appear to my simple mind that once they have your content, it's "in the system". Once fetched, they don't need to go back. Sure they could add more content from your site, but even with what they took from me, it was enough to get Google to call them the content owners/creators of it.
Currently with my frame busting implemented? You can take a piece of my content, paste into Google, see them at #1 position for that text, click the link, go to their page, then it resolves onto my site. A roundabout way of getting to my site. How would Google view this at this point? I'm all ears. I have no idea. At the very least, I'm getting some revenue back for those articles and links. My site as a whole? That's more my issue here. Will it get resolved in Google's eyes or not, or will I be the victim of duplicate content regardless of busting out of the frame situation?
I should add that Bing doesn't have any issue with me being the author of the content. My rankings have been unaffected by this site using my content in Bing. They have been able to tell the difference.
What about Internet Archive? Does it show your stuff there?
Or have you registered your content with the US Copyright Ofc? They give you a registration number. That should be sufficient proof for Google that the content is yours and thus you are the authority, not them. If you didn't register, you should go do it now. At least then in 6 months or whenever the copyright ofc gets around to your registration, you will be able to prove to Google that you are the owner of the content. Or is there something I'm missing here?
This is what we did to fix this issue. We never make posts less than 1500 words long and our RSS feed is set to excerpt.
Then they can go ahead and let their feed aggregators take every single one of your posts and it doesn't matter because the sheer amount of content you have overwhelms the little bits that they have.
Turns out by the way that our blogs that get this treatment did really well with Panda & Penguin - size matters now more than ever.
Did you file your DMCA only with Google? Is Google their hosting company? Honestly, when I file a DMCA it is ALWAYS with the offending site's hosting company, not Google.
If you have a valid complaint, hosting companies will typically act, and do so quickly. If the offending site is a blogger site THEN Google will act. I've had them take down pages on blogger in about 4 hrs of submitting a DMCA.
I don't see Google removing sites not hosted with them from their search results as the result of a DMCA request like yours. Where's the motivation to do so? If they don't act and you decide to sue them, they'll bury you with their fleet of attorneys.
Hosting companies, on the other hand, are MUCH more vulnerable to lawsuits (and the threat of lawsuits) so they are much more likely to act on DMCAs than Google ever would be.
Thanks for the replies. In terms of US Copyright office, it's something I haven't ventured into yet.
Regarding this site, unfortunately it's not just a blogger doing the scraping. It's much more massive and much more page rank. I have been reluctant to submit to a hosting company simply based on other people who have done so without success. I'm very discouraged with DMCA right now based on my filings over this which isn't helping me to deal with hosts. In the past, yes I have done that via a host but that was a different site and different situation. That was a "typical" scraping situation.
@not2easy, thanks for the suggestion. It opens my eyes certainly. Setting the feeds to show 1 seems appropriate now and in the future. I would like to select 0 but that won't work.
Unfortunately in this situation, once it has been grabbed, it's possessed. In other words, these workarounds now don't stop my content from existing on that site. Further, I've seen a site which is now a parked page, yet in their frame they still have that site's content. Interesting? I would say so.
It puts me back at going ahead on DMCA again. Fine, will make another go at that.
The bigger picture here is something I hope is clear. You can blame WordPress. What the F are they thinking? Building a default RSS feed into a blog which cannot simply be deactivated. Worse, making the defaults such that it displays the entire post, with 10 or 20 at a time. A system that is built for non-technical people has the biggest gaping exploit built right in, with default settings that make scraping all that much easier. This is simply crazy and unacceptable to me. Get with the times, WP.
The point is I can attempt to correct my situation, however it doesn't correct this black hole. Is this an issue of people not breaking copyright laws because they are simply broadcasting something that a WP site is doing?
This is an ugly beast of a situation that apparently got worse since Google is failing at knowing who owns content. This beast became the black hole it is with Google and not Bing. It's an exploit. There is a loophole. This isn't just "one guy" ranting on these forums. If your traffic disappeared and you're running WordPress, I urge you to take some text from various posts, put it into Google and see who ranks for it. That is the FIRST SUGGESTION I have. Don't look at Panda, etc. See if your content is being attributed to someone else because of the algo. I only wish that I had considered this as a possible culprit. Again, this is the first thing I would suggest to anyone who is running a fairly low PR blog which has RSS functionality. Sure, your issue might be duplicate content and you got Pandalized because of it, but the fact is it may not be something you've done. People have taken your content and now rank instead of you. THAT IS THE ISSUE.
I'm not just ranting here. I'm offering a suggestion to people who saw their site tank in Google in the past year. Do it. Copy a sentence, paste it, and see. I can't name this site and I can't write a letter to the algo team to say jeez, this is kind of a big deal and why their system disregarded my complaints about somebody copying/taking my content and getting all the glory and revenue that is rightfully MINE.
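If you want to run that check across more than a handful of posts, a small script can pick the sentences and build the quoted-phrase searches for you. This is a minimal Python sketch under my own assumptions: the helper names are hypothetical, "distinctive" just means "long", and it only builds URLs for you to open by hand in a browser.

```python
import re
from urllib.parse import quote_plus


def distinctive_sentences(text: str, min_words: int = 8, limit: int = 3) -> list:
    """Pick the longest sentences from a post; long ones are less likely to match by chance."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    long_enough = [s for s in sentences if len(s.split()) >= min_words]
    return sorted(long_enough, key=len, reverse=True)[:limit]


def exact_phrase_url(sentence: str) -> str:
    """Build a Google search URL that looks for the sentence as a quoted phrase."""
    phrase = sentence.rstrip(".!?").replace('"', "")
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')


if __name__ == "__main__":
    post = "Short one. This sentence has clearly more than eight words in it. Tiny."
    for sentence in distinctive_sentences(post):
        print(exact_phrase_url(sentence))
```

Open each URL and see whose domain is listed first. If it isn't yours, that is the attribution problem described above rather than something Panda-related.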
For what it's worth...
I've never had a problem with DMCAs. Every DMCA I've ever submitted (a couple dozen) was honored. But I've always submitted them to the hosting company.
Fair enough, based on feedback here it's something I should have done at the same time as the Google filing. I will report back.
There isn't much you can do. I suggest you minimize the length of the feed and block the scrapers from the .htaccess file. I am in the same situation.
I've had a lot of content removed via DMCA reports, but the scraper sites are simply endless... They don't stop stealing content. If you kill one, two appear in its place.
I did want to mention or ask for possible clarification. Let me say though I am appreciative of the many suggestions put forward here!
Those who have had success with filing via hosts, was that a situation where you had already lost significant traffic or rankings in Google? I ask because on second thought, part of my thinking was that for Google to acknowledge any wrongdoing in terms of my ranking being affected by a scraper, they would be the ones who need to verify and let a DMCA request go through. In other words if I go ahead with a host, and that's successful, it doesn't feel that it would have the same resolution as having Google themselves see and acknowledge the DMCA. Does that make sense the way I've said it? In my situation my entire site tanked because of this scraping situation and of course I'm unclear which of the 2 avenues would be more effective or if at the end of the day they are equally effective.
The other problem that I face here is that I've done some frame busting. I've dealt with a portion of the problem. How is this going to look when submitting a DMCA now? The question I'm asking is whether you have all left the scrape as is until the DMCA went through. I think in the past I've filed a DMCA, fixed the issue, and Google found no issues. I think part of this is that a successful DMCA would bring back lost time/rankings in Google's eyes. I could be way out to lunch on that. Fixing the problem without getting them to first see the theft/infringement, how is that a good thing or does it not matter?
I'm just having a bit of trouble grasping this process. Thank you.
Again, I think for the most part they can figure out who the originator is when there is a blatant or fly by night scraper site that shows up. In this instance, it's not a fly by night scraper. Sorry to repeat. I think though there is a difference between losing a bit of traffic over a scrape issue vs. losing your entire site over a scrape issue. That's why I'm saying this is a different beast than most. Agreed that one goes down, another one pops up. The thing is most scraper sites get called out and knocked out eventually before they can really do harm. An irritation and mainstay on the internet? Yes.
If you are still publishing entire posts to RSS without using any tools to modify the content and track it back to your site I don't know that a DMCA with the host will resolve your problem. All they have to do is find a new host. I would take that step after addressing the things on your site that allow them to do what they are doing.
Publish snippets to RSS, not complete posts or pages.
Look at simple plugins that let you add a header, footer, backlink in every RSS item. As long as you are an easy target, someone will be out there looking at how to use your work to get ahead.
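To make the snippet-plus-backlink idea concrete, here is a minimal Python sketch of what such a feed item body could look like. It is not a WordPress plugin and not how any particular plugin works; `feed_excerpt` is a hypothetical helper, the tag stripping is deliberately crude, and the 55-word cutoff is just an arbitrary default.

```python
import html
import re


def feed_excerpt(post_html: str, permalink: str, site_name: str, max_words: int = 55) -> str:
    """Return a short plain-text excerpt followed by a link back to the original post."""
    text = re.sub(r"<[^>]+>", " ", post_html)  # crude tag stripping, good enough for a sketch
    words = html.unescape(text).split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += " [...]"
    footer = '<p>Originally published at <a href="{url}">{name}</a>.</p>'.format(
        url=html.escape(permalink, quote=True), name=html.escape(site_name)
    )
    return "<p>{}</p>{}".format(html.escape(excerpt), footer)
```

Anyone republishing the feed then gets a few dozen words and a link pointing at your permalink, instead of the full post.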
Agree with not2easy.
The DMCA request is not really designed to say, "Hey Google. Stop ranking this guy who has copied my content higher than me."
The DMCA is a takedown request (or at least it has been every time that I have used it). It's really designed to say, "To whom it may concern. This guy is violating my copyright by publishing my content without my permission and you are party to that infringement by hosting their content. And I want it taken down."
But it's not really going to help if you continue to allow them to suck your content out of your site so that they can mash it up on their site. Chances are that it could reappear again on the same site at a different URL.
Personally, I've never been a fan of RSS and syndicating my hard work so that others can profit from it (just like this case).
I filed a DMCA complaint against a feed reading site that simply frames your page and adds some links in a navigation bar that leads to their other content. I received a reply from Google stating that based on policy they will not take action. Instead they say that if I take legal action myself that results in my content being removed from their site that Google will adjust their search results accordingly.
It's my freaking page on their domain, template and all, but that's not against Google policy? Then it hit me, Google does the same thing to everyone with their image search...
Now, if only I could kill my site rss feed... [webmasterworld.com...]
@Sgt, this is more than just a little bit interesting. I would not be surprised if we're talking about the same site.
It appears that you got more insight into the rejected DMCA than I got.
It's very interesting what you say about image search.
To me, the plot thickens significantly. We are beating around the bush here, but I guess that's better than nothing.
It does appear that there is a major loophole that this site is exploiting, and perhaps many others will soon follow. If you can't touch this? Well, when it comes to making money, if laws can't intervene then heck, it's the wild west. So it does appear that there exists a way to scrape content, rank for it, run AdSense and not be subject to DMCA notices. Have I got all this right? Oh, in addition the site doing the scraping in this instance appears able to take your PR and ownership at some point. Did I miss anything else on this? It almost sounds like a fake or made-up situation, but it does appear that this is in fact why this black hole has done rather well for itself.
I just took a peek at that link, and my suggestion is for anyone running RSS feeds, check it out. It does appear there is a possible plugin solution but when I see a couple hundred downloads it doesn't fill me with confidence. I'm lazy so I chose WP and frankly there should be a plugin that can deactivate feeds entirely and that doesn't require rewriting the code of WP or a theme.
Lastly, Mr. Kickaxe, with your attempts at removing RSS, do you feel confident that your already scraped content will disappear from this black hole? Or are you of the belief that once it's grabbed, they have you by the hanging sack of balls?
> "I can't name this site and I can't write a letter to the algo team to say jeez, this is kind of a big deal and why their system disregarded my complaints about somebody copying/taking my content and getting all the glory and revenue that is rightfully MINE." <
Hmm, have you really tried spam-mailing Matt C and the G-folks about it? What have you got to lose? He seems to have all the glory-answers about how their algo is so great at "determining who the (long-term) originator is". Maybe he'll read it and address this with his programmer buddies.
Every time I search and the copiers pop up first, I go straight to the "give us feedback" link and make a report of it.
I feel for you MrSavage
Mod's note: Noise and whining about Google have basically killed this discussion. I'm sending any serious discussion about preventative measures to this thread...
Questioning the wisdom of using fat pings to deal with scrapers