| This 33 message thread spans 2 pages: 33 (  2 ) > > || |
|You can lose ranking if others copy you|
Case Highlights -
1. A site runs smoothly for over a year with good ranks in Google.
2. Someone copies the content of a page and posts it on many high traffic/high PR portals such as free classifieds and review sites.
3. Ranks for that page drop in Google, while ranks for other pages remain.
4. We contact those portals individually and ask them to remove the content, which they do.
5. Ranks for this page come back within a week.
At the time, I did think it might just be a coincidence; after all, my site had that content before other portals copied it. But I wasn't as confident as before.
A new case -
1. I am asked to review a site that, in spite of being an authority in its space, ranks very low in Google, while it ranks consistently on the first page in Yahoo/Bing for the majority of keywords.
2. I check to see if its content was copied, since I find nothing else wrong. Yes, in fact it has given its content to many major/high PR review portals and content aggregators. Many of the ranks in the first page refer to this site, while this site itself ranks in 100s and 200s.
Now, having been through a similar experience, I tend to think this site is a victim of copied content (not plagiarism) and it is practically impossible to get other sites to remove content since there are hundreds of them. Rewriting seems to be the only option, but a tedious one, since there are hundreds of pages.
My 2 questions are -
1. Do you think it just can't happen, since Google says there is almost nothing one can do to harm your ranks. Has that "almost" given them enough room to excuse themselves out of a pretty nasty situation?
2. Should this website go to the length of rewriting their entire content, or is there an easier way out?
|rewriting their entire content |
And then it gets c&p a thousand places again...do you rewrite again? This is a problem google has to deal with. I was seeing some improvement with how google handled original vs copied content last year, but the past few months seems to have bounced back with higher authority sites outranking originators.
I was thinking it was due to the clown asshattery that google's putting on display for the past few months and that it would sort itself out when it straightened out with its caffeine or smack or whatever they're calling it now...but now I'm starting to wonder if google just doesn't give a $hit anymore and has moved on to more fun things to focus on (videos, url shorteners, smart phones, whatever).
Maybe this is the new Google motto: As long as the serps display SOMETHING that's kinda/sorta what's being searched for, good 'nuff.
as a rule, Google seems to focus on getting the information to their search user - and the original owner attribution has long taken a back seat. If a more prominent site reprints your content, they often can push your site out of the top positions.
If you are intentionally syndicating content, you should plan for this and make sure that what syndication you do authorize always contains a link back to your site as the "first published". And even then, don't give it all away - keep some of the gems for yourself.
If someone does an unauthorized re-publishing of your content, especially without the link (the attribution really helps Google) then you've got some work - there are other steps you can take. See this older thread Scraped or Stolen Content: What To Do First [webmasterworld.com]. It has a few years on it, but it's full of sanity and good advice.
|Google says there is almost nothing one can do to harm your ranks. |
If a competitor intentionally did this, you would probably be able to nail them. It's that flock of generic "scrapers" that can wreak havoc.
Still, your best defense is a good offense. Build your site's strength in every way you can and it becomes harder and harder for an impostor to outrank the original - especially a low profile, disposable auto-generated scraper site.
[edited by: tedster at 3:39 am (utc) on Apr 2, 2010]
At the very least, if the original content includes graphics, a webmaster could consider utilizing something like "hotlink protection". A lot of hosting services these days include cPanel, so anyone who has that can look under "Security" to define which domains are allowed to link to graphics. Then, if someone scrapes a page and tries to include your own graphics, they may end up with a bunch of broken icons, which diminishes the value of the page for any visitors.
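For anyone curious what hotlink protection boils down to, here is a minimal sketch of the usual idea: look at the Referer header on an image request and refuse it when the referring host is not on your allow-list. This is an illustration only, not cPanel's actual implementation; the function name `is_hotlink` and the domains in `ALLOWED_DOMAINS` are made up for the example.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: replace with the domains allowed to embed your images.
ALLOWED_DOMAINS = {"example.com", "www.example.com"}


def is_hotlink(referer, allowed=ALLOWED_DOMAINS, allow_empty=True):
    """Return True if an image request looks like a hotlink from another site."""
    if not referer:
        # Many browsers and proxies send no Referer at all, so by default
        # an empty value is let through rather than blocked.
        return not allow_empty
    host = (urlparse(referer).hostname or "").lower()
    return host not in allowed


print(is_hotlink("http://scraper.example.net/stolen-page.html"))
print(is_hotlink("http://www.example.com/products.html"))
```

A request flagged this way can be answered with an error or with a substitute image. The Referer header is easy to strip or fake, so this only stops casual scraping.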
Or don't even use hot link protection, use the original image names to display a logo stating that this image is being ripped off and display your URL for their visitors to see and then change your site to reflect new image names. I had a competitor that persisted in doing this so I changed the img url's and put some scat images up for their visitors to see and I didn't waste my time contacting their webmaster the final time....guess what...no more stealing my images.
Thanks for the really good responses.
Tallon, the reason why this content is so widely used all over the place is that it is very good content, and each page contains over a thousand words. I am planning to suggest that they give a synopsis of the product on each page and provide the detailed content on a new page or in a pdf. To get that content in pdf, one would need to register with the site. Good lead generation opportunity as well.
Ted, the worst of it is, the company has willingly given away this content, since those portals anyway talk about this company. But in most cases, they do not link back.
There are no issues with images. It is the content that others are using that has landed them in the soup.
I just don't get why Google can't say first-come, first served. Or at least build in a time-limit where after that limit duplicate content doesn't hurt the original. Sort of thing a techie could knock up before breakfast I'd have thought.
I appreciate to Google it's about end user experience and not webmasters, but things like this need a balance. There needs to be some respect paid to the content originators. After all, without them, Google wouldn't be where it is.
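To make the "first-come, first served" idea concrete, here is a rough sketch of what such a rule might look like: group pages whose text is identical after normalising case and whitespace, then credit the copy that was seen first. This is purely illustrative and is not a description of how Google works; the function names, URLs and dates are invented, and "first seen" here just means whatever date a crawler happened to record.

```python
import hashlib
from datetime import date


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def pick_originals(pages):
    """pages: iterable of (url, first_seen, text).

    Returns {fingerprint: url}, crediting the earliest-seen copy of each text.
    """
    earliest = {}
    for url, first_seen, text in pages:
        key = fingerprint(text)
        if key not in earliest or first_seen < earliest[key][0]:
            earliest[key] = (first_seen, url)
    return {key: url for key, (_, url) in earliest.items()}


pages = [
    ("http://big-portal.example/review", date(2010, 3, 1), "Great  Widget review."),
    ("http://small-site.example/widget", date(2009, 1, 15), "great widget review."),
]
print(pick_originals(pages))
```

Even this toy version shows the weak spots: a scraper that scrambles a few words gets a different fingerprint, and a copy that gets crawled before the original would wrongly win.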
McMohan, a very easy and quick fix for this is to require a link back to the content page, with anchor text that describes the article's key words. If the content is that good, most will add the link; the ones that don't, require them to take it down.
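If you go the link-back route, checking hundreds of republished copies by hand gets old fast. Below is a small sketch, using only the Python standard library, that takes the HTML of a republished page and reports the anchor text of any links pointing back to the original URL. The names `LinkCollector` and `links_back` and the sample URLs are assumptions for the example; fetching the pages is left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_back(html, original_url):
    """Return the anchor texts of links in html that point at original_url."""
    parser = LinkCollector()
    parser.feed(html)
    target = original_url.rstrip("/")
    return [text for href, text in parser.links if href.rstrip("/") == target]


copy = '<p>Republished copy. <a href="http://example.com/widgets/">blue widget guide</a></p>'
print(links_back(copy, "http://example.com/widgets"))
```

An empty list means the copy carries no link back, which is your cue to ask for one or ask for a takedown.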
I was just looking at a PR8 website that has referenced dozens of our articles. Most of those pages are PR2 and PR3. They don't outrank us on any of those articles for the terms we're targeting.
Personally, I think it only helps us. It certainly has not hurt us. I would think Google has this one nailed.
Not sure they have it totally nailed. Indexing has been a problem for me over the last few months, but following a few Google searches it's not so much of a problem for those who stole my content before it was indexed.
Does that mean they wrote it? That the rubbish shops, the Chinese merchants, eBay, scrapers and all the other freeloaders were the originators of my content?
Wants sorting. Another week going to be wasted sending out DMCAs (and reading Tedster's link).
Edit> Read that. I'm fed up enough to try the law route. It's endemic and people assume they will get away with it.
I've had a handful of heavily scraped client pages that used to disappear occasionally. When one of them did, turning off the dupe filter (by adding &filter=0 to the end of the serps page url) was a good test to see whether it was a dupe content issue.
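The filter test above is just a matter of tacking one parameter onto the results URL. A tiny helper for doing that consistently might look like the sketch below. The `filter=0` parameter is taken from the post as described; whether it still behaves that way is not something this snippet can promise, and the function name `with_filter_off` is made up.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_filter_off(serp_url):
    """Return serp_url with filter=0 set, replacing any existing filter value."""
    parts = urlparse(serp_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    query.append(("filter", "0"))
    return urlunparse(parts._replace(query=urlencode(query)))


print(with_filter_off("http://www.google.com/search?q=blue+widgets"))
```

If the missing page shows up in the unfiltered results but not in the normal ones, that points to the dupe filter rather than some other problem.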
Using Copyscape and/or searching for short unique text strings would find the scraper. I say "short unique text strings" because scrapers had begun scrambling scraped text enough that an exact search on a whole sentence sometimes wasn't able to find them.
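As a rough illustration of the "short unique text strings" approach, the sketch below pulls a handful of short quoted phrases from different parts of a page so each can be pasted into a search box as an exact-match query. It only samples at regular intervals; it makes no attempt to judge which phrases are actually distinctive, so you would still pick the best ones by eye. The function name and defaults are assumptions.

```python
import re


def candidate_phrases(text, words_per_phrase=6, step=25, limit=5):
    """Sample short quoted phrases spread through the text for exact-match searches."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    phrases = []
    last_start = max(len(words) - words_per_phrase + 1, 0)
    for start in range(0, last_start, step):
        chunk = words[start:start + words_per_phrase]
        phrases.append('"' + " ".join(chunk) + '"')
        if len(phrases) == limit:
            break
    return phrases


sample = " ".join("w%d" % i for i in range(60))
for phrase in candidate_phrases(sample):
    print(phrase)
```

Short phrases survive light scrambling better than whole sentences, which is the point made above.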
Google generally sorted this out in a couple of days, and the pages would re-appear. In my experience, and I consider myself lucky, they've never not sorted it out. But I don't rest easily and believe that Google has solved the problem.
I wonder whether Google is tracking these over time, or is it just that best linked copy wins. And what happens if we rewrite a page that has a bunch of scraped copies out there?
|I just don't get why Google can't say first-come, first served. |
As I indicate above, factoring in time is a natural thing to want to do. But I think this would be like saying that first public copy on the web is considered to be the original. I know of a lot of great material that's not yet on the web, or not on the web publicly. I think that first come, first served would create more problems than it would solve.
Does adding any randomness to the original page help protect against this?
|as a rule, Google seems to focus on getting the information to their search user - and the original owner attribution has long taken a back seat. If a more prominent site reprints your content, they often can push your site out of the top positions. |
It's too bad they (Google) have such blatant disregard for the rights of the owner of the content in everything they do... It would seem to me if they know the original source of the content and another 'prominent' site reprints it, that should lend credibility to the original source, not replace it. Even more than a link back!
How much more of a 'signal of quality' is there than content being reproduced? If it was s*** no one would copy it and no 'credible source' would reproduce it... It's not that tough to figure out. If it's reproduced it should lend credibility to the original, not favor the 'most linked' scraper. Seriously, what am I missing, because they're really not that inept are they?
Why can't they just respect the content owners the way they should and do it the right way instead of stepping on everyone?
It's almost like they do it on purpose or something... There's absolutely no reason why they couldn't notice content being reproduced on multiple sites and assign the credibility (rankings) to the original source rather than 'the most linked' or 'most prominent' copier.
@ Robert Charleton... I just read your post more carefully, and I disagree. If the original is not the copyright owner's version and it's the top result for the content, it makes it very easy for the true owner to protect their rights, and a single notification could be used to replace the copy with the owner's site in the results, or remove it from the results altogether.
IMO: It makes copyright enforcement much easier. It makes finding copyright infringing sites much easier. It makes detecting repeat copyright infringing sites and suppressing them in the results much easier. Again, IMO, it does not make things tougher to track down or get the credit for the content correct in the results, it makes it very easy...
1st publisher gets the credibility and the rankings for the content... If they're not the true owner, and the true owner files a DMCA complaint and includes the URL of the content they own or states it's not allowed on the Internet, then the owner's site gets the credibility for the content, not the copier(s) or the scraper(s), or all pages containing the infringing material are suppressed because the content is not permitted. If it's a falsely filed DMCA complaint, there is recourse against the filer. I'm not sure how this would create issues or make things more complicated, but it might move wikipedia and some of the popular 'news regurgitators' out of the number 1 spot for all searches, so maybe that's the issue or something?
Maybe they can't do it like I suggest because it could decimate the whole idea of PageRank being a great way to determine the best results? It might not ruin the idea of PageRank altogether, but it would certainly move uniqueness and origination much higher on the scale of importance, and it would also certainly be a deterrent to scraping, because it would do you no good whatsoever if only the originating source was presented to searchers... Maybe Bing or some other new SE will do it since their goals are different than Google's.
Bing Fine with NOT Catching Goolge [webmasterworld.com]
I'm actually wondering if they are possibly opening themselves up to another YouTube style copyright infringement suit by using mechanisms, including PageRank and TrustRank to determine the top results when there is duplication, rather than origination?
If a 'small' site publishes a copyrighted work and another 'larger' site publishes the same material later and is displayed rather than the originator, it would seem to me they are not removing apparently infringing material, since they know it was originally published elsewhere first, and if there are ads on the results page, they are profiting by showing a page in the results they themselves determine to be 'more important' than the originator's (owner's).
It seems to me, by not giving credit to the original source of the content and suppressing all other duplicate publications of the content they are promoting what could easily be deemed apparently infringing content and using the promotion of the apparently infringing content for profit, both of which remove them from DMCA protection and makes them party to the infringement...
Sorry for the triple post...
I asked the above question with better words here: [webmasterworld.com...]
Calling Google's Bluff
Please reread my opening post of this thread again. Now, convinced that rewriting the content is the only option, I suggested the client go the length of rewriting the content of pages that are copied, which they did. It is 10 days since it was done and today, site ranks for almost all the keywords (many of them show thousands of searches in Google keyword tool) in the first page. That is, from hundreds and two hundreds to within top 10.
This is not a remote/low profile site. It is a PR5 website of a company that is a leader in its niche. Yahoo and Bing have got it right and Google couldn't manage that. I believe this case study should create enough noise so as to catch Google's eyes.
|Google says there is almost nothing one can do to harm your ranks. |
Google says lots of things, like all major companies who have an image to protect. It's highly unlikely that Google can always know who was the original creator of content, especially if larger sites are copying material.
|If a 'small' site publishes a copyrighted work and another 'larger' site publishes the same material later and is displayed rather than the originator |
That's exactly what is happening in many cases. And I wish Google would take note of what you mention here, TheMadScientist.
Now something about the OP's point:
|1. I am asked to review a site, that in spite of being an authority in its space, ranks very low, while ranks in Yahoo/Bing consistently in the first page for majority of keywords. |
I did some testing with one site only so I cannot confirm the results reliably, but for the site I restricted access to google via robots.txt, while I allowed access for the other 2 spiders you mentioned.
A year before that, I published an original s/w application with a unique name, and at the time of the restriction, google had other sites in front when searching for the exact name of the app. The other sites are s/w repositories. It's worth noting that at the time of publication, which was sometime before publishing to the repositories, I had the only site with references to the application because the name was unique. There wasn't anything else in the spiders' index about the particular phrase.
So while google had the authority site at a lower pos than the repositories, the other spiders were ranking the authority site first. A few months went by with the google restriction in effect, and then I noticed bing and yahoo had also moved the authority site lower, in other words a replica of what happened with google. I just don't think that is a coincidence, and for those few months the content of the site was not changed in any way. So are the spiders trying to balance their indexes as well?
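For readers who want to try a similar test, here is a sketch of a robots.txt that shuts out Google's crawler while leaving other spiders alone, checked with Python's standard `urllib.robotparser`. The poster did not share the actual file, so the rules, the user-agent names and the URL below are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given user agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "bingbot", "Slurp"):
    print(agent, can_fetch(agent, "http://example.com/app.html"))
```

Checking the rules this way before uploading the file helps avoid blocking more spiders than intended.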
NO & Never if any one have skill then he/she use to copy my seo work...
So what are the options here? What can you do if other sites rank in front of yours? I'm experiencing this ever since the traffic drop on June 2, some scraper sites outrank my content, sometimes my content is not even in the SERPS, only scraper content. One of the sites even copies only the first paragraph of the article and manages to outrank my site on many topics. All with links back to my site with the original content.
That can't be right, can it? So what can I do in this case?
|What can you do if other sites rank in front of yours? I'm experiencing this ever since the traffic drop on June 2, some scraper sites outrank my content, sometimes my content is not even in the SERPS, only scraper content. |
In my experience when a low ranking scraper site copies your content and outranks you, it is usually because your site has a spam penalty. The scraper ranking higher than your site usually is a symptom of the problem, not the cause.
When authority sites copy your content you usually have to just rewrite your page, get it off their site somehow, or theoretically you could resort to black hat stuff to keep their page from ranking. It is usually easiest to rewrite your stuff and it gives your site some fresh content points anyway.
This issue is only going to get worse rather than better anyway because of the proliferation of publishers that pay people $3 (or less!) an article. You can't make a living doing that and leave time for original research. One thing Bing does better than Google in my opinion is weed out some of the cheap article sites and directories, which may be Google's Achilles heel going forward.
Jane thanks for the answer, any idea on how to investigate the issue further? Do you think that this is a temporary thing or permanent until I find the problem and make a reconsideration request?
|It is usually easiest to rewrite your stuff and it gives your site some fresh content points anyway. |
Jane - This point raises a question which I've seen asked often enough that I'll throw it out here for discussion... and that is, if you rewrite your material sufficiently, is there a chance that Google may reset the age factor on your inbound links, if not the link credits themselves?
The idea behind this is that the content that those links once recommended is no longer the same. Similarly, the question arises, are you throwing away any claim to historical originality (if there is such a claim on the web), by changing the content?
I'm not talking about making the material less relevant, by the rewrite, btw, but simply about the historical linking factors. Possibly these are tinfoil hat theories, or possibly they're realistic concerns.
Even where I've needed to make changes to frequently scraped pages on which we're still ranking, these concerns have been in the back of my mind... and I've generally tended to add material to the old content, particularly if it was good content, rather than to start over with it.
IMO, it's not a settled area. Here's a parallel discussion which references this thread, that's also worth reading....
Destroy SEO by copying/duplicating content?
|Google seems to focus on getting the information to their search user - and the original owner attribution has long taken a back seat. |
One way in which Google could check content age is using Alexa's Wayback Machine data. Google probably has their own archive they could check. Our main competitor, who plagiarized our entire site, sales copy and product line, uses robots.txt to block the ia_archiver - this after we filed a copyright complaint against them using Wayback Machine archive URLs proving their illegal copying. Friggin weasels!
Knowing how Google operates, I don't know if I'd really want them to become judge and jury on copyright issues. They'll probably not get it right.
|Jane thanks for the answer, any idea on how to investigate the issue further? Do you think that this is a temporary thing or permanent until I find the problem and make a reconsideration request? |
In my experience it is permanent until you clean up your site. I have never had to submit a reconsideration request unless the site was totally deindexed. I think reconsideration requests are like asking the IRS "would this trigger an audit?"
|Jane - This point raises a question which I've seen asked often enough that I'll throw it out here for discussion... and that is, if you rewrite your material sufficiently, is there a chance that Google may reset the age factor on your inbound links, if not the link credits themselves? |
If your page was on poodle care and you rewrite it to sell cheap viagra, then maybe you would have a problem. But otherwise people update their content all of the time to make their pages more current or to add new ideas or concepts, so I don't know why that would be a problem.
|One way in which Google could check content age is using Alexa's Wayback Machine data. |
I think in the past the Google people have said they try to find the original version of a page, but I have never seen this actually work in the real world. I had a page up for a year and a well ranking nonprofit copied the page word for word and immediately started ranking and my page got buried. It took a long time to get it back in the listings.
Another time some directory did some kind of php link to a new site of mine and their linking page got ranked and my actual page didn't until it got more links. Most of the time, as Tedster said in a previous post, if a more prominent site publishes your content then they can get the rankings instead of you. However with my sites more than 99% of the time this doesn't happen because the sites doing the copying are usually low ranked scraper sites.
One of the few downsides to this kind of work is coming in contact, even if it is only online, with so many low lifes - site hackers, people who try to sink your site with black hat techniques, the never ending supply of plagiarists, among others.
|are you throwing away any claim to historical originality |
I do not think Google gives as much importance to originality of content as it does to PR. Whether or not you have the original content, and whether or not your content is great, all that finally matters is the PR (not necessarily the visible TBPR).
Now we are seeing a Google that goes with the content that a big fat website displays; whether that site originally created the content or not is not Google's concern.
Of course there are many advantages to this policy of Google's, mainly in fighting spam. But what is the downside? Rightful owner of content not just loses credit to his/her content, but ranks too.
On this issue I would say that Google should take into account when an article or page is FIRST submitted to Google sitemaps/WMT. Original owners and white hat sites will almost always submit their pages to WMT. So if a site submits content via WMT and they are first to do so, then they "should" be the original writers and any other page with the same content should be disregarded. Just my two cents.
I have always been surprised that my largest site has not had its content copied. I believe the reason to be that a significant and crucial part of the site content is generated on the fly by asp server side programming.
After all, my content can be stolen but the stealer has no idea how the content has been generated and if the content they steal makes sense to the majority of users - the same goes for php pages.
|Original owners and white hat sites will almost always submit their pages to WMT. |
I wonder about that. What % of webmasters use Google WMT, or are even aware of it, and submit a site map?
It's difficult. From an automated perspective, how can Google determine which source is original? We had content scrapers outrank us back in our early days and it was frustrating. Typically, though, if a well-trafficked site is lifting content without permission Google will find out about it sooner or later.