|SEO friendly ways to prevent plagiarism?|
and avoid any damage before plagiarism occurs
Last year, there was a topic here Best Ways to Prevent Plagiarism [webmasterworld.com], but it only dealt with ways to fix the damage that has been done by plagiarism / scraping that has already occurred.
I would like to compile a list of methods to prevent plagiarism from occurring in the first place... or at least make it as difficult as possible.
This is what I am already using to protect the text on my site:
- I try to insert my company name in every paragraph in the middle of a sentence.
- I embed some invisible CSS text with the trademark or copyright symbol plus my company name, so that text that appears as "a quick brown fox jumps over the lazy dog" on my website becomes "a quick brown foxTM My Company Name jumps over the lazy dog" once copied and pasted
- I use title and alt attributes with similar effects
Any other ideas for methods that can prevent scraping, yet leave the site entirely open to search engines?
Sites in the search engines get checked by editors. If you use too much hidden text, it will be marked as a negative.
|I would like to compile a list of methods to prevent plagiarism |
Not possible. If it can be seen it can be taken. About the best that you can do is C&D anybody that you catch. That's not so hard unless they do a very good job of rewriting your work.
The hidden text doesn't really help and you could burn yourself bad if you get flagged for spamming your company name with hidden text. That's what it will look like to a bot. If they scrape the page and just throw it up somewhere, the hidden text still doesn't necessarily do anything for you. You're assuming that the text is thrown up blind. The best ripped content is rewritten (manually, automated, or a combination) and that stuff will get caught and tossed anyway.
The best defense is a good offense. Just keep building more good pages and more good sites. We get ripped all the time; great articles. If in USA, not too much trouble. We can usually identify and prove the theft and a C&D does the job right quick. Threaten DMCA and down it comes. Often rewritten and put back up, but is always second-rate to our original content. If outside the USA, not much leverage, but we have never seen any big negative hits from this. Wonder if Google accounts for international plagiarism and allows room for the fact that we don't have much leverage to get it down?
Which URLs have C&D and DMCA?
Ladies and Gents. Our website is really top notch, but because unsavory types rip off our content our presentation is as follows:
wheez rites rill gud but doz bad scrapper barbarians dinna want dis stick.
Guess what, they'll scrape that, too. All one can do is be reactive to copy theft... there is no way to PREVENT it. If that were possible, there would be no copyright or intellectual property laws anywhere. Even digital watermarks in images fail, as does most of the DRM put in audio/film.
Go after the bad guys... or ... or.... look at the site that does rip you off, but does not rank as high as you, and see if there is any link juice coming (for a day or two) then stomp on 'em with both left feet.
If they use Adsense, be sure to send a complaint to Google. I've never seen people take down copied material as fast as when they're about to lose their Adsense account. If they do end up losing their account, be proud that you've done a service to the community of honest publishers.
|- I try to insert my company name in every paragraph in the middle of a sentence. |
At first I thought you meant you wrote it into the visible text. Forget about the CSS hidden text, it won't help you any. But do make reference to your own website when appropriate. Things like "Here at example.com we never write about X", or "SilverSpirit, the founder of example.com" are going to look strange on a scraped website, and will make your case easier if you have to get lawyers involved.
When I promote a story to the popular section of a social media platform, the splogs syndicating my content number in the hundreds. My traffic and exposure only grow exponentially for it.
I wouldn't worry too much about it; I have seen many, if not every, content-rich site get plagiarized once it gets noticed on the web; some get attributed and linked, some don't. The most important requirement on your part would be to get your content spidered first, of course. XML sitemap submissions have been found to decrease the time between web crawls.
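For anyone who has not built a sitemap before, here is a minimal sketch of generating one in Python. The URL and date are made-up placeholders, and the function name `build_sitemap` is just for illustration; the element names follow the public sitemaps.org format as I understand it.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build sitemap XML from (url, last_modified) pairs.

    last_modified is a date string such as "2024-01-15".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical example entry; replace with your own freshly published pages.
sitemap_xml = build_sitemap([
    ("https://example.com/articles/new-story", "2024-01-15"),
])
print(sitemap_xml)
```

Regenerating this whenever a new article goes live, and then submitting it, is the idea; whether it actually gets you crawled before the scrapers is up to the search engine.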
There was a time when one of my main content sites was ranked in the top 5 for a very competitive phrase.
During that time, I could catch a new thief every day, IF I set my mind to it.
After a while, I just gave up. If you're in Romania, my (possibly) threatening letter is going to mean nothing. If you're in India, you can hire 30 people for 80 cents an hour to "sort of" rewrite my text.
It's always an uphill battle. Can't be stopped.
This is a timely topic for me as I'll explain below.
I think there is something all of us who create our own content and have it stolen can do that will help, even if in a small way. That is to pursue copyright thieves aggressively and then make it known publicly. You don't have to give specifics of sites, names, etc., but a thread here every now and then where people are reporting on getting a site removed from the search rankings, getting an AdSense account closed, collecting damages, etc. would go a long ways to discouraging some of this activity.
Use the DMCA process to get the thieves removed from search rankings, follow the process to report it to AdSense if the site shows AdSense ads (same with YPN and others), contact their host if they use a hosting service, etc.
Now here's the timely part: I've been reading a lot lately on the U.S. Copyright site and I found the part about statutory damages very interesting. So I had a discussion with an attorney who specializes in copyright law.
I have a new site that I'm getting close to being ready to publish. Before I publish it, I'm going to go through the steps and register the site with the Copyright office. My understanding is that is a necessary step to collect statutory damages.
Then, once it's registered, I'm going to publish it and start monitoring for copyright violations. Every time I find a violation, I'm going to forward the information to the attorney and he is going to pursue the statutory damages.
I have no idea how this is going to work out and it will be a while longer before everything is in place, and I don't want to go into too much detail here about my conversations with the attorney, but based on what he told me, I'm optimistic.
If you're serious about this, and I hope you are, I encourage you to do the same thing I did. The attorney didn't charge me anything for the discussion, I guess he looks at it as an opportunity to get future business from me.
I live in a small city and even though there are a number of attorneys here, none practiced this type of intellectual property law. I had to go to a larger city about an hour away. Look around, ask for referrals from local attorneys and find someone you are comfortable with.
Who knows, statutory damages might become another revenue stream? :) (although that's not my goal)
And finally, read about this for yourself on the U.S. Copyright Office's site, but even if you have a published site that you haven't registered, there are some provisions for registering it now and collecting statutory damages in the future.
|Before I publish it, I'm going to go through the steps and register the site with the Copyright office. My understanding is that is a necessary step to collect statutory damages. |
It has not been my understanding that this is necessary. I really like the idea of damages for ripped content becoming a revenue stream, but if practical would I not be under constant pitch to go after infringers for a split of the take? I have never had any reason to think the trouble of pursuing beyond C&D was worthwhile, much as I would like it to be. We have more of a problem with product images getting ripped than with text. I would love to make them suffer and make a little profit as well, but I have never found an option that didn't cost me more time and money than it would be worth, so I settle for takedowns. The lack of punishment has always rankled me to no end. Is there a niche to be had in getting payback and profit? If yes, a lot of things would change in a hurry. My guess is that the answer is still no:((
I've seen a few websites get more savvy about this lately. They place a small one pixel image in their content so that the user does not see it, but when automatic scrapers grab it and it is pulled to the scraper content page, the full size is displayed, often saying things like
"This is stolen content, get the real story at ____"
You can also disable certain elements in the right click, which prevents the low-end scraper from copying and pasting content. Some might see this as a handicap, since other quality websites will often copy a sentence or two when writing about you, but all in all it often stops the lazy scammers from copying the data as well, since they would then have to view source, try to navigate in there, and copy the content.
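One possible way to get the "stolen content" image effect described above is to have the server pick which file to send based on the Referer header of the request. This is only a sketch of that idea, not necessarily how those sites do it: the host names, file names, and the function `choose_image` are all assumptions for illustration, and a Referer header can be missing or faked.

```python
from urllib.parse import urlparse

# Assumption: the host names under which your own pages are served.
OWN_HOSTS = {"example.com", "www.example.com"}

def choose_image(referer):
    """Return which image file to serve for a request to the hidden image.

    referer is the value of the HTTP Referer header, or None if absent.
    """
    if not referer:
        # No Referer (privacy tools, direct fetches): serve the harmless pixel.
        return "pixel.gif"
    host = urlparse(referer).hostname or ""
    if host in OWN_HOSTS:
        return "pixel.gif"
    # The page embedding the image lives on someone else's site.
    return "stolen-notice.gif"

print(choose_image("https://www.example.com/article"))
print(choose_image("http://scraper.invalid/copied-article.html"))
```

Note this only works against scrapers that keep the image tag pointing at your server; anyone who strips the markup or copies the image locally never triggers it.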
|It has not been my understanding that this is necessary. |
Not necessary to collect statutory damages? That's not what I understood from reading on the Copyright Office site.
I didn't ask that question directly of the attorney, but he did indicate that any action he would take on my behalf is much easier/more effective if the site is properly registered.
|but if practical would I not be under constant pitch to go after infringers for a split of the take? |
Not sure what you mean by that.
|I have never had any reason to think the trouble of pursuing beyond C&D was worthwhile, much as I would like it to be. |
Read the text concerning statutory damages on the Copyright Office site and then do what I did and talk with an attorney who deals in this area of the law.
|but if practical would I not be under constant pitch to go after infringers for a split of the take? |
|Not sure what you mean by that. |
ROI - I am bombarded with pitches for every IT and development service that exists (and some that don't), but nothing from lawyers, lawyer reps, infringement protection services, or anybody else that could benefit in an area in which there is clearly a market. Thus, I question why. Because it is an unexploited opportunity just waiting for some sharks to swim in and change the game? Or because it just isn't a worthwhile detour for anyone to burn time and money?
|Thus, I question why. Because it is an unexploited opportunity just waiting for some sharks to swim in and change the game? Or because it just isn't a worthwhile detour for anyone to burn time and money? |
I understand your point and I've wondered about the same thing.
1. At least some states still have restrictions on the advertising methods available to attorneys.
2. I get those IT and development service pitches also, and many are SPAM. I'm probably like a lot of people who just delete the message and don't think about it again as long as the same company doesn't make it a habit. My guess is most attorneys or law firms are not going to engage in those practices for fear of the consequences / damage to their professional license.
Or it may be that many of them just aren't yet cyber-savvy.
3. Like a lot of people, attorneys might not be aware of the number of "home based webmasters" earning income from advertising, PPC programs, etc. with websites. And not aware of the extent of the copyright problem - or the potential for business.
Or maybe they are aware but have the misconception that the only thing available to them here is a bunch of $50 disputes.
All I can say is the attorney I talked with was with a firm with offices in some expensive real estate and he certainly seemed to welcome my potential business.
Look at it another way. If a number of attorneys realized the potential and took advantage of it, the problem would start to go away. It's like the TCPA and unsolicited fax advertisements - once attorneys and individuals learned how to make money and use it to fight the practice of unsolicited fax advertisements, that practice slowed drastically.
|...to go after infringers for a split of the take?... |
I don't know if this is by law or just the preference of the law firm, but the attorney I spoke with indicated they do this type of work on a strictly hourly basis, not for contingency fees.
There's a lot we could discuss about that arrangement, both pro and con, and I don't want to get into detail about it here, but I left with the understanding that they would not only seek statutory damages and maybe actual damages from an offender, but also attorney fees.
At this time, if you don't want it nicked, don't publish it online. Otherwise I think you have to accept that it will be, as it is part of the current environment.
What I now do is ask them to remove it or credit it back to my site with a link. Some of them have agreed to this and a link is a link.
I favor a method of inserting a period image in place of an actual period somewhere in the text. The image would track back to my site.
I would use my logs to locate any non-site reference to period.gif. These would be the plagiarists.
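A rough sketch of that log check in Python, assuming the server writes the common "combined" access log format (with the referer in quotes after the status and size). The host names, the sample log lines, and the function name `external_referers` are made up for illustration.

```python
import re
from urllib.parse import urlparse

# Assumption: the host names under which your own pages are served.
OWN_HOSTS = {"example.com", "www.example.com"}

# Matches: "GET /path HTTP/1.1" status bytes "referer"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def external_referers(log_lines, marker="/period.gif"):
    """Count requests for the marker image whose referer is not one of our hosts."""
    hits = {}
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m or not m.group("path").startswith(marker):
            continue
        host = urlparse(m.group("referer")).hostname
        if host and host not in OWN_HOSTS:
            hits[host] = hits.get(host, 0) + 1
    return hits

# Hypothetical log lines: one request from our own page, one from a copy.
sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /period.gif HTTP/1.1" 200 43 "https://www.example.com/article" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2024:14:02:11 +0000] "GET /period.gif HTTP/1.1" 200 43 "http://scraper.invalid/stolen.html" "Mozilla/5.0"',
]
found = external_referers(sample)
print(found)
```

Requests with no referer at all are skipped here, so this finds only the copies whose visitors' browsers report where the page came from.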
Just goes to show there are many tricks for finding or catching them. We use a variety of techniques, because the sharper ones will catch and pull out some techniques. What to do is where it gets tricky.
I have excellent success in US with C&D but that is no deterrent. Can't justify lawyers. Don't have time for DMCA and all that jazz.
We all have rock solid cases and absolute evidence that what is ours is ours and what is stolen is ours. It still raises the question for me of why I don't get offers to pursue thieves in exchange for a big cut of the payoff. I get payback, and maybe a few bucks. It would be game changing in a big hurry. I can only assume that the payoff isn't there. There are too many people with the expertise and connections for this to have gone untapped if it were a litigation cash cow. The clock would be running, because if successful it would change everything.
A picture in the text won't work if, before re-posting your text, the thief pastes it into Notepad, for example. All the hidden information will be deleted. So if your text was stolen manually, there is a big chance you won't find it with this method.
You are absolutely right, but like thwarting spammers, it requires several different methods, each catching a percentage, and in the end never 100%.
The embedded <img> tag doesn't stop them all, but it often does catch one of the most annoying types: those that scrape entire pages and substitute their logos on them.
The harder-to-catch plagiarist is the one who paraphrases a little bit of your copy. He is the one who is likely to manually take the copy and rework it in Notepad.
Still, the <img> tag gets an amazing number of them.
I have to second cyril kearney's comments. While simplistic methods such as "no right click" scripts will not hinder the determined plagiariser, most of the scrapers are scraping precisely because they are lazy, often in addition to being ignorant. They usually don't seem to know how to do the most basic things in HTML, so they "have" to do a copy-n-paste of the text and images, or else do a "Save page as...", the results of which they then upload, almost entirely unchanged, to their own server.
Sad to say, using "no right click" scripts, typing your copyright notice in a color matching the background of the page, and inserting a hidden image will catch a surprisingly (and depressingly) large percentage of offenders.
|I've seen a few websites get more savvy about this lately. They place a small one pixel image in their content so that the user does not see it, but when automatic scrapers grab it and it is pulled to the scraper content page, the full size is displayed.... |
Perhaps 1% of scrapers are real humans who use copy-n-paste, but the rest are bots that retrieve the page content automatically.
So the bot retrieves the page's HTML into a $buf variable,
then, in a language like PHP:
$buf = strip_tags($buf);
And there goes the image tag or any other html stuff. I hope you get the idea why none of these techniques have an effect.
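To make that concrete, here is a small Python sketch that does roughly what a tag-stripping call does, using only the standard library. The sample markup and the class name `TagStripper` are made up for illustration. It shows the hidden image disappearing while a visible in-text mention of the site, like the one suggested earlier in this thread, survives.

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Keeps only text nodes and throws away every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

# Hypothetical page fragment with a hidden tracking image at the end.
page = ('<p>Here at example.com we never write about X'
        '<img src="https://example.com/period.gif" width="1" height="1"></p>')
text = strip_tags(page)
print(text)
```

So anything that lives in markup is gone after one call, and only what is written into the visible sentence itself makes it through.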