Sometimes, this copied content will end up on a page with greater "authority" (e.g. PageRank) than your own, and you can find your page being removed completely from the SERPs in favour of the copier's page. In certain cases I've had to file DMCA complaints to have pages removed from the SERPs altogether because of this, which is a real pain.
To prevent this, it would be useful to find the duplicate pages very quickly and have them removed. Finding the duplicate pages in Google (or whatever) isn't difficult. The problem is that when you repeat the steps several hundred times it becomes incredibly time-consuming, so this could clearly benefit from some automation. I should be building pages, not hunting down site-copying scum.
Indeed, the principles of automation are really simple if you have the skills. Ask Google to find "something with a unique block of text in it" -mydomainname and it should always come up with no matches. Where matches are found, it should be simple to produce a report.
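To make that idea concrete, here is a minimal Python sketch of the query-and-report loop. It is only an illustration under stated assumptions: the function names are made up for this example, and `search` is a hypothetical caller-supplied function that takes a query string and returns a list of result URLs. How you actually obtain results (by hand, or through whatever search API you have access to) is deliberately left open.

```python
from urllib.parse import quote_plus


def build_query(snippet, own_domain):
    """Build an exact-phrase query that excludes our own domain.

    Mirrors the pattern from the post: "unique block of text" -mydomainname
    """
    # Collapse whitespace and drop embedded quotes so the phrase stays intact.
    phrase = " ".join(snippet.split()).replace('"', "")
    return f'"{phrase}" -{own_domain}'


def search_url(query):
    """Turn a query into a Google search URL you can open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def report(snippets, own_domain, search):
    """Return {page: [matching URLs]} for every page whose snippet was found elsewhere.

    snippets: dict mapping one of your pages to a unique block of its text.
    search:   HYPOTHETICAL function (query -> list of result URLs); supply your own.
    """
    hits = {}
    for page, snippet in snippets.items():
        results = search(build_query(snippet, own_domain))
        if results:  # any match at all means the text exists off-site
            hits[page] = results
    return hits
```

An empty report is the good outcome. Anything that does show up is a candidate copy to check by hand before sending a complaint.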
So why can't I find a tool to do this?
The closest thing I've seen is Copysentry and really that looks expensive. The free version (Copyscape) picks up loads of false positives and scraper sites that I'm not too bothered about... what I want to find is copied text large enough to be considered a duplicate.
How do other webmasters go about protecting their sites? Are there other tools that can help which I just haven't been able to find? Is Copysentry better than Copyscape?
Thanks :)
Have you tried building a database? You can use a spreadsheet if you prefer.
Simply enter a search phrase made up of unique content from your page into the search engine, then save the resulting URL in the database/spreadsheet.
Once you've got the key search phrases saved, you don't need to re-type them. Just run the search again by clicking on the saved URL in the database.
Keep adding new and unique search terms to the database as you go.
HTH
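For anyone who would rather generate that spreadsheet than build it by hand, here is a rough Python sketch. The column names and function names are assumptions chosen for this example, and it only builds the clickable search URLs. It does not run the searches for you.

```python
import csv
from urllib.parse import quote_plus


def saved_search_rows(phrases):
    """Build one spreadsheet row per (page, unique phrase) pair."""
    rows = []
    for page, phrase in phrases:
        # Quote the phrase so the engine searches for it as an exact match.
        url = "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')
        rows.append({"page": page, "phrase": phrase, "search_url": url})
    return rows


def write_csv(rows, fileobj):
    """Write the rows as CSV, which any spreadsheet program can open."""
    writer = csv.DictWriter(fileobj, fieldnames=["page", "phrase", "search_url"])
    writer.writeheader()
    writer.writerows(rows)
```

Typical use would be something like `write_csv(saved_search_rows(my_phrases), open("searches.csv", "w", newline=""))`, where `my_phrases` is your own list of page and phrase pairs. Add new rows as you add pages.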
I periodically check the five or six pages I care the most about -- most unique content, most visited, etc. -- and figure that if no one is ripping them off, they aren't ripping off other pages either.
How do other webmasters go about protecting their sites?
I haven't found a tool other than a good Site Meter which gives me all the referral links. If you have too much traffic to do this then maybe Google is a good alternative.
What I do is contact the owner and ask them to remove my content. If I don't hear back within a few days, I write to their host, giving them all the links they need to verify what I'm saying (often a Google cache is a good third-party witness).
I usually hear back from the host within 24 hours. They are bound by law not to allow copyright infringement to prosper where their clients are concerned, or they will lose their license, so they usually react pretty quickly.
They usually inform me they will remove the site entirely if the culprit doesn't remove the content within the next 24 hours.
This is time-consuming work because it takes about 1/2 hour to write the letter and get the email address (often only found in the domain's WHOIS data). Then if that doesn't work you have to write to the hosting company too, which is another 1/2 hour.
Multiply that by the 25 sites I manage and that's about all I get done some days.
But it's usually effective and you get the pleasure of seeing a crook lose his livelihood.
Ha!
[edited by: rogerd at 1:50 pm (utc) on Feb. 24, 2005]
[edit reason] No specifics/URLs, please... [/edit]