Proactive Copyright Protection - Content, Writing and Copyright forum at WebmasterWorld - WebmasterWorld

Proactive Copyright Protection

So much to look out for.. so little time..

Dynamoo

10:57 am on Feb 7, 2005 (gmt 0)

10+ Year Member

I guess a lot of webmasters have a similar problem to this - when your site starts to become successful and you start hitting the top spots in the SERPs for your keyword, then people will inevitably rip off parts of your content.

Sometimes, this copied content will end up on a page with greater "authority" than your own (e.g. PageRank) and you can find your page being removed completely from the SERPs in favour of the copier's page. In certain cases I've had to file DMCA complaints to have pages removed from the SERPs altogether because of this, which is a real pain.

In order to prevent this, it would be useful to find the duplicate pages very quickly and have them removed. Finding the duplicate pages in Google (or whatever) isn't difficult - the problem is that when you repeat the steps several hundred times it becomes incredibly time consuming, so this could clearly benefit from some automation. I should be building pages, not hunting down site copying scum.

Indeed, the principles of automation are really simple if you have the skills. Ask Google to find "something with a unique block of text in it" -mydomainname and it should always come up with no matches. Where matches are found, it should be simple to come up with a report.
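
As a rough sketch of the query-building step described above (Python, standard library only): take a run of words from a page, wrap it in quotes for an exact-phrase match, and exclude your own domain. The function names and the example domain are hypothetical, and the `-site:` operator is an assumed form of the "-mydomainname" exclusion. The sketch only builds the search URL; fetching and parsing the results is left out.

```python
from urllib.parse import urlencode


def pick_phrase(text, start=0, length=10):
    """Return `length` consecutive words from the page text, beginning at word `start`."""
    words = text.split()
    return " ".join(words[start:start + length])


def build_query(phrase, own_domain):
    """Build an exact-phrase query that excludes results from our own domain."""
    # Drop embedded double quotes so they cannot break the exact-phrase quoting.
    cleaned = phrase.replace('"', "")
    # Assumption: "-site:" is used as the exclusion operator.
    return f'"{cleaned}" -site:{own_domain}'


def search_url(phrase, own_domain):
    """Return a Google search URL for the exact phrase, minus our own domain."""
    return "https://www.google.com/search?" + urlencode({"q": build_query(phrase, own_domain)})


if __name__ == "__main__":
    page_text = "something with a unique block of text in it that nobody else wrote"
    print(search_url(pick_phrase(page_text, length=9), "example.com"))
```

Any result that comes back for such a URL is a candidate copy to go into the report.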

So why can't I find a tool to do this?

The closest thing I've seen is Copysentry and really that looks expensive. The free version (Copyscape) picks up loads of false positives and scraper sites that I'm not too bothered about... what I want to find is copied text large enough to be considered a duplicate.
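
One way to separate "large enough to be a duplicate" from incidental matches, sketched in Python under my own assumptions: compare overlapping runs of n words (shingles) and measure what fraction of the original page turns up in the suspect page. The function names, the 5-word shingle size and the 0.5 threshold are illustrative choices only. This is not how Copyscape or Copysentry work.

```python
import re


def shingles(text, n=5):
    """Return the set of all n-word runs (shingles) in the text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(original, suspect, n=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    source = shingles(original, n)
    if not source:
        # Original is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(source & shingles(suspect, n)) / len(source)


def is_probable_duplicate(original, suspect, threshold=0.5, n=5):
    """Flag the suspect page when at least `threshold` of the original is reproduced."""
    return copied_fraction(original, suspect, n) >= threshold
```

A scraper page that lifts only a sentence or two would score low and could be ignored, while a wholesale copy would score close to 1.0.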

How do other webmasters go about protecting their sites? Are there other tools that can help which I just haven't been able to find? Is Copysentry better than Copyscape?

Thanks :)

engine

5:25 pm on Feb 7, 2005 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Top Contributors Of The Month

Here's an idea.

Have you tried building a database? You can use a spreadsheet if you prefer.

Simply enter a search term containing a unique piece of your content into the search engine, then save the resulting URL in the database/spreadsheet.

Once you've got the key search terms, you don't need to retype them; just run the search by clicking on the saved URL in the database.

Keep adding new and unique search terms to the database as you go.
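
For the spreadsheet route, here is a minimal Python sketch that turns a list of pages and their unique phrases into CSV text with one clickable search URL per row. The column names, the function names and the example domain are my own assumptions, as is the use of the `-site:` operator to exclude your own site.

```python
import csv
import io
from urllib.parse import urlencode


def saved_search_rows(phrases, own_domain):
    """Map each page to a row holding its unique phrase and a ready-made search URL."""
    rows = []
    for page, phrase in phrases.items():
        # Exact-phrase search, excluding our own domain (assumed "-site:" operator).
        query = f'"{phrase}" -site:{own_domain}'
        url = "https://www.google.com/search?" + urlencode({"q": query})
        rows.append({"page": page, "phrase": phrase, "url": url})
    return rows


def to_csv_text(rows):
    """Render the rows as CSV text that any spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["page", "phrase", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    phrases = {"/widgets.html": "a unique block of text from the widgets page"}
    print(to_csv_text(saved_search_rows(phrases, "example.com")))
```

Save the output as a .csv file, open it in the spreadsheet, and add a row whenever you publish a new page.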

HTH

Dynamoo

10:46 pm on Feb 7, 2005 (gmt 0)

10+ Year Member

Yes, I've thought about the semi-automated approach, but you've still got to visually check all the results, which for hundreds of pages is a real drag.

hunderdown

4:04 pm on Feb 8, 2005 (gmt 0)

To save time you might just check a sample of your site's pages. If I understand how these sites work, they go for the top content or they scrape everything.

I periodically check the five or six pages I care the most about -- most unique content, most visited, etc. -- and figure that if no one is ripping them off, they aren't ripping off other pages either.

Lorel

1:12 am on Feb 10, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

How do other webmasters go about protecting their sites?

I haven't found a tool other than a good Site Meter which gives me all the referral links. If you have too much traffic to do this then maybe Google is a good alternative.

What I do is contact the owner and ask them to remove my content. If I don't hear back within a few days then I write to their host, giving them all the links they need to verify what I'm saying (often a Google cache is a good 3rd party witness).

I usually hear back from the host within 24 hours because they are bound by law to not allow copyright infringement to prosper where their clients are concerned or they will lose their license, so they usually react pretty quickly.

They usually inform me they will remove the site entirely if the culprit doesn't remove the content within the next 24 hours.

This is time-consuming work because it takes about 1/2 hour to write the letter and find the email address (often found only in the domain's WHOIS data). Then if that doesn't work you have to write to the hosting company too, which is another 1/2 hour.

And multiply that by the 25 sites I manage and that's about all I get done some days.

But it's usually effective and you get the pleasure of seeing a crook lose his livelihood.

Ha!

whw1

10:24 am on Feb 24, 2005 (gmt 0)

One way to protect your material and deter copying is to display a very clear, recognizable copyright graphic that identifies Federal Copyright. If you do not have a registered copyright, then it does not matter.

[edited by: rogerd at 1:50 pm (utc) on Feb. 24, 2005]
[edit reason] No specifics/URLs, please... [/edit]