Forum Moderators: Robert Charlton & goodroi
It's harder for copied content to beat a strong website - but from time to time it happens. During some periods of adjustment in Google, it can get pretty rough.
If you shut down one website that used your works (or whatever you call those articles, which are usually a derivative of other people's works anyway - call it "inspiration"), five other websites will somehow have taken some of your content and republished it in the time it took to call your lawyer and open the first case. Then rinse and repeat - while the first domain magically reappears on another server, or on another domain under another name.
For me - and that's just me - chasing these ghosts is a waste of my limited resources, so I simply choose to ignore it and focus on what I am actually doing, not on what everybody else is. In the end I'll still die and nobody will remember me anyway.
You only have the advantage of being first if you create something - the rest is simply competition that either wants you out of business or a piece of the cake. It's the free and unregulated market in effect, and it's a total mess on a global platform like this: too many interests in play, with different regulations of how, where, when, who, why... just forget it.
The thing with the internet (and the undernet) as it is right now, and how it works: either you adapt to its conditions, or you find another medium/market/platform which suits you better and is better regulated towards your interests.
A scraped copy is never going to get updated, and the scrapers are not going to like it if it takes some work just to clean it up so it isn't one huge advertisement for you. Also expect the scrapers to leave the content to rot - they don't care about it. Their weakness is your strength, along with some common sense.
When you write an article, always include your domain name a few times in the text - both in a visible color and in the background color, as in www.domain.com, in a few smart places in the article. Remember, the large-scale scrapers are lazy - they are not going to remove it. The same goes for a headline: put your domain name there. Just make it take a little work to actually "fix" your content, and often they will simply go somewhere else because it isn't easy anymore.
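A minimal sketch of that watermarking idea, if you generate your article HTML with a script. The function name, markup, and color choices here are purely illustrative assumptions, not a standard tool - the point is just that the domain lands in the output in both a visible and a background-colored spot:

```python
# Hypothetical helper: sprinkle a domain watermark into article HTML.
# Names and markup are illustrative, not any real library's API.

def watermark_article(paragraphs, domain="www.example.com"):
    """Return article HTML with the domain inserted a few places:
    once visibly in the byline link, and periodically in a span styled
    to match the background, so a lazy scraper republishes it unnoticed."""
    html = [f'<p>Originally published at <a href="http://{domain}">{domain}</a></p>']
    for i, text in enumerate(paragraphs):
        html.append(f"<p>{text}</p>")
        # After every third paragraph, embed a low-visibility mention.
        if i % 3 == 2:
            html.append(f'<span style="color:#fff;background:#fff">{domain}</span>')
    return "\n".join(html)

article = watermark_article(["First para.", "Second para.", "Third para."])
print(article.count("www.example.com"))  # prints 3
```

Note the trade-off the poster glosses over: hidden text is itself a practice search engines frown on, so this is a scraper trap, not something to overdo.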
Also, don't go after anybody and everybody who takes your content and throws it around - just go for those who are beating you at your own game. Always target your effort at the major problem, not the guy who just thought "this is so cool, I'll take this article, it's mine, mine, clean it up, and miiiiiiiine". (You know, that little guy Gollum - he's not the problem, Sauron is.)
It's a game. If you're a good player, you'll figure out more creative ways to make it a really sad story to republish your content on a large scale - by working on internal factors on your own site, not by trying to take everybody else's down. I repeat: people are lazy. You aren't - figure out who the best player is ;)
Second, if you do re-write, don't abandon your original content. There's obviously a market for it, so arrange for it to be used on other sites, by agreement and with appropriate links.
Since there are so many people copying pages nowadays, you would have to hire someone part- to full-time just to keep up with the mess of contacting sites for removal.
Like Ted says, the strong site almost always wins out when Google is stable and running correctly :P
I can search for various snippets from TripAdvisor customer reviews that I found being used on another website, and TripAdvisor only shows up if you click for the omitted results. Now, I'd have said TripAdvisor was so heavyweight it would always rank for text that appeared on its site first, so it's quite puzzling what criteria are really being used for identical content.
I had wondered about that since May and got a partial answer with some of the DCs today. My site, which had never disappeared from Google, did. In its place were dozens of sites that had duplicated my content. This was very interesting, in that I'm an expert at finding duplicate content, but this junk surfaced out of the middle of nowhere. It doesn't appear in MSN or Yahoo and never did. Even more interesting, these sites utilized every conceivable dirty trick you can think of. A quick view revealed at least ten running site maps of my site with my URLs replaced by theirs. There were also cloakers of every type, scuzz sites, previously unseen scrapers, and forums that posted entire stolen pages - too numerous to count.
Bottom line is Universal search is crawling the underbelly of the Internet and letting every huckster and schemer around into the results. Duplicate content, which might not have affected your site previously, may well do so in the future. Google’s better idea (ROFL) may soon increase a webmaster’s work even more in combating duplicate content.
I understand the thoughts of many about strengthening your site, but to my mind it still leaves open the question of why, when faced with sites carrying multiple identical copies, it's so often the original site that gets buried. Without that answer you have no choice but to spend more and more of your time simply dealing with scrapers to reduce the identical copy out there. And to be honest, so often the factor here is not the strength of your site, IMHO - so many low-quality sites are riding on the back of scraped content. There's an unknown factor here somewhere.
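One low-effort way to keep tabs on scraped copies without the full-time hire mentioned earlier: pull a few distinctive "fingerprint" phrases out of each article and periodically run them as exact-quote searches. A rough, stdlib-only sketch - the phrase-picking heuristic here is just an assumption, not any known tool:

```python
import re

def fingerprint_phrases(text, phrase_len=8, count=3):
    """Pick a few mid-article word runs to use as exact-quote search strings.
    Mid-article runs are less likely to be boilerplate than openings."""
    words = re.findall(r"[A-Za-z']+", text)
    phrases = []
    step = max(phrase_len * 2, 1)      # space the samples out
    start = len(words) // 3            # skip the intro third
    for i in range(start, len(words) - phrase_len, step):
        phrases.append(" ".join(words[i:i + phrase_len]))
        if len(phrases) == count:
            break
    return phrases

sample = ("Either you adapt to the conditions of it, or you find another "
          "platform which is better regulated towards your interests. "
          "Chasing these ghosts is a waste of limited resources, so focus "
          "on what you are actually doing rather than what everybody else is.")
for p in fingerprint_phrases(sample):
    print(f'"{p}"')  # paste each quoted phrase into a search engine
```

An eight-word exact quote is long enough to be nearly unique on the web, so any other site ranking for it is a strong scraper candidate.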
Some have said certain filters only come into play for the top-ranked sites - so while top sites can get binned into nowhere land, the muck is left floating around the 40-ish-plus places. It's one possible explanation, though it doesn't explain why the sites at the top get binned for their own content. You may say it's not because of that - that they probably get binned for other issues - yet so often when the scrapers are taken down, those pages reappear.
scuzz sites, previously unseen scrapers, and forums that posted entire stolen pages were too numerous to count
I always held Google in high regard, but after comparing Google's handling of these trash sites with Y! and MSN, I'd say Google is being completely overrun by scrapers and auto-gen'd trash.
Having such garbage in your index is one thing, but RANKING the trash pages number one for snippet searches while filtering out the CONTENT ORIGINATOR behind the "repeat search" function is truly laughable (for everyone except the people producing the content).
I find it very ironic that Google's "anti-spam" team is so touted within the industry, yet blackhat seo sites/tools are openly RECOMMENDING the use of Google Blog Search and Google News as scraping tools for both downloading the scraped content AND uploading the link-injected garbage.
But, what's most ironic of all is that I have to spend so much time sending DMCA notices to Google when at Y! and MSN the garbage pages are handled correctly by their algos.
For a search on an exact quote, the scraper (a site subsidized by AdSense) will turn up; you have to click to see more results to find mine. This isn't uncommon at all - I'm seeing it a lot.
Cutts was arguing sarcastically, in one of his posts, "what do you expect for free" - but the last time I looked, AdWords wasn't free. Perhaps if Google incorporated some type of fee or expulsion for scraper AdSense sites, some of this copying mess might be choked off. There's little threat of that, though. Google profits too much from this mess.