Forum Moderators: open

Message Too Old, No Replies

What if my competitor largely duplicates our website?

It's said this would be treated as spam!

         

kch333

1:15 pm on Jul 12, 2002 (gmt 0)

10+ Year Member



It's said that if Google finds many web pages with the same data, it will kick them out of its search results.

But if somebody copies my site, and duplicates it again and again...

what would happen?

frank

Dinkar

1:46 pm on Jul 12, 2002 (gmt 0)

10+ Year Member



No problem. Let them do it. It may even help you sell your products/services.

fathom

1:55 pm on Jul 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From a Google perspective, if they are not linked together there is no concern.

From a legal perspective, if there is solid proof that the competitor is infringing your copyright, seek legal advice.

From a business perspective, learn as much as you can from WMW, both the good (what to do) and the bad (what not to do).

Set up a new site using all the GOOD quality information you have learned, and use the old site as a SPAM test site.

No more problem.

If the competitor actually figures it out, I guarantee he will no longer trust your web design skills and/or SEO abilities.

andy04031

2:10 pm on Jul 12, 2002 (gmt 0)

10+ Year Member



fathom

<<...and the old site, use it as a SPAM test site.>>

Why do you give tips like this one?

First, SPAM misleads everyone who uses search engines - that includes you and me and him/her - and it's very clear: you don't need to spam in any way to get decent listings. This forum should be used not to tell people to spam, but to explain what to do and what to avoid.

Knowles

2:12 pm on Jul 12, 2002 (gmt 0)

10+ Year Member



Andy, I think what he meant was to use it as a spam site for the other people to copy, spam and all. In that respect, I think it's a unique way to teach another webmaster a lesson for stealing content and code.

fathom

2:41 pm on Jul 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Actually, the worst thing is trying to decide whether it's worth your time, money and effort to commence legal action. Most people don't follow through with that course of action simply because they don't have the resources (all of the above).

andy04031, I agree with you 100%. In actual fact, my first statement covered kch333's question.

However, my solution is proportional. What would you currently call the other site, based on all the knowledge of this forum? A duplicate site, duplicate content, duplicate code.

It smells like spam, it tastes like spam - it already is spam!

So...

stace

3:06 pm on Jul 12, 2002 (gmt 0)

10+ Year Member



I recently redesigned my distributor's website - about 120 pages' worth so far - and for a few weeks there were duplicate pages on his site and mine while we were transferring and getting approval for the new pages. My site had links to his old site, but his site has no links to mine. I just deleted the entire folder off my server to be safe, but googlebot had already gotten wind of all of these pages. Should I be worried?

fathom

3:28 pm on Jul 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No worries!

But in the future you may want to try replacing each page's content with a redirect and a robots tag to unindex it before you delete it.

That way you don't lose search engine visitors between the time the SEs drop the old pages and crawl the new ones:

<html>
<head>
<meta http-equiv="refresh" content="0; url=http://www.yournewdomain.com/index.html">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="robots" content="noindex,follow">
<title>title</title>
</head>

<body bgcolor="#FFFFFF" bgproperties="fixed" link="#000000" vlink="#000000" alink="#000000">
</body>
</html>

Beachboy

5:03 pm on Jul 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<<From a google perspective, if they are not linked together there is no concern.>>

Fathom,

Are you sure? So a mirror site, on a separate domain, not linked at all to the primary site, is fully acceptable to Google, in your view? Do I understand you correctly?

fathom

5:18 pm on Jul 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Both would be orphans, or at least within their own external link infrastructure.

Even if they link to the same other sites, there wouldn't be any commonality between them.

Google needs a relationship to penalize.

The difference - IP, incoming links; I can't believe the competitor would use the same company.

I looked at the site, but it's all Chinese, PR1, no backlinks - no problem.

ann

5:29 pm on Jul 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I recently had a similar problem but when I threatened to go to their ISP and get their site dropped then they took it down...

Have you tried that approach?

Ann

Beachboy

6:16 pm on Jul 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Fathom said:

<<Both would be orphans, or at least within their own external link infrastructure. Even if they link to the same other sites there wouldn't be any commonality between them. Google need a relationship to penalize>>

Hmmmm. So two identical pages or sites which are not interlinked to each other at all are perfectly acceptable to Google. It still seems to me that would trip a duplicate content filter. That's very interesting....

andy04031

1:08 am on Jul 13, 2002 (gmt 0)

10+ Year Member



fathom

I see your point. If you try to get them this way... I wouldn't put any time into it, because the next copy is already in development... of the new site...

Actually I have a few copy "friends". One of them I sued a few months ago - it is a hassle and costs money and time; we won in court, but we still wasted time.

I have one special guy who has been duplicating my pages on dozens of his domains for months now. My first reaction was not nice, believe me - I was tripping, if you understand what I'm trying to say. But after a few days of thinking, I'm not feeding lawyers and judges anymore. I bought some other domains and created completely different pages for the same products. First, I'm no longer dependent on only one site, and second, I gain more knowledge optimizing my sites with different designs. At least I'm not getting stupid this way :)

<<However, my solution is proportional - what would you currently call the other site,
Smell like spam, taste like spam, it already is spam!>>

Sure, but not yours... ;)

fathom

1:29 am on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<<Hmmmm. So two identical pages or sites which are not interlinked to each other at all are perfectly acceptable to Google. It still seems to me that would trip a duplicate content filter. That's very interesting....>>

Actually, Beachboy, not to mince words, you said

<<perfectly acceptable to Google>>

and twice.

I said Google must have a relationship to penalize - there's a difference.

Example: how can so many web sites get copied and not get PR0?

The copy doesn't usually link to the site it was stolen from; the original owner doesn't know, so he doesn't link either, and he certainly wouldn't link after finding out.

Different IPs, different domain names, no linkage, and, not to forget, no physical spam on the site - they seem to be a couple of competitors or non-associated affiliates (wouldn't you think?).

Did you look at the site in question? You should.

Beachboy

4:00 am on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Fathom, I'm just trying to understand. Truth is, I have about zero experience with pages that have been duplicated elsewhere, so I don't know their fate.

Further, I don't recall any mention of how much time passed from when a dupe page was installed on some other domain to when it was noticed. And in an instance like that, had sufficient time passed to trip a spam filter?

Your conclusion is very interesting, to be sure. I wonder what degree of relationship there needs to be for a spam filter to trip a penalty when we are dealing with two identical sites on different domains and different IPs.

Further, do other search engines behave the same way Google does in these matters?

Tapolyai

4:39 am on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Assumptions:
Two identical sites, identical except for different domain names and IP addresses, with no links to each other.

Theory:
How would Google compare EACH and EVERY site to EACH and EVERY other site? No way.

Support:
That's 2,073,418,204 pages compared to each other, or... 2,149,531,523,302,580,000 possible combinations (2 quintillion, 149 quadrillion, 531 trillion, 523 billion, 302 million, 580 thousand).

If Google can do 10,000 comparisons per second, it would take them 6,811,454.37 years to work through all the possible pairs among those 2 billion and some pages. Of course, they would have to do it every crawl...
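Tapolyai's back-of-the-envelope figures check out. As a purely illustrative sketch (the page count and comparison rate are his assumptions, not anything Google has published), the arithmetic can be reproduced in a few lines of Python:

```python
# Naive pairwise duplicate detection grows as n*(n-1)/2 unordered pairs.
n = 2_073_418_204                     # pages in the index (figure quoted above)
pairs = n * (n - 1) // 2              # every possible page pair
rate = 10_000                         # assumed comparisons per second
seconds_per_year = 365.25 * 24 * 3600
years = pairs / rate / seconds_per_year

print(f"{pairs:,} pairs")             # ~2.1 quintillion
print(f"{years:,.0f} years")          # ~6.8 million years per full crawl
```

The point of the exercise: brute-force all-pairs comparison is hopeless at web scale, which is what motivates the fingerprinting shortcuts discussed below in the thread.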

Mardi_Gras

4:51 am on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Fathom wrote:
Did you look at the site in question... you should?

Ummm... Am I missing something? Unless Fathom knows something I don't, I don't see a "site" in question... I don't see any URLs in this entire thread...

actual Beachboy not to mix words, you said perfectly acceptable to Google

and twice.

Actually, Fathom, he was clearly trying to get a clarification of what you were saying, and in fact, he said (to take the full quote, not the snippet you chose) "So a mirror site, separate domain, not linked at all with the primary site, is fully acceptable to Google, in your view?"

Could that more clearly be a question? And yet you have turned it around and tried to make it look - to anyone who had not followed the thread - that it was a statement...and a misguided, ill-informed statement at that.

It is easy enough to misunderstand each other here without deliberately trying to misrepresent others' remarks, no?

jdMorgan

6:45 am on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tapolyai,

Actually, the problem is not impossible. A search engine could store off a small amount of data while spidering a page - say the byte count of the page, or perhaps a cyclic-redundancy check (CRC) code or a checksum.

Keeping it simple, let's say they just store off the byte count and the URL. Build a list sorted by byte count, then compare the contents of only the URLs having the same byte count. As soon as you get a single-character mismatch, you quit. In this way, you do not need to compare every page to every page; the problem is reduced in scope with a minor amount of front-end work.

Using several "compressed" comparison values (CRC, LRC, checksum, byte count, etc.) you can cut the number of required file compares even further, and even begin to save enough time to consider some close-but-inexact matches.

Anyway, there are several shortcuts which help to reduce the problem. I don't know if any SE actually does this, but they certainly could - especially if they focused on highly competitive market segments where the duplicate problem is known or likely to occur.
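To make the bucketing idea concrete, here is a minimal Python sketch (my own illustration - the `find_duplicates` function and sample URLs are hypothetical, not any search engine's actual code). Pages are grouped by a cheap fingerprint, and only pages sharing a fingerprint become candidate duplicates; in practice a final byte-for-byte compare within each bucket would guard against checksum collisions.

```python
import hashlib
from collections import defaultdict

def find_duplicates(pages):
    """Group candidate duplicates by a cheap fingerprint.

    `pages` maps URL -> raw page bytes. The fingerprint (byte count plus
    an MD5 checksum) is the kind of small per-page value a spider could
    store off while crawling; only URLs sharing a fingerprint can
    possibly be exact duplicates, so full comparisons are limited to
    within each bucket.
    """
    buckets = defaultdict(list)
    for url, body in pages.items():
        key = (len(body), hashlib.md5(body).hexdigest())
        buckets[key].append(url)
    # Buckets with more than one URL are candidate duplicate groups.
    return [urls for urls in buckets.values() if len(urls) > 1]

pages = {
    "http://example.com/a": b"<html>original content</html>",
    "http://mirror.example/a": b"<html>original content</html>",
    "http://example.com/b": b"<html>different content</html>",
}
print(find_duplicates(pages))  # the two identical pages group together
```

The front-end cost is one fingerprint per page at crawl time, which turns the all-pairs problem into a hash lookup.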

As to the duplicate-site problem, I've seen it stated in other threads here that the SECOND site - the copy - gets ignored, because the search engines are well aware that otherwise a competitor might copy your site with the intention of causing you to be dropped from their index for duplicating content.

Jim

ciml

10:24 am on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think this all revolves around your last paragraph, Jim.

The question is, which is the second site? When things went wrong around the turn of the year, it seemed to be random: lots of people had problems with their sites, and it was paradise for the search-listing hijackers.

The general opinion seems to be that now Google normally keeps the one with the highest PageRank (as they used to).

Marcia

2:08 pm on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think the critical issue is duplicate content altogether. What if it isn't a competitor? What if it's a person with a well-designed site - let's assume it's selling products - who duplicates the site, takes space with another web host, and puts up the identical site under a different domain name? They could even register the other domain using a different person's name and whois information.

Assume site one is in the U.S., and the person has their mother-in-law in the UK register the other domain name - a .co.uk, hyphenated, using a slightly different keyword combination. The purpose could be to promote the site for a keyword phrase with only a slight difference, and to promote it through submitting to a different set of directories, etc.

With the exception of the domain name and maybe a tiny bit of text and page title for the second phrase, they could be practically identical, and Google would have them both.

Would it get by, if it were only 98% duplicated? And if they both went up at the same time, would they both stay, or would one be penalized or removed?

fathom

4:59 pm on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I still say my first post applies...

From a Google perspective, if they are not linked together there is no concern.

From a legal perspective, if there is solid proof that the competitor is infringing your copyright, seek legal advice.

From a business perspective, learn as much as you can from WebmasterWorld, both the good (what to do) and the bad (what not to do).

Set up a new site using all the GOOD quality information you have learned, and use the old site as a SPAM test site.

No more problem.

If the competitor actually figures it out, I guarantee he will no longer trust your web design skills and/or SEO abilities.

Since the competitor copied the first time, it is likely he will keep an eye out for improvements. (Assumption: both kch333 and the competitor are Chinese - I really can't see someone elsewhere using his data.)

Although I couldn't read the text of kch333's web site (Chinese), it doesn't seem to me that the competitor has code and text verbatim from his site.

Otherwise the competitor would be promoting kch333 - and that's better than promoting yourself alone!

All my subsequent posts are based on the site kch333 sent by sticky.

kch333

7:06 pm on Jul 13, 2002 (gmt 0)

10+ Year Member



Between search and spam, Google seems to give us a third choice: "repeat the search with the omitted results included", at the end of the search list. I find some pages can only be found this way.

I am sorry that I only know Chinese and cannot give more examples.

frank

fathom

7:20 pm on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi frank, I'm curious: is this Google.com or www.google.ch?

kch333

7:38 pm on Jul 13, 2002 (gmt 0)

10+ Year Member



I type www.google.com and the Chinese Windows system automatically gives me the Chinese version of Google. If I want English results, I have to reset it by going into "Preferences" and choosing English.

The Chinese own about 80,000,000 PCs; Google is already investing more in this market.

kch333

fathom

8:50 pm on Jul 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you have any web tools that show what others in your region use to find information online?

(e.g. Chinese (Simplified) or Chinese (Traditional))

The bulk of the web (as archived in search engines) is English; however, other languages are quickly filling the void.

kch333

5:22 am on Jul 14, 2002 (gmt 0)

10+ Year Member



Fathom:

OK, let's forget Taiwan Chinese or Mainland Chinese; that's not the point.

The point is: by my observation, Google moved the questioned pages into the results that we have to click again to see. Do you agree with that, or do you have another thought?
(I mean the pages from different domains appeared there.)

frank

Marcia

6:31 am on Jul 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Fathom
<<From a google perspective, if they are not linked together there is no concern.>>

Beachboy
<<Are you sure? So a mirror site, separate domain, not linked at all with the primary site, is fully acceptable to Google, in your view. Do I understand you correctly>>

Beachboy:

Fathom said:
<<Both would be orphans, or at least within their own external link infrastructure. Even if they link to the same other sites there wouldn't be any commonality between them. Google need a relationship to penalize>>

Beachboy: Hmmmm. So two identical pages or sites which are not interlinked to each other at all are perfectly acceptable to Google. It still seems to me that would trip a duplicate content filter. That's very interesting....

The reason I posed the situation in my previous post is that I think we need to get very clear on the point of mirror sites and duplicate pages, so that no one is encouraged to do anything that could place them at unnecessary risk.

Either mirror sites are perfectly all right with Google, or...
mirror sites and/or duplicate content represent a dangerous situation with a risk of getting banned or penalized.

It's one or the other. And we have to keep in mind that Google is pretty darn good at finding relationships between sites.

There's a relevant discussion going on in our Content forum. From this thread:
[webmasterworld.com...]

WebGuerrilla:

The main thing you need to be concerned with is a duplicate content penalty. I know of a couple sites that use syndication as their primary marketing tool, and they ended up with a PR0 a few months back.

I also talked to a Google rep awhile back who said that in the future, when finding dupe content on two domains, the one with the lowest PR would get penalized.

That could lead to a situation where having your article republished on a site with a higher PR could cause your site to get penalized.

WebGuerrilla is either right or he isn't. This is the issue we're trying to settle, so we can deal with the situation we're discussing and try to solve kch333's dilemma.

[edited by: Marcia at 7:05 am (utc) on July 14, 2002]

Beachboy

7:05 am on Jul 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep, that would be seriously unfortunate for the creator/original publisher of that content.

Woz

7:54 am on Jul 14, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think we got side tracked with the Chinese content here for a while. With this subject I think it is probably better to keep it general. If a site is copied then what language it is in doesn’t really matter. Frank, if you have some specific Chinese questions you could try the Asia-Pacific Forum [webmasterworld.com].

Having said that, this issue of copied pages/sites is one I come up against almost every day and it does raise some interesting questions and possibly some concerns.

BeachBoy > It still seems to me that would trip a duplicate content filter.
CIML > The question is, Which is the second site?

If you know anything about me and my passion for languages and terminology, then you will understand that as I build my directories I am hunting down pretty specific information. And time and time again I see the exact same information duplicated on totally separate and unrelated domains. So I run into the dilemma of which is the original and which is the copy - and the challenge is I don't know which is which.

So should I penalise any pages for having duplicate content? If I do, I may be unwittingly penalising the original author and by default giving a boost to the perpetrator - not something I want to do. So I simply list both, rank them according to my own (secret) algorithm, and let things lie until someone complains. So far nobody has. Of course, what I would do if someone did complain is another dilemma to which I have as yet no answer.

And bear in mind that I am doing this personally, not via a program. So if you try to equate the same problem to a program-driven database, a la Google et al, the problem becomes almost a nightmare.

As has been stated above, it would be possible to identify exact duplicates using checksums and the like, but how would an engine identify situations where information is duplicated but placed in a totally new page template? What is the percentage that trips the warning bells? Is it Marcia's 98%, or is even that too limiting or forgiving? And if supposedly duplicate pages are found, should an engine dish out penalties based on its findings? If so, to whom? Perhaps the duplicates are actually intentional - specifications for a product sold through various distributors all over the world. Then you could have 20, 30, or more copies of the same information, intentionally put there for legitimate purposes. What then?
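For the "new page template" case, one textbook approach - purely a sketch of the idea, not something any engine is known to use - is to compare overlapping word shingles of the visible text, so that shared wording is detected regardless of the surrounding markup, and flag pairs whose overlap exceeds a chosen threshold (the 98% question):

```python
def shingles(text, k=3):
    """Set of k-word shingles from the visible text of a page."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical page texts for illustration:
original = "acme widget specifications model 100 weight two kilograms voltage 220"
reworded = "acme widget specifications model 100 weight two kilograms voltage 110"
unrelated = "a forum thread about search engines and duplicate content filters"

print(similarity(original, reworded))   # high: mostly shared shingles
print(similarity(original, unrelated))  # near zero: no shared shingles
```

The threshold is exactly the open question Woz raises: set it too high and templated copies slip through; too low and the 20 or 30 legitimate distributor copies of a product spec all get flagged.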

It is a nightmare problem, and one that in my opinion does not have any easy answer. It could be equated to the supposed penalties for guestbook spamming reported earlier: what if a competitor spams guestbooks in your name in an attempt to penalise your site by proxy?

And this seems to be the problem Frank faces. If someone copies your site, will there be a penalty, and is this therefore another dirty trick available to the unscrupulous to undermine your efforts?

Following on from Marcia's quote of WebGuerrilla quoting a Google rep (phew): "I also talked to a Google rep awhile back who said that in the future, when finding dupe content on two domains, the one with the lowest PR would get penalized."

How can any engine decide which page to penalise? How can it know which is the original? If it penalises an original page because a competitor's page using the copied information had better ranking, could it find itself in the tricky legal situation of giving credence to a perpetrator? I know we shy away from legal discussions here for those very same legal reasons, but it does create a very interesting dilemma for all concerned.

Frank, in my opinion I would not even worry about what the search engines are going to do. I would focus my efforts on getting the copy of your site taken off the web, thereby avoiding the possibility of any penalties altogether.

Onya
Woz

<typo!>

[edited by: Woz at 10:32 am (utc) on July 14, 2002]

kch333

8:54 am on Jul 14, 2002 (gmt 0)

10+ Year Member



I am sure that Google is filtering the duplicated pages even when they have a different design and a different size - or it must do so, because the copied archives grow more and more (especially in China).

We are a big company in our field, and we wanted to put some original text on our website, but in the end it would be in vain. So, as of now, I have stopped, and I am considering copying others. (Maybe they copy from magazines or anywhere else.)

It's an awful conclusion.

kch333

This 45-message thread spans 2 pages.