Forum Moderators: open
From a legal perspective, if there is solid proof that the competitor is infringing on copyrighted material, seek legal advice.
From a business perspective, learn as much as you can from WMW, both the good (what to do) and the bad (what not to do).
Set up a new site using all the GOOD quality information you learned, and use the old site as a SPAM test site.
No more problem.
If the competitor actually figures it out, I guarantee he will no longer trust your web design skills and/or SEO abilities.
<<...and the old site, use it as a SPAM test site>>
Why do you give tips like this one?
First, SPAM misleads anyone who uses search engines - that includes you and me and him/her - and it's very clear: you don't need to spam in any way to get decent listings. This forum should be used not to tell people to use SPAM but to explain what to do and what to avoid.
andy04031, I 100% agree with you. In actual fact my first statement covered kch333's question.
However, my solution is proportional - what would you currently call the other site, based on all the knowledge of this forum? A duplicate site, duplicate content, duplicate code.
Smells like spam, tastes like spam - it already is spam!
So...
But in the future you may want to try replacing the page content with a redirect and a robots tag to de-index each page before you delete it.
This way you don't lose search engine visitors between the time the SEs drop the old pages and crawl the new ones:
<html>
<head>
<!-- instant refresh: sends visitors straight to the new page -->
<meta HTTP-EQUIV="REFRESH" CONTENT="0; URL=http://www.yournewdomain.com/index.html">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<!-- tells spiders to drop this page from the index but still follow its links -->
<meta name="ROBOTS" content="noindex,follow">
<title>title</title>
</head>
<body bgcolor="#FFFFFF" bgproperties="fixed" link="#000000" vlink="#000000" alink="#000000">
</body>
</html>
Even if they link to the same other sites there wouldn't be any commonality between them.
Google needs a relationship to penalize.
The difference: IP, incoming links. I can't believe the competitor would use the same company.
I looked at the site, but it's all Chinese, PR1, no backlinks - no problem.
<<Both would be orphans, or at least within their own external link infrastructure. Even if they link to the same other sites there wouldn't be any commonality between them. Google needs a relationship to penalize>>
Hmmmm. So two identical pages or sites which are not interlinked to each other at all are perfectly acceptable to Google. It still seems to me that would trip a duplicate content filter. That's very interesting....
I see your point. If you try to get them this way... I wouldn't put any time into it, because the next copy is already in development... with the new one...
Actually I have a few copycat "friends"; one of them I sued a few months ago. It is a hassle and costs money and time - we won in court, but we still wasted time.
I have one special guy who duplicates my pages on dozens of his domains; he has been doing that for months now. My first reaction was not nice, believe me - I was trippin', if you understand what I'm trying to say. But after a few days of thinking, I'm not feeding lawyers and judges anymore. I bought some other domains and created completely different pages with the same products. First, I'm not dependent on only one site, and second, I gain more knowledge optimizing my sites with different designs. At least I'm not getting stupid this way :)
<<However, my solution is proportional - what would you currently call the other site? Smells like spam, tastes like spam - it already is spam!>>
sure, but not yours...;)
Hmmmm. So two identical pages or sites which are not interlinked to each other at all are perfectly acceptable to Google. It still seems to me that would trip a duplicate content filter. That's very interesting....
Actually, Beachboy, not to mince words, you said "perfectly acceptable to Google" - and twice.
I said Google must have a relationship to penalize; there's a difference.
Example: how can so many web sites get copied and not get PR0?
The copier doesn't usually link to the site it was stolen from, the original owner doesn't know so he doesn't link either, and he certainly wouldn't after finding out.
Different IPs, different domain names, no linkage, and, not to forget, no physical spam on the site - they seem to be a couple of competitors or non-associated affiliates (wouldn't you think?).
Did you look at the site in question? You should.
Further, I don't recall any mention of what period of time had passed from the time a dupe page was installed on some other domain to the time it was noticed. And in an instance like that, had sufficient time passed to trip a spam filter?
Your conclusion is very interesting, to be sure. I wonder what degree of relationship there needs to be in order for a spam filter to trip a penalty when we are dealing with two identical sites, different domains, different IPs.
Further, do other search engines behave the same way Google does in these matters?
Theory:
How would Google compare EACH and EVERY site to EACH and EVERY other site? No way.
Support:
That's 2,073,418,204 pages compared to each other, or... the combination possibilities are 2,149,531,523,302,580,000 (2 quintillion, 149 quadrillion, 531 trillion, 523 billion, 302 million, 580 thousand).
If Google can do 10,000 comparisons a second, it would take them 6,811,454.37 years to compare all the possible combinations of those 2 billion and some pages. Of course they would have to do it every crawl...
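If you want to check the arithmetic, here's a quick sketch in Python - the page count is the figure above, and the 10,000 comparisons a second is just the assumption from this post:

# Pairwise comparisons of n pages: C(n, 2) = n * (n - 1) / 2
n = 2_073_418_204                        # page count quoted above
pairs = n * (n - 1) // 2
print(f"{pairs:,} possible page pairs")  # roughly 2.149 quintillion

rate = 10_000                            # assumed comparisons per second
years = pairs / rate / (60 * 60 * 24 * 365.25)
print(f"{years:,.0f} years at {rate:,} comparisons per second")  # ~6.8 million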
Did you look at the site in question? You should.
Ummm... Am I missing something? Unless Fathom knows something I don't, I don't see a "site" in question... I don't see any URLs in this entire thread...
Actually, Beachboy, not to mince words, you said "perfectly acceptable to Google" - and twice.
Actually, Fathom, he was clearly trying to get a clarification of what you were saying, and in fact, he said (to take the full quote, not the snippet you chose) "So a mirror site, separate domain, not linked at all with the primary site, is fully acceptable to Google, in your view?"
Could that more clearly be a question? And yet you have turned it around and tried to make it look - to anyone who had not followed the thread - that it was a statement...and a misguided, ill-informed statement at that.
It is easy enough to misunderstand each other here without deliberately trying to misrepresent others' remarks, no?
Actually, the problem is not impossible. A search engine could store off a small amount of data while spidering a page - say the byte count of the page, or perhaps a cyclic-redundancy check (CRC) code or a checksum. Keeping it simple, let's say they just store off the byte count and the URL. Build a list sorted by byte count, then compare the contents of only the URLs having the same byte count. As soon as you get a single-character mismatch, you quit. In this way, you do not need to compare every page to every page; the problem is reduced in scope with a minor amount of front-end work.
Using several "compressed" comparison-values (CRC, LRC, checksum, byte count, etc.) you can cut the number of required file compares even further, and even begin to save enough time to consider some close-but-inexact matches.
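To make that concrete, here's a rough sketch in Python of the bucketing idea - made-up page data, and of course not anything a real engine is known to run:

import zlib
from collections import defaultdict
from itertools import combinations

def find_exact_duplicates(pages):
    """pages: dict mapping URL -> raw page content (bytes)."""
    # Front-end work: bucket pages by cheap fingerprints (byte count + CRC32)
    # so full comparisons happen only within a bucket, never across all pages.
    buckets = defaultdict(list)
    for url, content in pages.items():
        buckets[(len(content), zlib.crc32(content))].append(url)

    # Only pages sharing a fingerprint get the byte-for-byte check; comparing
    # bytes with == stops at the first mismatching byte, as described above.
    duplicates = []
    for urls in buckets.values():
        for a, b in combinations(urls, 2):
            if pages[a] == pages[b]:
                duplicates.append((a, b))
    return duplicates

pages = {
    "http://site-one.com/page": b"<html>same content</html>",
    "http://site-two.net/page": b"<html>same content</html>",
    "http://site-three.org/page": b"<html>different content</html>",
}
print(find_exact_duplicates(pages))
# [('http://site-one.com/page', 'http://site-two.net/page')]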
Anyway, there are several shortcuts which help to reduce the problem. I don't know if any SE actually does this, but they certainly could - especially if they focused on highly-competitive market segments where the duplicate problem is known or likely to occur.
As to the duplicate-site problem, I've seen it stated in other threads here that the SECOND site - the copy - gets ignored, because the search engines are well aware that otherwise a competitor might copy your site with the intention of causing you to be dropped from their index for duplicating content.
Jim
The question is, Which is the second site? When things went wrong around the turn of the year, it seemed to be random. Lots of people had problems with their sites and it was paradise for the search listing hijackers.
The general opinion seems to be that now Google normally keeps the one with the highest PageRank (like they used to).
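If that's right, the rule itself is trivial to state - here's a toy sketch in Python, purely to illustrate the behaviour being described (the URLs and PR values are made up, and nothing here is documented Google behaviour):

def pick_survivor(duplicates):
    """duplicates: list of (url, pagerank) tuples for identical pages.
    Returns the page assumed to stay in the index."""
    return max(duplicates, key=lambda page: page[1])

group = [("http://original-site.com/page", 4),
         ("http://copycat-site.com/page", 6)]
print(pick_survivor(group))
# ('http://copycat-site.com/page', 6) - the copy wins if its PR is higher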
Assume site one is in the U.S. and the person has their mother-in-law in the UK take the other domain name, with .co.uk, using a slightly different keyword combination and a hyphenated domain name. The purpose could be to promote the site for a keyword phrase with only a slight difference and promote through submitting to a different set of directories, etc.
With the exception of the domain name and maybe a tiny bit of text and page title for the second phrase, they could be practically identical, and Google would have them both.
Would it get by, if it were only 98% duplicated? And if they both went up at the same time, would they both stay, or would one be penalized or removed?
From a Google perspective, if they are not linked together there is no concern.
From a legal perspective, if there is solid proof that the competitor is infringing on copyrighted material, seek legal advice. From a business perspective, learn as much as you can from WebmasterWorld, both the good (what to do) and the bad (what not to do).
Set up a new site using all the GOOD quality information you learned, and use the old site as a SPAM test site.
No more problem.
If the competitor actually figures it out, I guarantee he will no longer trust your web design skills and/or SEO abilities.
Since the competitor copied the first time, it is likely he will keep an eye out for improvements. (My assumption is that both kch333 and the competitor are Chinese - I really can't see someone elsewhere using his data.)
Although I couldn't read the text of kch333's web site (it's Chinese), it doesn't seem to me that the competitor copied code and text verbatim from his site.
Otherwise the competitor would be promoting kch333. And that's better than promoting yourself alone!
All my subsequent posts are based on the site kch333 sent by sticky mail.
Beachboy
<<Are you sure? So a mirror site, separate domain, not linked at all with the primary site, is fully acceptable to Google, in your view? Do I understand you correctly?>>
Beachboy:
Fathom said:
<<Both would be orphans, or at least within their own external link infrastructure. Even if they link to the same other sites there wouldn't be any commonality between them. Google needs a relationship to penalize>>
Beachboy: Hmmmm. So two identical pages or sites which are not interlinked to each other at all are perfectly acceptable to Google. It still seems to me that would trip a duplicate content filter. That's very interesting....
The reason I posed the situation in my previous post is because I think we need to get very clear on the point of mirror sites and duplicate pages, so that no one can possibly be encouraged to do anything that could place them at unnecessary risk.
Mirror sites are either perfectly all right with Google or...
Mirror sites and/or duplicate content represent a dangerous situation for getting banned or penalized.
It's either one or the other. And we have to keep in mind that Google is pretty darn good at finding relationships between sites.
There's a relevant discussion going on in our Content forum. From this thread:
[webmasterworld.com...]
WebGuerrilla:
The main thing you need to be concerned with is a duplicate content penalty. I know of a couple sites that use syndication as their primary marketing tool, and they ended up with a PR0 a few months back.
I also talked to a Google rep awhile back who said that in the future, when finding dupe content on two domains, the one with the lowest PR would get penalized.
That could lead to a situation where having your article republished on a site with a higher PR could cause your site to get penalized.
WebGuerrilla is either right or he isn't. This is the issue we're trying to settle, to be able to deal with the situation we're discussing and try to solve kch333's dilemma.
[edited by: Marcia at 7:05 am (utc) on July 14, 2002]
Having said that, this issue of copied pages/sites is one I come up against almost every day and it does raise some interesting questions and possibly some concerns.
BeachBoy > It still seems to me that would trip a duplicate content filter.
CIML > The question is, Which is the second site?
If you know anything about me and my passion for languages and terminology, then you would understand that as I am building my directories I am hunting down pretty specific information. And time and time again I see the exact same information duplicated on totally separate and unrelated domains. So I run into the dilemma: which is the original and which is the copy? The challenge is that I don't know which is which.
So should I penalise any pages for having duplicate content? If I do, I may be unwittingly penalising the original author and by default giving a boost to the perpetrator - not something I want to do. So I simply list both, rank them according to my own (secret) algorithm, and let things lie until someone complains. So far nobody has. Of course, what I would do if someone did complain is another dilemma to which I have as yet no answer.
And bear in mind that I am doing this personally and not via a program. So, if you try to equate the same problem to a program-driven database, a la Google et al, then the problem becomes almost a nightmare.
As has been stated above, it would be possible to identify exact duplicates using checksums and the like, but how would an engine identify situations where information is duplicated but placed in a totally new page template? What is the percentage that trips the warning bells? Is it Marcia's 98%, or is even that too limiting or forgiving? And if supposedly duplicate pages are found, should an engine dish out any penalties based on its findings? If so, to whom? Perhaps the duplicates are actually intentional, being specifications for a product sold through various distributors all over the world. Then you could have 20, 30, or more copies of the same information intentionally put there for legitimate purposes. What then?
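To make the percentage question concrete, here's a rough sketch using Python's standard difflib - the threshold and the sample pages are invented for illustration, and I'm not suggesting any engine actually works this way:

from difflib import SequenceMatcher

def similarity(text_a, text_b):
    """Return a 0.0-1.0 similarity ratio between two page texts."""
    return SequenceMatcher(None, text_a, text_b).ratio()

# The same product specification with one small difference.
page_a = "Widget 3000 spec sheet. Weight: 2kg. Height: 30cm. Voltage: 240V."
page_b = "Widget 3000 spec sheet. Weight: 2kg. Height: 30cm. Voltage: 110V."

score = similarity(page_a, page_b)
print(f"similarity: {score:.2%}")  # prints roughly 97%
if score >= 0.98:                  # Marcia's 98%, used as an arbitrary cut-off
    print("Flag as duplicates - but which one gets the penalty?")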
It is a nightmare problem, and one that in my opinion does not have any easy answer. It could be equated to the supposed penalties for guestbook spamming reported earlier. What if a competitor spams guestbooks in your name in an attempt to penalise your site by proxy?
And this seems to be the problem Frank faces. If someone copies your site, will there be a penalty, and is this therefore another dirty trick available to the unscrupulous to undermine your efforts?
Following from Marcia's quote of WebGuerrilla quoting a Google Rep (phew) "I also talked to a Google rep awhile back who said that in the future, when finding dupe content on two domains, the one with the lowest PR would get penalized."
How can any engine make any decision on which page to penalise? How can they know which is the original? If they penalise an original page because a competitor page using copied information had better ranking, then could they find themselves in the tricky legal situation of giving credence to a perpetrator? I know we shy away from legal discussions here for those very same legal reasons, but it does create a very interesting dilemma for all concerned.
Frank, in my opinion I would not even worry about what the search engines are going to do. I would focus my efforts on getting the copy of your site taken off the web, thereby avoiding the possibility of any penalties altogether.
Onya
Woz
<typo!>
[edited by: Woz at 10:32 am (utc) on July 14, 2002]
It's an awful conclusion.
kch333