Interesting points, Woz. Aside from any other considerations, they've displayed enough ethics and concern for the integrity of their search that it's doubtful they'd want to be doing that.
If Google archived the entire cache indefinitely it would take an astronomical amount of storage, but it isn't inconceivable that it's archived for at least a limited time. We've seen cached pages switched back and forth more than once. They couldn't get involved in disputes, but it's possible some documentation is available to them, at least for a while.
Slightly OT, but I really don't see the sense of handing out penalties under any circumstances, as the web by its very nature is a mixture of companies and people, from professionals to mom-and-pops, from highly ethical netizens to the same low-lifes we have in real life. But who is to say which is which?
It seems to me that a modicum of restraint, and perhaps negation of any perceived value or gain from what could be questionable activities, would be the better path to follow.
Or, to put it more simply,
"Is that Deliberate Spam"?.
Thats a "Definite Possible Maybe".
Onya
Woz
For quite a while a client's webmaster would post a page (or pages) "quickly".
Then he would decide to move the page(s) to a more appropriate page name/directory, not realizing visitors had already linked to the page.
On recognizing this error he put them back (and also left the page(s) in the more appropriate place).
Hundreds of these duplicate/mirror pages were strewn throughout their site, and Google indexed many of them.
Although a redirect, rather than relying on unindexing, may have been a more appropriate solution... they did not get penalized, ever.
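For what it's worth, here's a rough Python sketch of the kind of check I mean: confirm the old addresses answer with a permanent redirect to the new location instead of serving a second copy. The URLs are made up and this is only to illustrate the idea, not anyone's actual site.

import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Return None so urllib raises HTTPError instead of silently following the redirect.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def check_redirect(old_url):
    opener = urllib.request.build_opener(NoRedirect)
    try:
        resp = opener.open(old_url)
        print(old_url, "still serves content directly (status", resp.status, ") - duplicate risk")
    except urllib.error.HTTPError as err:
        if err.code in (301, 308):
            print(old_url, "permanently redirects to", err.headers.get("Location"))
        else:
            print(old_url, "returned status", err.code)

# Hypothetical example: the hastily posted page vs. its final home.
check_redirect("http://www.example.com/quickpage.html")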
Two different sites have a lot less in common?
Our measures of success and failure here are based largely on observation and myth, and rarely on facts.
So a mirror site, on a separate domain, not linked at all with the primary site, is fully acceptable to Google?
IMO yes - simply because in many cases Googlebot can't see all the similarities.
As an SEO it is important to take as much control as possible over how your content is presented to the SEs; I don't like to leave it up to whatever dupe detection/removal algo they decide on this month.
Though I would think this is an exception to the topic.
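Just to picture what such a dupe check might look like (Google's actual algo isn't public, so this is only a guess at the general idea), here's a simple word-shingle overlap in Python:

# One common near-duplicate check: word shingles + Jaccard overlap.
def shingles(text, size=5):
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}

def similarity(text_a, text_b):
    a, b = shingles(text_a), shingles(text_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

page_a = "Widgets for sale, best widgets on the web, free shipping on all widgets."
page_b = "Widgets for sale, best widgets on the web, free delivery on all widgets."
print(f"overlap: {similarity(page_a, page_b):.2f}")  # close to 1.0 = likely flagged as dupes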
[vlib.org...]
[cui.unige.ch...]
- keeping the highest PageRank is no option. This has nothing to do with "who was first", even if Google were to take into consideration things such as the "age of links"; see the Google programming contest:
[google.com...]
- archiving all cache versions would be better; however, what if you copy and "web-publish" parts of an article that has never been on-line, and the original article comes on-line a year later?
Example:
A trainee in my company used some sentences from a reference book (previously not available on-line) on a webpage. We got a remark from a surfer that it contained sentences copied, without a source, from a specific book. We corrected the page and mentioned the source; however, we also finally found out that the specific book the surfer had mentioned had copied those sentences from the book our trainee had used, or vice-versa...
Even off-line it is an impossible exercise to know who really was first.
Exactly. Googlebot isn't in a position to make value judgements about which address deserves to be in its index for a particular piece of content. Also, Google don't want to list ten copies of the same content for each search (like a popular engine of old often used to).
It's up to site owners to resolve problems with duplication, and it's up to Google to return relevant content for each search.
There have been serious problems with duplicate content in the past (including large-scale result hijacking), but the 'keep the version with the highest PageRank' approach seems to work pretty well.
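Roughly, I picture it working something like this little Python sketch - the URLs and PR values are invented, and the real process is obviously far more involved - group pages with identical content and keep only the copy with the highest PageRank:

from collections import defaultdict
from hashlib import sha1

# (url, page text, toolbar-style PageRank) - all made up for the example
pages = [
    ("http://www.example.com/article.html",    "same article text", 6),
    ("http://mirror.example.org/article.html", "same article text", 4),
    ("http://www.example.com/other.html",      "different text",    5),
]

groups = defaultdict(list)
for url, body, pagerank in pages:
    groups[sha1(body.encode()).hexdigest()].append((pagerank, url))

for candidates in groups.values():
    best_pr, best_url = max(candidates)
    dropped = [u for pr, u in candidates if u != best_url]
    print("keep:", best_url, " drop:", dropped)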
And not only are they exact copies, but they are done in FrontPage with common borders, themes, Java buttons - all the things we say hurt rankings.
Both sites are PR7!
Checking the backlinks, they both have 474 links from exactly the same pages, which tends to suggest that they are mapping the two domains to the same files.
I am not sure if that proves or disproves any theories, but I thought I would throw it in, as I found it quite surprising that PR7 was attained against all the odds.
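If anyone wants to check whether two domains really are serving the same files, comparing a few paths byte for byte is usually enough. A small Python sketch - the domain names here are placeholders, not the sites above:

import hashlib
import urllib.request

def fingerprint(url):
    with urllib.request.urlopen(url) as resp:
        return hashlib.sha1(resp.read()).hexdigest()

# Compare a handful of paths on both hosts.
for path in ("/", "/products.html"):
    a = fingerprint("http://www.example-one.com" + path)
    b = fingerprint("http://www.example-two.com" + path)
    print(path, "identical" if a == b else "different")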
Onya
Woz
Some people see this as a penalty, but I'd much rather have one URL in Google credited with both sets of inbound links than two URLs credited with half each. This is absolutely the best thing to happen if you have duplicate or near duplicate content.
(I'm assuming that Woz is seeing the 'duplicate' effect, as it fits his description nicely.)