How do I report plagiarism of my website by our last web designer? - Webmaster General forum at WebmasterWorld

Forum Moderators: phranque


How do I report plagiarism of my website by our last web designer?

Solaris

9:12 pm on Aug 7, 2008 (gmt 0)

10+ Year Member

Hello:

I've started a new job at a real estate agency, where I must build a new website for the company. While working on the current website, I discovered that the last web designer copied all the listings we have for sale and made his own website with all our properties.

He now has a better ranking than we do on Google and Yahoo.

The problem is that this guy replies to all our emails saying that he is in Vienna, or Austria, or Germany, etc. But several people tell me that he is in town. Obviously he is hiding from us.

Right now we are consulting with our attorney. But in the meantime, I would like to know if there is a way to report his website to Google and Yahoo to decrease his ranking in these search engines...

I'm also concerned that if I report his website, Google might think that I have a duplicate website and take some action against us.

Does anyone know how we can punish this guy in other ways...

I would really appreciate your help.

Thanks - Solaris

[edited by: phranque at 12:40 am (utc) on Aug. 8, 2008]
[edit reason] specifics [/edit]

jdMorgan

9:32 pm on Aug 7, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

See Google's Digital Millennium Copyright Act [google.com] page.

The DMCA is the U.S. implementation of the 1996 WIPO Copyright Treaty (and its companion WIPO Performances and Phonograms Treaty), to which many nations are signatories.

You may file a DMCA infringement notice with the major search engines and with the infringer's hosting company. If the infringer is home-hosting, then file with his ISP.

Be aware that filing a fraudulent DMCA infringement notice is illegal; I strongly suggest discussing the DMCA infringement claims with your attorney before proceeding. Be especially clear on whether the contract with the original developer states that the original work was 'a work for hire' or if he/she retained some ownership rights before you proceed.

If your DMCA claim can be substantiated, the offending site (or pages on that site) will be removed until such time as the other party presents a plausible claim that no copyright violation exists.

Jim

HugeNerd

6:18 pm on Aug 8, 2008 (gmt 0)

10+ Year Member

I'm also concerned that if I report his website, Google might think that I have a duplicate website and take some action against us.

I wouldn't worry too much about that. Google's spiders will tell them all they need to know about content creation. Your site's first cache is bound to be much older than his, and Google can identify the dates easily to determine whose content is the duplicate.

stapel

2:10 pm on Aug 14, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

HugeNerd said: ...Google can identify the dates easily to determine whose content is the duplicate.

Gracious! I certainly hope it doesn't work like that! Otherwise, any time an author revises his content, his site could be blacklisted by the search engines because the dates on his original (but updated) articles are now "newer" than the scrapers' copies of the originals!

Eliz.

HugeNerd

4:55 pm on Aug 14, 2008 (gmt 0)

10+ Year Member

any time an author revises his content, his site could be blacklisted by the search engines because the dates on his original (but updated) articles are now "newer" than the scrapers' copies of the originals!

Not true at all. The first cache of the original content remains on file -- both the content and the date it was cached. I should think this works similarly to hardcopy: A printer may make several runs of a book over many years; the author has a new foreword, someone writes a new introduction, mistakes are corrected, etc. If you plagiarise content from the first printing, prior to the second run of the original work, the second run has not plagiarised the "scraper's" content. The copyright dates on the covers "prove" (I say "prove" because dates can be forged...) originality. Cache dates ought to work similarly as evidence of date of creation...unless you know otherwise?

Now, I am making some assumptions about the inner workings of Google. For all I know they simply plug search terms into their engine and see who ranks higher...and decide based upon this result. Or, maybe they have some needlessly complex algorithm to calculate authenticity. I doubt they will be informing us of the exact procedure any time soon. I was merely applying Occam's Razor.

stapel

4:33 pm on Aug 15, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

HugeNerd said: The first cache of the original content remains...on file.

On what basis have you concluded that Google stores copies of all of the content it has ever crawled, keeping a "log" of the development of each page?

I can find online reference (including from Google's representatives) to Google's grouping highly-similar pages together, and picking one as being "representative" of all the members of the group. But I can find no reference to your process.

And I would hate to see a copied-but-older version being chosen to "represent my work", just because I'd corrected a typo in my original content and now have a "newer" timestamp on my (copy of the) file.

Eliz.

HugeNerd

6:49 pm on Aug 15, 2008 (gmt 0)

10+ Year Member

My conclusion is based upon (assumptions, extrapolations and conclusions from) a few copyright cases Google has faced regarding their caching policy. Here is an older but decent story from CNET:
[news.cnet.com...]

Unlike formal Web archive projects, Google says its cache feature does not attempt to create a permanent historical record of the Web. Rather, the company actively seeks to delete dead links; once a Web page disappears, the search engine seeks to purge that record and any related cached page as quickly as possible.
Still, Google's cached pages have proven to be a treasure trove for investigators seeking to recover data pulled from public Web sites. In one high-profile example, security and privacy expert Richard Smith copied Web pages detailing the backgrounds of Dr. John Poindexter, head of the Pentagon's Information Awareness Office (IAO), and other officials, from the Google cache days after they were removed from the IAO Web site. The pages were deleted after public reports surfaced on the office's development of a massive computer system to spy on Americans and potential terrorists.

I know the story claims that Google does not keep a permanent record of its cache...but it's from 2003 and involves one of the earliest cases on caches. Google has had a few copyright suits filed over its cache (the story even mentions a specific suit, Kelly v. Arriba Soft). Unfortunately, I no longer have an account which allows me access to Lexis-Nexis to find the legal briefings from any of the other cases. I believe one of them discussed Google keeping the cache on file, just not allowing the outdated caches to appear in the SERP. I would also check out Matt Cutts's blog, which has some posts and information on caches, but the office I am in right now won't allow me to view it.

You may also find this report on the Google Library Project to be of use:
[ala.org...]

The report discusses caching as fair use...and how maintaining copies of webpages, which has been deemed fair use, allows Google to cache copies of print media by extension. This means, as far as I am aware, that Google can archive its cache to its heart's content.

I also assume Google has a lot more information than they care to share with any of us. :o)

Do you have any insights into how Google would deliberate upon and rule in cases such as the OP's? My understanding is pure extrapolation and assumption, so if you know anything, please share!

As for:

And I would hate to see a copied-but-older version being chosen to "represent my work", just because I'd corrected a typo in my original content and now have a "newer" timestamp on my (copy of the) file.

Thus, I believe Google maintains outdated files in its cache. It may then access the first cache when the typo was still present and compare dates!

stapel

7:38 pm on Aug 15, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

CNet's article said: ...Richard Smith copied Web pages...from the Google cache days after they were removed from the [originating] Web site.

I'm not aware of Google's ever having claimed to keep their cache in "live" synch with the web. (I think it would be decidedly creepy if they even claimed that such were possible!)

One expects that there will be some time lag between changes being made to a given page on the originating web site and the spider's crawling (and thus Google's discovery) of that changed page. Hence, it seems reasonable that there should be a lag between the update and the adjustment of the cache. For low-ranking, rarely-crawled sites, the lag could extend into weeks or even months. I rather doubt that this lag in updating their cache somehow constitutes proof of a hidden permanent cache of all versions of all pages.

HugeNerd said: I know the story claims that Google does not keep a permanent record of its cache...but it's from 2003 and....

I'm afraid I don't follow your logic here...? Google says it does not maintain an archive of past versions of pages, nor does it maintain copies of "dead" pages, but "since" they said this in 2003, we can safely assume they're lying...?

HugeNerd said: ...Google can archive its cache to its heart's content.

Yes, but... Why would Google waste precious server space on "results" that nobody will see, results that serve no useful purpose for Google?

HugeNerd said: Do you have any insights into how Google would deliberate upon and rule in cases such as the OP's?

On what basis do you believe that Google "rules" on copyrights...? Surely only courts can do that...?

As for prioritizing "hits" from their cache, I "know" only what I said before: Google representatives have posted a general outline of their process for prioritizing results, and I hope that the server time-stamp is not all that is used to make their determination.

One might note that many people have reported problems with scrapers ranking higher than the originating pages. These reports would seem to indicate that determining priority is not a matter of Google studiously comparing hundreds of pages from many years' caches and carefully deliberating precedence. More likely, they use the methods that they say they use, leading sometimes, as one would then reasonably expect, to MFA scraper sites showing up "higher" than the original authors' sites.

HugeNerd said: Thus, I believe Google maintains outdated files in its cache.

I say that I hope Google doesn't rely only on server time-stamps, and "thus" you believe Google effectively maintain a "mirror" (and then some) of the Internet Archive...? I'm sorry, but I don't follow...?

It should be noted that derivative works, by their very nature, are often difficult to "judge". This is part of why copyright court cases can be so expensive and protracted.

Personally, I can't imagine there being an automated process for this, nor can I imagine Google investing the multiplied thousands of man-hours necessary to attempt to make this determination "by hand". Instead, I think Google probably use the methods that they say they use, or at least something similar.

I could be wrong, of course....

Eliz.

HugeNerd

8:50 pm on Aug 15, 2008 (gmt 0)

10+ Year Member

I'm afraid I don't follow your logic here...? Google says it does not maintain an archive of past versions of pages, nor does it maintain copies of "dead" pages, but "since" they said this in 2003, we can safely assume they're lying...?

It is not lying if Google has changed their policy. The policy has been, and is still being, fleshed out as cases make it to ever higher courts. What about the court cases regarding the Google Print initiative whereby Google won the right to cache copyrighted material under fair use? These cases and their rulings have occurred since 2003. This means that Google's statement, made in 2003, is no longer applicable in 2008. In 2003, Google was not scanning and archiving the Library of Congress; was not trying to archive as much information as they could store.

On what basis do you believe that Google "rules" on copyrights...? Surely only courts can do that...?

On no basis do I believe that Google produces legal rulings. However, in response to the OP seeking to have Google remove the scraper's page from the SERPs...Google does indeed decide. Especially if he appeals via the DMCA. I'll go back and edit my post for semantics later.

Yes, but... Why would Google waste precious server space on "results" that nobody will see, results that serve no useful purpose for Google?

Would information serve no use to Google? What about the exact scenario we are discussing? Why would Google want to waste server space archiving copies of old journal articles? of bibliographies for books which no longer exist in complete form? of briefs from incomplete manuscripts? Who am I to say! Ask Google Print...

Google representatives have posted a general outline of their process for prioritizing results, and I hope that the server time-stamp is not all that is used to make their determination.

I don't believe I even suggested it was the only criterion; merely that it is a possible and readily accessible source for dating content creation:

For all I know they simply plug search terms into their engine and see who ranks higher...and decide based upon this result. Or, maybe they have some needlessly complex algorithm to calculate authenticity. I doubt they will be informing us of the exact procedure any time soon. I was merely applying Occam's Razor.

Occam's Razor, meaning I applied the simplest explanation...

These reports would seem to indicate that determining priority is not a matter of Google studiously comparing hundreds of pages from many years' caches and carefully deliberating precedence.

You're absolutely correct. Determining priority is not something Google does actively...unless you file a DMCA notice as jdMorgan suggested. Doing so causes Google to review the accusations and decide (rule?) in some manner. I believe the result is deprioritization or even delisting within the SERPs...and that cache dates likely play a role in Google's determination.

"thus" you believe Google effectively maintain a "mirror" (and then some) of the Internet Archive...? I'm sorry, but I don't follow...?

The DMOZ/ODP and Google Index are one and the same:

Instead, I think Google probably use the methods that they say they use, or at least something similar.

Can you please direct me to the portion of Google's DMCA page where they discuss the procedures and methods for determining originality? If you have access to such information, please share. I cannot find any mention within Google...

[edited by: phranque at 9:26 pm (utc) on Aug. 15, 2008]
[edit reason] No urls, please. See TOS [webmasterworld.com] [/edit]

Syzygy

10:57 pm on Aug 15, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Right now we are consulting with our attorney.

I'm presuming that your consultation was a success and that, armed now with professional legal advice, you have a course of action that will enable you to

...punish this guy...

Syzygy

farmboy

5:42 pm on Aug 16, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member


Does anyone know how we can punish this guy in other ways...

There is an old saying, something along the lines of, "He who seeks vengeance first digs two graves."

Instead of focusing your energy on trying to hurt this guy somehow, you might consider what damages you're realizing and work to stop those.

Real estate listings expire over time and soon what he has copied will be of no benefit. He will then have no content or he will have to copy new listings you create.

I'd do some research and find out whether you can copyright your listings. If yes, he is or will be in violation and you can take the appropriate action.

By the way, how is this other guy profiting by having a site with your real estate listings on it?

FarmBoy