|Is Google All About Copyright Infringement?|
| 1:11 am on Apr 9, 2010 (gmt 0)|
This thought started from these two recent threads:
You can lose your rankings if others copy you [webmasterworld.com]
Viacom Finds Smoking YouTube Gun [webmasterworld.com]
I'm actually wondering if Google is possibly open to another YouTube-style copyright infringement suit, because they use methods including PageRank and TrustRank, rather than origination, to determine the order of results when there is duplication?
Here's the thought:
If a 'small' site (low PageRank, TrustRank, etc.; 'PageRank reflects our view of the importance of web pages' [google.com...]) publishes a copyrighted work, a 'larger' site (high PageRank, TrustRank, etc.) publishes the same material later, and the page on the 'larger' site is displayed rather than the originator's ('large' site's page replaces 'small' site's page in the results), how is it Google is not profiting from the display (use) of apparently infringing material?
If the smaller site published the content first and Google found it on the smaller site first, how is it not apparent infringement for another site to reproduce the material, especially without attribution (link, etc.)?
If there are advertisements on the results page, how is Google not profiting from their own promotion of the apparent infringement by replacing the original source with what they determine to be a 'more important' webpage for their visitors?
Since they determine the importance of pages without regard for the origination of the information, even in the event of duplication, and they profit by promoting their view of the importance of the pages they show in the results, how are they protected from infringement claims?
IMO It's certainly not by the DMCA...
|`(A)(i) does not have actual knowledge that the material or activity is infringing, |
`(ii) in the absence of such actual knowledge, is not aware of facts or circumstances from which infringing activity is apparent, or
`(iii) if upon obtaining such knowledge or awareness, the service provider acts expeditiously to remove or disable access to, the material;
`(B) does not receive a financial benefit directly attributable to the infringing activity, where the service provider has the right and ability to control such activity; and
`(C) in the instance of a notification of claimed infringement as described in paragraph (3), responds expeditiously to remove, or disable access to, the material that is claimed to be infringing or to be the subject of infringing activity.
When there is duplication between pages and sites, doesn't Google, by making their own determination on which page to display at the top of (or in) the results regardless of the origination of the content, use apparent copyright infringement for their own profit?
| 5:17 am on Apr 9, 2010 (gmt 0)|
Here's some more to the thought...
|Your work is under copyright protection the moment it is created and fixed in a tangible form that it is perceptible either directly or with the aid of a machine or device. |
When is my work protected? [copyright.gov]
For more info on the subject I found this chapter informative: Copyright Law Chapter 2 [copyright.gov]
If I discover Page A contains 'this content' and 'this content' is not already in my system, I can reasonably conclude Page A is the copyright holder (originator) of the content in the absence of direct, contradictory information.
If I later discover Page B contains 'this content' I can comply with the DMCA and copyright law by treating the secondary discovery as 'apparently infringing' and not including it in my results (and definitely not promoting it within my system), because I am proactively attempting to not profit from or promote what is reasonably (to me) apparently infringing work.
If I later discover Page C also contains 'this content' I can treat it in the same manner as Page B for the same reasons.
What I do not see is how I could discover 'this content' on Page A originally, then replace it with Page B in the absence of direct, contradictory information regarding copyright ownership being supplied to me, and still claim I am not trying to profit from what should, to the best of my knowledge, be considered apparently infringing content.
If Page A contains 'this content' and the content is freely available, then I have done no one any harm or wrong by treating other Pages containing 'this content' as apparently infringing, because even though it's freely available and there is not infringement, I have provided my visitors with the resource they were seeking and the information they were seeking and they really do not need it in triplicate.
If Page A contains 'this content' and the content is copyrighted by the owner of Page A, then I have correctly treated it as the Copyright Holder's work and treated the other 'duplicates' (apparently infringing pages) correctly by proactively removing access to them to the best of my ability.
If Page A contains 'this content' and Page B is the true copyright holder of 'this content', then I have done what I should according to the DMCA to the best of my knowledge in the absence of contradictory information. Once direct, contradictory information is provided and I am made aware Page B's owner is the true Copyright Holder, I can replace Page A with Page B. I will have done exactly what the DMCA says I should when content is 'apparently infringing', and Page B's owner has no recourse, because I did as I should and was an unwilling participant in any type of infringement.
What I cannot figure out is how I could remove Page A, where 'this content' was initially discovered, and replace it with the later-discovered Page B also containing 'this content', without direct, contradictory information stating Page B is the true owner of the copyright to 'this content', simply because in my sole opinion Page B is 'more important' to myself or my visitors.
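To make the Page A / Page B / Page C handling concrete, here's a minimal sketch of the first-discovery idea in Python. Everything in it is hypothetical and for illustration only: the `fingerprint` helper, the `FirstDiscoveryIndex` class, and the example URLs are my own names, and matching by an exact hash of normalized text is an assumption (real duplicate detection would need to be fuzzier). It says nothing about how any actual search engine works.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def fingerprint(text: str) -> str:
    """Hypothetical content ID: hash of lowercased, whitespace-normalized text."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class ContentRecord:
    original_url: str   # URL currently treated as the originator
    first_seen: int     # crawl timestamp of the first discovery
    duplicates: List[str] = field(default_factory=list)  # later discoveries


class FirstDiscoveryIndex:
    def __init__(self) -> None:
        self._records: Dict[str, ContentRecord] = {}

    def discover(self, content_id: str, url: str, seen_at: int) -> bool:
        """Record a crawl hit; return True if this URL should appear in results."""
        record = self._records.get(content_id)
        if record is None:
            # Page A: content not already in the system, treat as the originator.
            self._records[content_id] = ContentRecord(url, seen_at)
            return True
        if url == record.original_url:
            return True
        # Page B, Page C, ...: treated as apparently infringing, not shown.
        if url not in record.duplicates:
            record.duplicates.append(url)
        return False

    def apply_ownership_claim(self, content_id: str, claimed_url: str) -> None:
        """Swap the original once direct, contradictory information is supplied."""
        record = self._records[content_id]
        if claimed_url == record.original_url:
            return
        if claimed_url in record.duplicates:
            record.duplicates.remove(claimed_url)
        record.duplicates.append(record.original_url)
        record.original_url = claimed_url

    def shown_url(self, content_id: str) -> Optional[str]:
        record = self._records.get(content_id)
        return record.original_url if record else None
```

The only way the shown URL changes is through `apply_ownership_claim`, which stands in for a DMCA complaint or similar notice. Nothing about the 'importance' of a page enters into it.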
How can the algorithmically (heuristically) determined importance of a page (site) possibly determine the origination of the content on the page when the discovery date of the 'this content' on each page directly contradicts the perceived importance of the pages?
To say Page B should replace Page A because Page B is 'more trusted' or 'more popular' or 'more expected' to be seen by either myself or visitors does not negate or change the fact 'this content' was originally discovered on Page A, not Page B. The owner of Page B has not provided any direct, contradictory information to outweigh the discovery date of the content on each page, and according to Copyright Law, the original creation of a work designates the copyright holder, not the 'algorithmically (heuristically) perceived importance of the site or page' to myself or my visitors. The DMCA basically says I must proactively remove or disable access to apparently infringing work to the best of my ability to qualify for protection.
So, how could anyone possibly remove the Page originally discovered containing 'this content' (Page A), replace it with Page B, and not be promoting and profiting from, what is to the best of their knowledge, apparently infringing content?
And, why would anyone remove the Page originally discovered containing 'this content' (Page A) and replace it with Page B if there was nothing for them to gain (profit from) by promoting what should reasonably be determined is, to the best of their knowledge, apparently infringing content?
I can't think of a good answer to either of those two questions, except there is a profit of some type from the promotion of what could (should IMO) reasonably and rationally be determined to be apparently infringing content...
| 6:42 am on Apr 9, 2010 (gmt 0)|
It sounds logical. Just two practical questions:
- must a news agency attach to every released piece of text a notice listing the websites/URLs where its content could be published?
- must I lose my copyright over my press releases, or do I need to sign some kind of agreement with the websites that want to publish the content I generated?
You should find TheMadLawyer and go against Google in court.
| 3:12 pm on Apr 9, 2010 (gmt 0)|
|- must a news agency attach to every released piece of text a notice listing the websites/URLs where its content could be published? |
Nope. As the SE, if the first discovery of the content is treated as the 'copyrighted' (original) version and all other versions are treated as 'duplicates' (apparently infringing), it makes no difference to you in any way who republishes the information, because you've followed the rules to the best of your ability and considered the first discovered resource to be the original and therefore the copyright owner.
|- must I lose my copyright over my press releases, or do I need to sign some kind of agreement with the websites that want to publish the content I generated? |
Same deal as above... As far as which content to include in the results, IMO first-come-only-served is the right way to do it. It doesn't matter if your content can be reproduced or not, it's that 1.) It does not need to be reproduced (or included multiple times in search results) to make the web a better place. 2.) As the provider of the results and all those ads you really enjoy making all your money off you don't have to worry about using what IMO should be thought of as infringing material to do it.
My argument has nothing to do with the actual ownership of the content or the reproducibility of the content... It's all about how ownership should IMO be assigned by those who don't have any definitive information (SEs) on the true ownership of the content and I'm not sure what could be a better signal of ownership than the discovery date, since a work is protected by copyright at the time of production.
This has nothing to do with who can reproduce or not reproduce any given content or web page, and everything to do with the page shown in the results. IMO there is not a very good argument to be made for it not to be the page the content was originally discovered on by default.
THE QUESTION IS: How can a search engine promote a URL discovered to contain the same content as another URL they already have listed without blatant disregard for copyright law and losing any protection they might have from the DMCA?
IMO There is no way the 'other signals' they use to determine importance of a web page should outweigh discovery date of an original work WRT copyright ownership.
Doing the preceding is analogous to someone here reproducing these posts and WebmasterWorld arbitrarily, without any request or other information, replacing my user name with the user name of the second poster because they have an earlier join date and more posts, so their posts are determined to be 'more trusted' ... IMO Any reasonable person would think that's not only stupid, but wrong.
It seems fairly obvious to me, so if someone can explain how a search engine could possibly use anything other than initial discovery date to determine ownership of a work, please, explain it to me, because I don't get how they (SEs) can rightly, or legally (IMO), promote a URL containing reproduced content over, in place of, or in addition to, the URL the content was originally discovered on and not be trying to profit from what should (IMO) be considered, and treated as, apparently infringing content...
| 5:16 am on Apr 11, 2010 (gmt 0)|
There may be a loose shoelace in there somewhere though.
If the original site only gets spidered occasionally while the high PR site gets spidered several times a day, they are more likely to discover the content on the higher PR site first.
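A rough sketch of that timing problem in Python, assuming each site is crawled on a fixed interval (a big simplification; real crawl schedules are not public and surely not this regular). The function name and the example numbers are made up for illustration.

```python
import math


def first_crawl_at_or_after(publish_time: float,
                            crawl_interval: float,
                            crawl_offset: float = 0.0) -> float:
    """Time of the first crawl at or after publish_time.

    Assumes crawls happen at crawl_offset, crawl_offset + crawl_interval, ...
    All times are in the same unit (hours in the example below).
    """
    if publish_time <= crawl_offset:
        return crawl_offset
    steps = math.ceil((publish_time - crawl_offset) / crawl_interval)
    return crawl_offset + steps * crawl_interval


# Hypothetical numbers: the original appears on a site crawled weekly,
# the copy appears four hours later on a site crawled every six hours.
original_found = first_crawl_at_or_after(publish_time=1, crawl_interval=168)
copy_found = first_crawl_at_or_after(publish_time=5, crawl_interval=6)
copy_discovered_first = copy_found < original_found
```

With these made-up numbers the copy is found at hour 6 and the original at hour 168, so the copy would be the 'first discovery' even though it was published later.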
| 5:36 am on Apr 11, 2010 (gmt 0)|
Yep, and that would keep the rankings almost exactly as they are and also make it very easy for infringement to be reported and copyright ownership to be enforced. It's why I cannot see a very good reason for them to not do it the way I've suggested...
The sites people expect to see, the ones with high PR, would obviously be spidered more often, and they would likely be the sites shown in the results as the originator of the content, but we would not have threads here about how a content originator has had their site replaced in the results by a content thief based on the PR (or other factors) of the two sites.
If another site with lower PR was truly the originator and the higher PR site copied it and had it spidered on their site first (I'm not sure how they would find it before it was spidered and indexed unless they were directly monitoring the lower PR site), the lower PR site could simply file a DMCA complaint, and they would only need to file it once to be ranked in the correct position using the method I suggested.
I cannot see why it's a bad way to do things, even though other members have said in other threads giving credit to the originator could cause more issues than it fixes. I cannot see how, because like you said, the sites they like to show soooooo much at the top of the results are spidered more frequently than others and IMO are usually the originators, or the content they reproduce is freely available, so my suggestion seems to help the 'little guy' out when they create original content which is then indexed and it does it without doing harm to the 'big sites' usually at the top of the results.
And, like I've said previously, IMO it's the right and legal way for them to handle the situation when there is duplication since they cannot possibly make a determination of origination that outweighs discovery based on the other factors they have access to...
Basically, what I've suggested keeps the 'little guy's' lower PR site from being replaced by the higher PR scraper and discourages copying, by not allowing a copy to be promoted, and especially not over an original discovery. How much less copying do you think there would be if people knew the copy would not rank regardless of PageRank or TrustRank or anything else? Personally, I think it would cut it down quite a bit...
I know I keep going on this, but how would a scraper or content duplicator know where to look for the new content GBot discovers before it's indexed and ranked? IMO GBot is most likely going to continue to be one of the most active bots on the 'net and will probably get to the content first, unless a scraper is directly monitoring and constantly scraping the site producing the content, especially with the methods, including pinging, to allow G to know there is new content published on a site... How is the scraper at all likely to get there first?
| 5:30 pm on Apr 11, 2010 (gmt 0)|
Of course not.
Google decides what you see.
| 7:44 pm on Apr 11, 2010 (gmt 0)|
LOL, so you're saying the answer to the question in the title of the thread is 'Yes' it seems... I've got to agree, and IMO they actually promote infringement by deciding which page is 'most important' and promoting it rather than showing the original for you to see (so you come back again). IOW: They do it for their own gain (profit)...
How does showing the page they determine to be the most important rather than the original not promote copying, scraping and content theft? I can't figure it out... Their apparent "If we think your page is 'more important' (to our users?) we don't care who the original author is, we'll show yours instead" way of doing things certainly does not discourage copying, scraping, content theft, etc. and IMO encourages it.
| 8:24 pm on Apr 11, 2010 (gmt 0)|
IMO The saddest thing about this whole discussion is that it's a relatively simple thing for them to do... They could easily set a flag on duplicate URLs referencing the original URL and still use the duplicates for all their calculations and processing and scoring and everything else, but at the time of index generation they could swap in the original URL.
Their visitors would still see exactly the same content for all the same searches, but they would see the content at the original source rather than any copier's.
The information they provide access to in the results would not change, because setting a flag to indicate the original URL and replacing duplicates with the original after scoring and ranking basically assigns all weight and links and PR and TR and everything else from any duplicates to the original.
It doesn't even create a way for PR or TR or anything else to be 'manipulated' on a site where a single page of content is the original and the site is otherwise not very well ranked. It basically transfers the ranking factors to a single URL and does not change anything else. There is no need for other re-calculations to be done and it does not transfer any 'weight' to the site generally or as a whole. It simply puts the original source of the information in the results where duplicates would show up, so it only transfers any weight to a single page... It's not 'weight' or 'rankings' or 'authority' that could or would be passed to other pages on the site by the calculation process, because it would be transferred at the time of index generation and not 'cascade' anywhere else during the calculation process.
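Here's a minimal sketch of the swap-at-index-generation idea in Python. The function name `swap_in_originals`, the `original_of` flag mapping, and the example URLs and scores are all hypothetical; I'm assuming scoring and ranking have already run untouched and produced a best-first list.

```python
from typing import Dict, List, Set, Tuple


def swap_in_originals(ranked: List[Tuple[str, float]],
                      original_of: Dict[str, str]) -> List[Tuple[str, float]]:
    """Replace flagged duplicates with their original URL after ranking.

    ranked:      (url, score) pairs, best first, as produced by normal scoring.
    original_of: the 'flag' on each duplicate URL, pointing at its original.
    """
    shown: Set[str] = set()
    results: List[Tuple[str, float]] = []
    for url, score in ranked:
        display_url = original_of.get(url, url)  # swap only flagged URLs
        if display_url in shown:
            continue  # the original already holds a better position
        shown.add(display_url)
        results.append((display_url, score))
    return results


# Hypothetical example: the copy on the big site outranks the original.
ranked = [
    ("http://big.example/copy", 0.9),
    ("http://small.example/original", 0.4),
    ("http://other.example/page", 0.3),
]
flags = {"http://big.example/copy": "http://small.example/original"}
results = swap_in_originals(ranked, flags)
```

The original takes the position the copy earned, and nothing is fed back into the scoring, so no weight 'cascades' to the rest of the small site.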
Visitors would still get access to the same content in the results, but they would see the content on the originating URL all the time. It's so simple and they don't even bother to do it... What a sad, blatant disregard and disrespect for content creators and the law (IMO) on their part... All they have to do is swap the original source URL in where the duplicate URL is ranked on the way out. It's simple to do.