Definition of "fair use"

are 'sewer sites' fair use?

         

blaze

9:18 pm on Jun 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some thoughts on how to make a 'sewer site' legal within the definition of 'fair use'

First, read this article at Stanford on Fair Use [fairuse.stanford.edu]

You might also want to read
Website Permissions [fairuse.stanford.edu] also at Stanford.

Also read through these threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]

I quote below from the Stanford article. I cut out quite a bit, so you would be wise to read the original article.


The four factors judges consider are:

1. the purpose and character of your use
2. the nature of the copyrighted work
3. the amount and substantiality of the portion taken, and
4. the effect of the use upon the potential market.

1. is the primary factor. The following questions are asked:


a. Has the material you have taken from the original work been transformed by adding new expression or meaning?

b. Was value added to the original by creating new information, new aesthetics, new insights and understandings?

Some possible answers:

a) No, none that I can see though perhaps a good scraper site could add further commentary on the quote by scoring it and providing some kind of 'added value' such as reviews.

b) This is the one argument which may make it legal:
- we see relevant websites to certain keywords ..
- a naive judge could be convinced that 'sewer sites' are for guiding the user, not for giving the user quality content stolen from quality websites
- alternative advertisements are added so we can follow up on related companies


2. The Nature of the Copyrighted Work

you have more leeway to copy from factual works .. such as biographies ..

stronger case .. if the material copied is from a published work .. The scope of fair use is narrower for unpublished works because an author has the right to control the first public appearance of his expression.

The works sewer sites quote from are generally factual and always published .. so this certainly helps the case rather than harms it.


3. The Amount and Substantiality of the Portion Taken

The less you take, the more likely that your copying will be excused as a fair use. However, even if you take a small portion of a work, your copying will not be a fair use if the portion taken is the "heart" of the work.

Since we are taking the parts with the relevant keywords, I think a sophisticated judge would realize that this is the 'heart' of the content. This, I believe, would harm the case.


4. The Effect of the Use Upon the Potential Market

Another important fair use factor is whether your use deprives the copyright owner of income or undermines a new or potential market for the copyrighted work.

This is arguable. I would lean in favor of saying yes, you are undermining the author .. but a case could be made that you are enabling users to find the content they are interested in more quickly, and that they are likely to find your content faster through these 'sewer sites'.

A sophisticated judge would have to be given a good lesson on SEO.


5. The "Fifth" Fair Use Factor: Are You Good or Bad?

Fair use involves subjective judgments and are often affected by factors such as a judge or jury's personal sense of right or wrong.

For example, in one case a manufacturer of novelty cards parodied the successful children's dolls, the Cabbage Patch Kids. The parody card series was entitled the Garbage Pail Kids and used gruesome and grotesque names and characters to poke fun at the wholesome Cabbage Patch image. Some copyright experts were surprised when a federal court considered the parody an infringement, not a fair use. (Original Appalachian Artworks, Inc. v. Topps Chewing Gum, Inc., 642 F. Supp. 1031 (N.D. Ga. 1986).)

Before I address this one, let me summarize the arguments above:

1. the purpose and character of your use

Considering this is the primary argument, I believe this is slightly in favor of the 'sewer site'. Sewer sites which have significant added content to automatically score or provide comment on the quoted text (like Alexa or Google) would probably be more likely to get away with this.

2. the nature of the copyrighted work

Generally factual and published, so this would not harm in any way that I can see.

3. the amount and substantiality of the portion taken, and

We are grabbing the keyworded text. A sophisticated judge should see that this can be considered the 'heart' of the text.

4. the effect of the use upon the potential market.

It does, on balance, undermine the original copyright owner though the argument can be obfuscated.

So this leaves us to 5, Are You Good or Bad?

I think this is fitting because the original posting was subtitled "Is this Evil?" and quite a few conversations around this have been had.

I also believe that if the argument that it is not fair use is to be won, it will hinge on 5.

I also believe that this may be a significant part of the Google mantra - "Don't be evil" .. because they realise that they are massive fair use users. In order to profit off their potential copyright violations, Google is positioning itself as providing a public service and trying to ward off losing on the fifth point.

So I think the reasonable conclusion is that if a Sewer Site would like to get away with what they are doing, they would do best to follow in Google's footsteps and develop their site in such a way that it appears to be good and has significant added content and value, so that they are saved by arguments 1) and 5).

blaze

10:51 am on Jun 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For those a little late to the party, I have used Macro's definition of a sewer site ..


Lately, I have noticed a large number of clicks coming from directory types that are nothing but a selection of links centered around a particular keyword. These "directory" type sites pick a keyword and then import the top ten results for that keyword along with examples of how those keywords are used in each site.

Actually, it's loanuniverse, but Macro was the one to label them..

vkaryl

2:01 am on Jun 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay. I'll bite. Why are we going here?

blaze

2:14 am on Jun 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps you should read the previous threads.. A lot of people are complaining about sites without placing them in a legal context.

If a website is legally determined to be fair use then we should consider it fair game for the search engines, or at least for affiliate programs such as adsense and what not.

vkaryl

2:25 am on Jun 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



*shrug* You might consider using this as a dissertation headed toward a PhD.

I DID read the original threads. It's all just silliness....

[Edit: you ought to be at least marginally grateful - I'm the ONLY one who posted in response.... thereby "bumping" this....]

blaze

2:31 am on Jun 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure I'd want to characterize what these website owners are going through as silliness..

vkaryl

2:38 am on Jun 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Depends on your frame of ref I imagine.

BigDave

2:26 am on Jun 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you are placing a lot more value in different places than a judge would.

As for the "heart" of the content, it would almost certainly be as a normal human reading a page would interpret it.

I had a page that came up (many months ago) #1 in google for "metric converter" even though it was only mentioned once in the entire 100+k document. Even if they quote the entire paragraph that contains those two words, no one would ever suggest that those words were the heart of that work.

To claim that someone took the "heart" of your work, I would assume that it would have to be what would be considered the *literary heart* of the work. Does taking that snippet remove the need for most people to read the entire work?

It is an argument that you can still use if they just happened to take the real "heart" of your page. Of course you could always try it the way that you suggest, but I would not expect a judge to agree.

Your best bet is to still concentrate on #1 and #4 to try and get the fair use definition to be as strict as possible to limit the *quantity* that they can get away with quoting.

I would say that point #1 would actually work against most "sewer sites", especially if they serve up grammatically whole quoted pieces.

#1 is where you have to explain to the judge the reason that you copied the text. And there is a huge difference between "to provide commentary" and "to provide commentary, and make money selling ads to their competitors". And that will bring that judge around to #4 where they are making money by selling your content to advertisers.

gethan

1:11 pm on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think this discussion could be extended to cover the several types of 'sewer site' that exist, and also the question of when a 'sewer site' graduates to being a site of value to all. The SE's have been dealing with this type of web spam since its inception, but I (and others) do think that these sites are rising in prevalence at the current time, in the SERPs and probably in numbers.

Sewer Site Definitions feel free to extend/comment

1. Scraped Directories: Automated directory produced by copying excerpts from top ranking sites, determined by popular search engines. Usually targets lucrative areas. Usually features very prominent advertising. Redeeming features - real links to the content sites. Though - not always present.

2. Autogenerated Content: At one point SEO software featured facilities to generate pages, this has probably gone from them - but autogenerated content is easy to make.

3. Obfuscated content stealers: Similar to the Scraped Directories - but don't offer any links or attempt to be a legitimate directory, content may or may not be obfuscated in some manner - but is originally stolen. Very difficult to find.

Probably a few more out there.

So what is the legality of each of the sites?

IMO

1) Fairuse -> Copyright Infringement = Very grey area. Decided on a case by case basis, depending on the prominence of advertising and on whether there is any commentary or added value from the directory.

2) Spam. Pure and simple - but not copyright infringement.

3) Spam + Copyright Infringement. But can you find it or prove it? I think not.

For the two cases of Spam the SE's have a responsibility to ensure that they do not feature in the SERPs - for their own long term good. But due to SE's being algorithmic in nature there will always be ways to simulate what they rank highly - it has been and will be a continual arms race between SE's and black hat webmasters.

On 1 again - I don't see it being in the search engines' long-term interests to rank pages/sites of this nature highly. Let's face it - 99.9% of traffic for this type of site will be directly from SE's -> cut them out of the result pages - and out go any profits.

Blaze quite successfully argued that in order not to be considered copyright infringement it boils down to 'good or evil' - and the resulting outcome is that a directory, in order not to infringe, needs to be basically what we have seen all along in DMOZ, Yahoo and many other niche directories out there.

So side issues:

Is it worth our ROI to track down content infringers of this nature?

Do we just wait for SE's to sort out the problem?

Have programs like adsense made this type of site profitable? - eg. "targeted adverts for your millions of pages of stolen/generated spam!"

Bigdave - I read that infamous thread where you argued that fairuse is fairuse and for the most part people thought you were supporting site scraping. I know that wasn't the case... the difficulty is what is the essential difference between a SE and a Directory and a site scraper directory - IMO it is an incredibly thin curvy line that we can't see and is continually crossed by both types of sites. But on balance the site scrapers fall on the wrong side.

blaze

1:41 pm on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BigDave actually undermined my argument that it might not be Fair Use. His statement was that quoted text is not necessarily 'the heart' just because it is utilizing high ranking keywords. It's an interesting and unexpected point.

Gethan, you summarize everything here quite accurately. One extra thing that occurred to me last night was that this whole argument around AdSense/Affiliate is extremely misguided. What keeps someone from simply developing these sites with the idea that they will feed traffic to their *real* AdSense website? I.e., create a 'scraping directory' and then reserve the top two spots on all of them to go to their real AdSense content. Yes, they get some drop off but the danger of losing their AdSense contract becomes very minimal..

Your obfuscation of course is another issue altogether, though giving it some thought I still can't see how that could be done in a way that wouldn't generate gibberish on the web page. Algorithmically, I'm sure detecting gibberish at some point won't be that difficult..

BigDave

5:52 pm on Jun 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



gethan:
Bigdave - I read that infamous thread where you argued that fairuse is fairuse and for the most part people thought you were supporting site scraping. I know that wasn't the case... the difficulty is what is the essential difference between a SE and a Directory and a site scraper directory - IMO it is an incredibly thin curvy line that we can't see and is continually crossed by both types of sites. But on balance the site scrapers fall on the wrong side.

and
Blaze quite successfully argued that in order not to be considered copyright infringement it boils down to 'good or evil' - and the resulting outcome is that a directory, in order not to infringe, needs to be basically what we have seen all along in DMOZ, Yahoo and many other niche directories out there.

Thanks for actually paying attention to what I was saying in that thread

I just wanted to point out that the vast majority of real directories, such as yahoo and dmoz, do not even depend on Fair Use, as they do not grab any of your content. At most they will use your title and your meta description, but in most cases the description is written by the submitter or the directory editor, and titles are not covered under copyright law according to the US copyright FAQ.

A real human edited directory is a completely different beast than a site that does any sort of scraping.

blaze:

BigDave actually undermined my argument that it might not be Fair Use. His statement was that quoted text is not necessarily 'the heart' just because it is utilizing high ranking keywords. It's an interesting and unexpected point.

Most of your argument was actually quite sound, and you get kudos for at least trying to understand Fair Use. The problem is that you were trying to look for some sort of quick kill in the fair use area, and there really is no such thing when it comes to someone that is willing to fight you on a fair use claim.

If it is blatant infringement you will likely succeed early in a case, but if it is fought as what a district OR circuit court judge could possibly consider to be Fair Use or even in the grey area between Fair Use and Infringement, then getting them offline will be incredibly difficult and expensive.

So, before you proceed, you really need to try and rip apart your own arguments. You need to understand how three parties other than yourself might view them: the defendant, the judge and the jury. You need to read the statutes and the case law, and understand the different procedures that are in place in the different circuits for how the fair use analysis is undertaken.

Your suggestion makes a lot of sense from a webmaster perspective. But I would be willing to bet that the judge is not a webmaster, and even if (s)he is, you will not get a majority of the circuit judges who hear the appeal to worry about rankings in the search engines.

You really are looking in the right place, I just think that your goals will be better served by concentrating on the other factors to define the amount of copying that is allowed.

Actually, on second thought, there might be some merit in the "keywords as heart" argument, but only after you have done everything you can to go after all the other aspects in the analysis. It could be the final piece that pushes it in your favor. But I just don't see it getting very far as a primary argument without a lot of work being done in the other areas of the analysis to properly set it up.