Moderators: LifeinAsia & httpwebwitch

Professional Webmaster Business Issues Forum

This 40 message thread spans 2 pages.
Internet Archive Named in Suit Over Archived Pages
Wayback Machine Thrust Into Modern Spotlight
digitalghost




msg:784606
 7:44 pm on Jul 13, 2005 (gmt 0)

The N.Y. Times is reporting that the Internet Archive is being sued.

The Internet Archive was created in 1996 as the institutional memory of the online world, storing snapshots of ever-changing Web sites and collecting other multimedia artifacts. Now the nonprofit archive is on the defensive in a legal case that represents a strange turn in the debate over copyrights in the digital age.

Original Article [nytimes.com]

Alternate Link [news.com.com]

This case could have significant impact on caching and archiving protocol. I'm sure Google will be watching the case closely as they operate their own extensive caching system.

Ostensibly the case seems to hinge on robots.txt, but the real gist of this may be a decision on whether extensive caching is legal or not. Fair Use will be examined quite closely.

Last week Healthcare Advocates sued both the Harding Earley firm and the Internet Archive, saying the access to its old Web pages, stored in the Internet Archive's database, was unauthorized and illegal.

[edited by: Brett_Tabke at 8:30 pm (utc) on July 13, 2005]
[edit reason] fixed link - added quote [/edit]

 

Webwork




msg:784607
 9:14 pm on Jul 13, 2005 (gmt 0)

At times like this it pays to read the law, which I now lawfully post:

[copyright.gov...]

§ 108. Limitations on exclusive rights: Reproduction by libraries and archives

(a) . . . it is not an infringement of copyright for a library or archives, or any of its employees acting within the scope of their employment, to reproduce no more than one copy or phonorecord of a work, except as provided in subsections (b) and (c), or to distribute such copy or phonorecord, under the conditions specified by this section, if —

(1) the reproduction or distribution is made without any purpose of direct or indirect commercial advantage;

(2) the collections of the library or archives are (i) open to the public, or (ii) available not only to researchers affiliated with the library or archives or with the institution of which it is a part, but also to other persons doing research in a specialized field; and

(3) the reproduction or distribution of the work includes a notice of copyright that appears on the copy or phonorecord that is reproduced under the provisions of this section, or includes a legend stating that the work may be protected by copyright if no such notice can be found on the copy or phonorecord that is reproduced under the provisions of this section.

Any questions?

I'd say that on the face of it somebody has an uphill battle, and I don't think that it's the party that's been operating a public archive of the WWW for the past 5 years.

And this for your general reading pleasure is "the other" section on fair use, the one that doesn't specifically discuss archives:

§ 107. Limitations on exclusive rights: Fair use

. . . the fair use of a copyrighted work, including such use by reproduction in copies . . . or by any other means specified by that section, for purposes such as . . .scholarship, or research, is not an infringement of copyright.

In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include —

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

richlowe




msg:784608
 9:48 pm on Jul 13, 2005 (gmt 0)

My personal rule: if you post it on the internet, the world knows. If you don't want it known, don't post it.

digitalghost




msg:784609
 10:14 pm on Jul 13, 2005 (gmt 0)

>>Any questions?

Of course. ;) But suffice it to say that this case will be about interpretation of the laws. Basically stating that "the law is clear on this matter" doesn't quite cut it. Obviously, there are some folks out there that want to test the law.

This particular test of the law has been long in coming, although I don't think this is the best case to use as a test. The issue with robots.txt is unclear. There seem to be a lot of what-ifs that I don't want to speculate on, as I'd much rather deal with the facts.

>>If you don't want it known, don't post it.

Personal rules are fine and well, but have no bearing on this case.

What I do see is an opportunity for legislation that will clear up some murky issues regarding copyright law, archiving, caching, robots.txt, etc. Whether potential legislation would be good or bad, I simply don't know; the issues aren't at that stage yet.

I do believe that the opportunity for clarity on these issues may be the best thing to arise from this case. Let's hope the opportunity is used well.

Brett_Tabke




msg:784610
 10:40 pm on Jul 13, 2005 (gmt 0)

> Any questions?

Ya, so i can "archive" all the mp3's in the world?

Webwork




msg:784611
 11:07 pm on Jul 13, 2005 (gmt 0)

I'm not saying that what follows is an answer to anything, but it makes an interesting read in light of the reference to archiving MP3s.

There is another section of the Copyright Act that I believe speaks to this concern. It's also contained in § 108. Limitations on exclusive rights: Reproduction by libraries and archives. Here's the language in pertinent part. I've bolded what looks to be the most relevant part relating to MP3s:

(b) The rights of reproduction and distribution under this section apply to three copies or phonorecords of an unpublished work duplicated solely for purposes of preservation and security or for deposit for research use in another library or archives of the type described by clause (2) of subsection (a), if —

(1) the copy or phonorecord reproduced is currently in the collections of the library or archives; and

(2) any such copy or phonorecord that is reproduced in digital format is not otherwise distributed in that format and is not made available to the public in that format outside the premises of the library or archives.

(c) The right of reproduction under this section applies to three copies or phonorecords of a published work duplicated solely for the purpose of replacement of a copy or phonorecord that is damaged, deteriorating, lost, or stolen, or if the existing format in which the work is stored has become obsolete, if —

(1) the library or archives has, after a reasonable effort, determined that an unused replacement cannot be obtained at a fair price; and

(2) any such copy or phonorecord that is reproduced in digital format is not made available to the public in that format outside the premises of the library or archives in lawful possession of such copy.

For purposes of this subsection, a format shall be considered obsolete if the machine or device necessary to render perceptible a work stored in that format is no longer manufactured or is no longer reasonably available in the commercial marketplace.

(d) The rights of reproduction and distribution under this section apply to a copy, made from the collection of a library or archives where the user makes his or her request or from that of another library or archives, of no more than one article or other contribution to a copyrighted collection or periodical issue, or to a copy or phonorecord of a small part of any other copyrighted work, if —

(1) the copy or phonorecord becomes the property of the user, and the library or archives has had no notice that the copy or phonorecord would be used for any purpose other than private study, scholarship, or research; and

(2) the library or archives displays prominently, at the place where orders are accepted, and includes on its order form, a warning of copyright in accordance with requirements that the Register of Copyrights shall prescribe by regulation.

(e) The rights of reproduction and distribution under this section apply to the entire work, or to a substantial part of it, made from the collection of a library or archives where the user makes his or her request or from that of another library or archives, if the library or archives has first determined, on the basis of a reasonable investigation, that a copy or phonorecord of the copyrighted work cannot be obtained at a fair price, if —

(1) the copy or phonorecord becomes the property of the user, and the library or archives has had no notice that the copy or phonorecord would be used for any purpose other than private study, scholarship, or research; and

(2) the library or archives displays prominently, at the place where orders are accepted, and includes on its order form, a warning of copyright in accordance with requirements that the Register of Copyrights shall prescribe by regulation.

(f) Nothing in this section —

(1) shall be construed to impose liability for copyright infringement upon a library or archives or its employees for the unsupervised use of reproducing equipment located on its premises: Provided, That such equipment displays a notice that the making of a copy may be subject to the copyright law;

(2) excuses a person who uses such reproducing equipment or who requests a copy or phonorecord under subsection (d) from liability for copyright infringement for any such act, or for any later use of such copy or phonorecord, if it exceeds fair use as provided by section 107;

(3) shall be construed to limit the reproduction and distribution by lending of a limited number of copies and excerpts by a library or archives of an audiovisual news program, subject to clauses (1), (2), and (3) of subsection (a); or

(4) in any way affects the right of fair use as provided by section 107, or any contractual obligations assumed at any time by the library or archives when it obtained a copy or phonorecord of a work in its collections.

(g) The rights of reproduction and distribution under this section extend to the isolated and unrelated reproduction or distribution of a single copy or phonorecord of the same material on separate occasions, but do not extend to cases where the library or archives, or its employee —

(1) is aware or has substantial reason to believe that it is engaging in the related or concerted reproduction or distribution of multiple copies or phonorecords of the same material, whether made on one occasion or over a period of time, and whether intended for aggregate use by one or more individuals or for separate use by the individual members of a group; or

(2) engages in the systematic reproduction or distribution of single or multiple copies or phonorecords of material described in subsection (d): Provided, That nothing in this clause prevents a library or archives from participating in interlibrary arrangements that do not have, as their purpose or effect, that the library or archives receiving such copies or phonorecords for distribution does so in such aggregate quantities as to substitute for a subscription to or purchase of such work.

(h)(1) For purposes of this section, during the last 20 years of any term of copyright of a published work, a library or archives, including a nonprofit educational institution that functions as such, may reproduce, distribute, display, or perform in facsimile or digital form a copy or phonorecord of such work, or portions thereof, for purposes of preservation, scholarship, or research, if such library or archives has first determined, on the basis of a reasonable investigation, that none of the conditions set forth in subparagraphs (A), (B), and (C) of paragraph (2) apply.

(2) No reproduction, distribution, display, or performance is authorized under this subsection if —

(A) the work is subject to normal commercial exploitation;

(B) a copy or phonorecord of the work can be obtained at a reasonable price; or

(C) the copyright owner or its agent provides notice pursuant to regulations promulgated by the Register of Copyrights that either of the conditions set forth in subparagraphs (A) and (B) applies.

(3) The exemption provided in this subsection does not apply to any subsequent uses by users other than such library or archives.

(i) The rights of reproduction and distribution under this section do not apply to a musical work, a pictorial, graphic or sculptural work, or a motion picture or other audiovisual work other than an audiovisual work dealing with news, except that no such limitation shall apply with respect to rights granted by subsections (b) and (c), or with respect to pictorial or graphic works published as illustrations, diagrams, or similar adjuncts to works of which copies are reproduced or distributed in accordance with subsections (d) and (e).

Leosghost




msg:784612
 11:23 pm on Jul 13, 2005 (gmt 0)

> Any questions?

Ya, so i can "archive" all the mp3's in the world?

Only if me and a guy who goes by the nick of "LightningUk" can do the same with all the vob and AC3 files too...;)

On another note ..in spite of my love for the USA ..I always get jumpy at the idea that one can apply legislation from one country to something one finds or does in another ....Whose law is relevant in whose jurisdictions in this and many other matters?..such as software or idea patents, personal safeguard copies of softs or DVDs etc, photocopying ..or copying by any means ( my ...amongst others ) work ....

And or my "adopted govt" telling am*z*n and eb*y what it can and can't sell ...

Can 'o' worms ..( or escargots )!...The absolute worst thing will be to let the WTO sort out such ..the next worst scenario ( apologies to webwork ) is to let the lawyers and legislators play in the mud ...they will be the only winners ...

PS ..Webwork ..I get here less frequently ..didn't know you were now a mod ( overdue IMHO )..congrats ..much respect.. in spite of my ribbing ...;).you want a copy of the "code Napoleonic" in the original?..;).....

Chris_R




msg:784613
 11:48 pm on Jul 13, 2005 (gmt 0)

Nice WebWork.

I don't think this is going to fly based on that. I think we are going to have to wait to see someone make a case against Google. I find the DMCA complaint to be silly as well.

ThomasB




msg:784614
 12:05 am on Jul 14, 2005 (gmt 0)

Great post Webwork. I think time will tell in this case, but hope to see an answer soon, especially as I have similar projects out there.

No scraper sites. Seriously.

walkman




msg:784615
 12:15 am on Jul 14, 2005 (gmt 0)

>> At times like this it pays to read the law

thanks. I'm sure no one read it before they filed the lawsuit. ;)

if the law was that clear, we wouldn't need judges and juries, computers could handle it.

As far as archive.org goes: At least you should be able to permanently remove YOUR site /pages when you ask. I don't think they remove anything, other than allow you to block it via robots.

Webwork




msg:784616
 12:35 am on Jul 14, 2005 (gmt 0)

What's most telling to me is the absence of a definitive ruling prohibiting or even temporarily enjoining the likes of Archive.org or Google's cache, despite the fact that both have existed for 5+ years and are so far reaching in their coverage.

If it really was a bona fide copyright issue would it first arise in a case like this?

Then again, there's an old saying: "Bad cases make bad law."

Maybe it's time for an Amicus Brief, Google?

walkman




msg:784617
 12:44 am on Jul 14, 2005 (gmt 0)

All it takes is one person to file a lawsuit, you know. If you slap me and I don't say anything, that doesn't mean it's legal to slap people. Next time you slap someone else and he takes a different route from me.

hunderdown




msg:784618
 3:08 am on Jul 14, 2005 (gmt 0)

At least you should be able to permanently remove YOUR site /pages when you ask. I don't think they remove anything, other than allow you to block it via robots.

Not sure I agree. After all, it contains materials that had been freely and publicly available at some time in the past. If a company produces a brochure which a library collects, does that company have a right to ask for it to be discarded just because they say so? I don't think so.

It was interesting to read that law firms use the Internet Archive as a resource in trademark cases--sort of a cheap and easy discovery process alternative.

Personally, I hope this case gets thrown out. I am definitely rooting for the defense here.

walkman




msg:784619
 3:32 am on Jul 14, 2005 (gmt 0)

>> After all, it contains materials that had been freely and publicly available at some time in the past

Here's one scenario: go to NY Times site right now and ALL articles are free. Try to read the same article two weeks later for free.

By posting it online, one doesn't give up the copyright.

[edited by: walkman at 3:40 am (utc) on July 14, 2005]

digitalghost




msg:784620
 3:38 am on Jul 14, 2005 (gmt 0)

Here's another scenario regarding that freely available public content.

I'm interested in SEO and internet marketing, so I decide to gather articles from all the SEOs, marketers and copywriters out there. Just want to archive them, on MY domain. They'll be freely available to the public and all those researchers. Any of you SEOs, marketers or copywriters have any problems with that?

.
.

.
.

.
.
thought you might...

Manga




msg:784621
 5:34 am on Jul 14, 2005 (gmt 0)

At least you should be able to permanently remove YOUR site /pages when you ask. I don't think they remove anything, other than allow you to block it via robots.

They do remove them. I recently had all my domains removed from the archive. You can have your site removed using robots.txt with this entry:

User-agent: ia_archiver
Disallow: /

This will remove your site from the archive. Also, if you can't use the robots.txt method you can contact them and they will remove your domains manually. I was not able to use the robots.txt feature for some of my domains for various reasons and contacted them and they removed the domains for me.
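As a rough check of the entry above, Python's standard `urllib.robotparser` can parse the two lines and report which crawlers they exclude. This is only a sketch: the `example.com` URL and the `SomeOtherBot` name are placeholders, and it shows how a compliant parser reads the rule, not how the Archive itself processes it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entry quoted above, as a list of lines.
ROBOTS_LINES = [
    "User-agent: ia_archiver",
    "Disallow: /",
]


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and SomeOtherBot are placeholders for illustration.
    print(is_allowed("ia_archiver", "http://example.com/page.html"))
    print(is_allowed("SomeOtherBot", "http://example.com/page.html"))
```

With this entry the parser refuses `ia_archiver` and allows every other agent, which is why the rule only matters to crawlers that choose to honor it.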

For more info on how to remove your site check out this page [archive.org...]

walkman




msg:784623
 5:54 am on Jul 14, 2005 (gmt 0)

>> I recently had all my domains removed from the archive

I tried it a while back. When I later removed the robots.txt, I noticed that I really hadn't removed anything. They still have the pages; they just don't let people see them while you have the robots.txt in place.

httpwebwitch




msg:784624
 6:10 am on Jul 14, 2005 (gmt 0)


if you post it on the internet, the world knows. If you don't want it known, don't post it.

That's a sensible position when it comes to privacy issues, but caching vs. copyright is a different thing.

If I put timely information in a temporary medium (my website) "This week - all widgets 50% off", then I should be able to remove that message next week. Someone reading a cached version of my ad next year might call it false advertising if I don't give them the sale price.

The key point there is that the Internet is supposed to be a dynamic medium. Content should be fresh, and pages should be reloaded from their source every so often. When I read something on the www, I assume it's fresh info. Caching spoils that assumption.

On the other hand, I am really glad the Archive exists - for many sites that have been taken offline, it's the only extant evidence of many years of my career.

Chris_R




msg:784625
 8:21 am on Jul 14, 2005 (gmt 0)

>On the other hand, I am really glad the Archive exists - for many sites that have been taken offline, it's the only extant evidence of many years of my career.

I think so too. They are good people over there and really are doing this for the common good.

They don't need the money. Usually that doesn't matter in copyright cases, but as WebWork pointed out - it does matter in this one.

Keep in mind that TECHNICALLY - if this suit were to succeed - AOL & others would have major problems. They have caches of tons of stuff out there in order to speed up internet access. Technically that would be violating a copyright.

This would be a bad, bad, bad, bad thing if it were to occur.

RonPK




msg:784626
 10:20 am on Jul 14, 2005 (gmt 0)

> Technically that would be violating a copyright.

Maybe so, but nobody suffers any loss of income or other damage by ISPs caching pages. Surely US law would take that into regard?

I'm quite happy with the existence of the archive. From a scholarly point of view it is invaluable, as information tends to be very volatile in the digital millennium. It is important to somehow store all that information. IMHO opt-out is fine in this case.

It would be even better if they'd display my AdSense ads...

Receptional




msg:784627
 1:08 pm on Jul 14, 2005 (gmt 0)

>>Any questions
Yes...
it is not an infringement of copyright for a library or archives, or any of its employees acting within the scope of their employment, to reproduce no more than one copy...

I would say there is a technically valid case to say that putting up a web page constitutes more than one copy.

the reproduction or distribution is made without any purpose of direct or indirect commercial advantage

I would say that unless the defendant has no AdSense, is not paying its staff, is a charity, and is not even financing its (not inconsiderable) hosting costs, then this is unlikely to be entirely the case.

incrediBILL




msg:784628
 2:41 pm on Jul 14, 2005 (gmt 0)

Did you people actually read the article about the suit?

ROFLMAO

They are basing a lawsuit on robots.txt, which is voluntary at best. It's not even a mandatory protocol, although people wish it were.

"We're suing you because you didn't voluntarily support this!"

Oh my.

Are these people smoking crack or what?

Had they actually just blocked ia_archiver from accessing the server, or blocked its IPs, or asked the Internet Archive to REMOVE their listings (there are documented methods for doing this on the archive's web site), then this wouldn't be an issue.

If you read closely, they claim the lawyer was accessing pages on THEIR server via the archive, they found entries in the log file, which would imply the pages were dynamic and not even cached unless I'm misreading this.

What about search engines that maintain an old cache for YEARS of pages that are long gone? I know Google's still clinging to a couple of mine that don't even exist anymore.

It's a HUGE slippery slope, where does this all end?

So in summary, this case is about the plaintiff wanting the court to find the defendant at fault because the plaintiff was technically inept in proper server administration and failed in their feeble attempt to restrict the defendant from their servers.

Had they only read WebmasterWorld they would've known how to do it right the first time!
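The server-side blocking mentioned in the post above can be sketched as a small WSGI wrapper that answers 403 to any request whose User-Agent header contains `ia_archiver`. This is a minimal sketch under assumptions: `block_crawler` and `hello_app` are hypothetical names, the site is assumed to run as a WSGI application, and the check only stops crawlers that identify themselves honestly. It would do nothing about pages already held in an archive.

```python
from typing import Callable, Iterable

BLOCKED_AGENT = "ia_archiver"  # crawler name taken from the thread


def block_crawler(app: Callable) -> Callable:
    """Wrap a WSGI app so requests from the blocked crawler get a 403."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if BLOCKED_AGENT in agent:
            # Refuse at the server instead of relying on robots.txt compliance.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return wrapper


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    # Hypothetical stand-in for the real site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_crawler(hello_app)
```

Unlike a robots.txt rule, a check like this is enforced by the server itself, though a crawler sending a different User-Agent string would pass straight through.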

Manga




msg:784629
 3:46 pm on Jul 14, 2005 (gmt 0)

Apparently they did follow the Archive's instructions on removing their sites using robots.txt. The Archive specifically states that if you want your site to be removed to do so with robots.txt. This statement is found on this page [archive.org...]

The plaintiff followed the Archive's instructions and yet a cache of their site was still accessed. This is what they are angry about and the basis of their suit. I believe they have a valid argument against the archive. Some time ago I made sure all my domains were removed from the Archive. I would be rather angry if they started showing up again after I specifically asked for them to be removed.

The Archive may be within its rights to display caches of websites, but it also needs to ensure that those caches are permanently removed if the domain owner requests it. This is no different than any other "opt-out" argument.

jdkuehne




msg:784630
 4:51 pm on Jul 14, 2005 (gmt 0)

But on at least two dates in July 2003, the suit states, Web logs at Healthcare Advocates indicated that someone at Harding Earley, using the Wayback Machine, made hundreds of rapid-fire requests for the old versions of the Web site. In most cases, the robot.txt blocked the request. But in 92 instances, the suit states, it appears to have failed, allowing access to the archived pages.

Perhaps I am missing something, but doesn't the IA store the archived pages on its own servers? How is it possible that Healthcare Advocates could see these requests in their own logs?

digitalghost




msg:784631
 4:54 pm on Jul 14, 2005 (gmt 0)

JD, HealthCare Advocates was complaining about the IA bot hitting their site, and ignoring robots.txt and gathering those pages for inclusion in the archive.

jim_w




msg:784632
 5:02 pm on Jul 14, 2005 (gmt 0)

I agree with Receptional. One copy on a server is not the same as it being displayed 1000 times a day. I would say that is 1000 copies. Especially since it is cached with IE and other browsers. So the one copy on the server is now 1000 copies on several computers.

It's not the one copy on the server, it's the distribution of that copy that comes into play.

hunderdown




msg:784633
 5:55 pm on Jul 14, 2005 (gmt 0)

If I put timely information in a temporary medium (my website) "This week - all widgets 50% off", then I should be able to remove that message next week. Someone reading a cached version of my ad next year might call it false advertising if I don't give them the sale price.

Well, to get back to my earlier point, does that also mean that advertisers should be able to remove outdated brochures, sales circulars, or what-have-you, from libraries? Those are outdated too.

Provided the archive displays the date when the copy was made, how would anyone reasonably argue that the sale was still valid? I think you're stretching too far here.

Re the NY Times example from another poster--articles that are only temporarily available for free--I'll grant that materials that the owner is not freely displaying should be handled differently, as I believe the Internet Archive does.

But other material, free, public, linked to: I don't see the problem with that being in a non-commercial archive for public use.

What it comes down to is that there is a sizable amount of public interest in allowing an archive like this to exist. The plaintiffs are suing because, basically, they want it to more effectively hide information that might be damaging to them. While it might be possible technically to allow for greater control over content by its owners, I think that would become a real mess to administer for Internet Archive. And I hope the court recognizes that.

Huckleberry




msg:784634
 5:57 pm on Jul 14, 2005 (gmt 0)

digitalghost.. I guess I don't understand either.. if they are complaining that the archive site was accessing the robots.txt then how did they know that it was someone from the law firm as indicated in the article..

Does that information show in the server logs... I didn't think so, but I just tried it myself and will have to check the logs later...

Also, isn't the site located on the archive site's server, so if they were accessing an older version would it even show in their logs at all?

Maybe I don't understand how it works, but I am a little confused.

Huckleberry




msg:784635
 6:03 pm on Jul 14, 2005 (gmt 0)

Don't know how to edit the post... but I just tested my theory and it does show up.. but my one query resulted in 30 different GET robots.txt attempts.. for a file which I don't currently have installed...

It retrieved only my images on the server and nothing else so that means that the html is served from archive.org and the images still reside on my server... so maybe they had the file configured incorrectly... I don't think it could be proved that they circumvented any technology...
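The kind of log check described above can be sketched in a few lines of Python that tally requests by host and path. Everything here is an assumption for illustration: the log is taken to be in Common Log Format, `count_requests` is a hypothetical helper, and the sample lines are made up (using documentation-only IP ranges), not taken from any real server.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path proto" status size
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) ')


def count_requests(lines):
    """Count hits per (host, path) pair from access-log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            host, _method, path, _status = match.groups()
            counts[(host, path)] += 1
    return counts


# Made-up sample lines for illustration only; not from any real log.
SAMPLE = [
    '192.0.2.10 - - [14/Jul/2005:18:00:01 +0000] "GET /robots.txt HTTP/1.0" 404 209',
    '192.0.2.10 - - [14/Jul/2005:18:00:02 +0000] "GET /robots.txt HTTP/1.0" 404 209',
    '198.51.100.7 - - [14/Jul/2005:18:00:03 +0000] "GET /images/logo.gif HTTP/1.1" 200 1532',
]

if __name__ == "__main__":
    for (host, path), hits in count_requests(SAMPLE).most_common():
        print(host, path, hits)
```

A tally like this would separate repeated `/robots.txt` fetches from image requests, which is the distinction the test above turned up.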

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved