Facebook Sues Data Scraper

Brett_Tabke

10:13 pm on Apr 4, 2010 (gmt 0)

Although not specifically suited to this subforum, it seemed an important point of order that Facebook was essentially suing a guy for running a bot against public Facebook profiles.

[fastcompany.com...]

Warden gathered that data from public profiles using "crawling" software similar to what's commonly available on the Web; he was planning to release the set to select researchers, who proposed cross-referencing that data in all sorts of cool ways, trying to find links, for example, between income, employment, and social connections. (Does having more friends equal more cash? Is there a threshold, where too many friends means you're way too social?) As Warden was at pains to point out, the data is exceedingly public: You can still access it through Google's caches; and as Warden writes, "Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."

jmccormac

11:18 pm on Apr 4, 2010 (gmt 0)

At a guess, arguments similar to those made against people running scrapers on eBay will be used in this case. If Facebook management had any brains they'd hire this guy.

Regards...jmcc

Alcoholico

11:19 pm on Apr 4, 2010 (gmt 0)

I am confused. Does this mean that if FB wins, based on that precedent I could sue G for scraping my site and win? Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?

incrediBILL

11:29 pm on Apr 4, 2010 (gmt 0)

Brett, I think it's PERFECT for this forum ;)

"Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."

He's obviously never read this forum or listened to my tirade at Pubcon.

If Facebook wins, it makes any SE that scrapes without explicit opt-in via robots.txt a target.

Wouldn't surprise me if Google provided legal support for the scraper just to make sure that doesn't happen.

StoutFiles

11:44 pm on Apr 4, 2010 (gmt 0)

Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?

Pretty much this.

tedster

1:36 am on Apr 5, 2010 (gmt 0)

Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.

Warden says that Facebook threatened legal action if he did not delete the data. He duly destroyed all the records, saying he did not have the funds to contest a lawsuit.

[newscientist.com...]

Petrogold

3:23 am on Apr 5, 2010 (gmt 0)

But why did Warden have to do this?

graeme_p

6:41 am on Apr 5, 2010 (gmt 0)

But why did Warden have to do this?

Because Facebook can afford the legal fees of fighting this, but he cannot.

incrediBILL

7:06 am on Apr 5, 2010 (gmt 0)

It's a real shame because a solid ruling on this topic would possibly change the way people crawl the web.

Edge

1:38 pm on Apr 5, 2010 (gmt 0)

Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?

It's a form of "oligarchy" that usually results from inadequate or completely absent industry regulations and controls.

Hugene

8:50 pm on Apr 5, 2010 (gmt 0)

Here is an interview with the guy behind the scraping.

[fastcompany.com...]

Isn't the content free and openly available? Shouldn't anyone be allowed to crawl it? As long as they don't reproduce it or sell it, I guess it's OK, no?

JS_Harris

12:50 am on Apr 6, 2010 (gmt 0)

The content is public thus no crime was committed. Waste of resources.

Brett_Tabke

1:00 am on Apr 6, 2010 (gmt 0)

> not for this forum.

It is a story also about privacy and social media.

>robots

Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court. It was never accredited or adopted as a standard by any recognized standards body.

FB's terms of service talk about robotic software being an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.

crobb305

2:51 am on Apr 6, 2010 (gmt 0)

Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.

The title of the thread is a little misleading if the suit was never filed. I am glad I saw your clarification when I was skimming the posts :)

Very interesting story, and it would have had interesting consequences if a precedent had been set.

graeme_p

9:40 am on Apr 6, 2010 (gmt 0)

Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court.

How many relevant cases have there been? Has a court actually ruled against "robots.txt said we could crawl" as a defence?

FB's terms of service talk about robotic software being an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.

So how are crawlers to know what sites to crawl? They can hardly parse the TOS!
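
For comparison, robots.txt is the one set of rules a crawler can check mechanically. Below is a minimal sketch of that check, assuming Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, the example.com URLs, and the helper names `build_parser` and `may_fetch` are all hypothetical, made up for illustration; none of this reflects Facebook's actual rules or Warden's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /profile/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# The Disallow rule covers everything under /profile/.
print(may_fetch(parser, "ExampleBot", "https://example.com/profile/123"))

# Other paths fall through to the Allow rule.
print(may_fetch(parser, "ExampleBot", "https://example.com/about"))
```

Nothing comparable exists for a terms-of-service page: that is free text written for humans, so a check like this has nothing to parse.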

dstiles

11:10 pm on Apr 6, 2010 (gmt 0)

Which is why it's time someone heavy forced a new standard of "robots.txt" that includes machine-readable versions of "what YOU cannot do".

I looked at the ACAP that someone around here suggested but I can't see it going anywhere without some big guys behind it. If W3 made it "legal" and there was then a legal case against something that ignored the TOS then perhaps Google et al would adopt it (it would probably be Google that ignores it...).

Until then, we're all stuffed. I can shout at scrapers and bots as much as I like but I can't afford to do anything about it - although I might try if there were a proper legal ruling.

shorebreak

2:46 pm on Apr 7, 2010 (gmt 0)

I'm a non-technical sales knucklehead. Please explain to me why y'all care about this topic. Is it because bots force you to incur costs you otherwise wouldn't have to bear?

tenerifejim

7:55 pm on Apr 9, 2010 (gmt 0)

FB's terms of service talk about robotic software being an unacceptable way to connect

But as far as I can tell it is not an unacceptable way to connect to the Google cache, which has all the same data. Although I may be wrong. Is there an exception made for Google in the TOS?

Also, isn't their entire API designed for this type of connection?

Makes someone want to try the same thing and see it out in court...