
Search Engine Spider and User Agent Identification Forum

    
Facebook Sues Data Scraper
Brett_Tabke




msg:4109880
 10:13 pm on Apr 4, 2010 (gmt 0)

Although this isn't strictly a topic for this subforum, it seemed an important point of order that Facebook was essentially suing a guy for running a bot against Facebook public profiles.

[fastcompany.com...]

Warden gathered that data from public profiles using "crawling" software similar to what's commonly available on the Web; he was planning to release the set to select researchers, who proposed cross-referencing that data in all sorts of cool ways, trying to find links, for example, between income, employment, and social connections. (Does having more friends equal more cash? Is there a threshold, where too many friends means you're way too social?) As Warden was at pains to point out, the data is exceedingly public: You can still access it through Google's caches; and as Warden writes, "Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."

 

jmccormac




msg:4109891
 11:18 pm on Apr 4, 2010 (gmt 0)

At a guess, the same arguments used against people running scrapers against eBay will be used in this case. If Facebook management had any brains, they'd hire this guy.

Regards...jmcc

Alcoholico




msg:4109892
 11:19 pm on Apr 4, 2010 (gmt 0)

I am confused. Does this mean that if FB wins, based on that precedent I could sue G for scraping my site and win? Or only means that those with financial muscle get away with anything and everything no matter if they're right or wrong?

incrediBILL




msg:4109893
 11:29 pm on Apr 4, 2010 (gmt 0)

Brett, I think it's PERFECT for this forum ;)

"Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."


He's obviously never read this forum or listened to my tirade at Pubcon.

If Facebook wins, it makes any SE that scrapes without explicit opt-in via robots.txt a target.
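incrediBILL's point turns on the fact that robots.txt works as an opt-out: anything the file does not explicitly disallow is fair game, and a site with no robots.txt at all is open to every crawler. Below is a minimal sketch of what an explicit opt-in would look like, checked with Python's standard `urllib.robotparser`. The robots.txt contents, the example URL and the `SomeResearchBot` name are hypothetical illustrations, not anything Facebook actually served.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "opt-in" robots.txt: the catch-all group blocks every crawler,
# and only the crawlers the site owner names explicitly are let in.
OPT_IN_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt body lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/profile/12345"  # hypothetical profile URL
    for agent in ("Googlebot", "SomeResearchBot"):
        print(agent, allowed(OPT_IN_ROBOTS_TXT, agent, url))
    # With an empty (or missing) robots.txt, every crawler is allowed by default.
    print("empty file", allowed("", "SomeResearchBot", url))
```

The last check is the status quo the thread is arguing about: unless the owner writes a rule, the answer is "yes".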

It wouldn't surprise me if Google provided legal support for the scraper just to make sure that doesn't happen.

StoutFiles




msg:4109895
 11:44 pm on Apr 4, 2010 (gmt 0)

Or only means that those with financial muscle get away with anything and everything no matter if they're right or wrong?

Pretty much this.

tedster




msg:4109926
 1:36 am on Apr 5, 2010 (gmt 0)

Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.

Warden says that Facebook threatened legal action if he did not delete the data. He duly destroyed all the records, saying he did not have the funds to contest a lawsuit.

[newscientist.com...]

Petrogold




msg:4109977
 3:23 am on Apr 5, 2010 (gmt 0)

But why did Warden have to do this?

graeme_p




msg:4110014
 6:41 am on Apr 5, 2010 (gmt 0)

But why did Warden have to do this?

Because Facebook can afford the legal fees of fighting this, but he cannot.

incrediBILL




msg:4110024
 7:06 am on Apr 5, 2010 (gmt 0)

It's a real shame because a solid ruling on this topic would possibly change the way people crawl the web.

Edge




msg:4110114
 1:38 pm on Apr 5, 2010 (gmt 0)

Or only means that those with financial muscle get away with anything and everything no matter if they're right or wrong?


It's a form of "oligarchy", which usually results from inadequate or completely absent industry regulation and controls.

Hugene




msg:4110344
 8:50 pm on Apr 5, 2010 (gmt 0)

Here is an interview with the guy behind the scraping.

[fastcompany.com...]

Isn't the content free and openly available? Shouldn't anyone be allowed to crawl it? As long as they don't reproduce or sell it, I guess it's OK, no?

JS_Harris




msg:4110433
 12:50 am on Apr 6, 2010 (gmt 0)

The content is public thus no crime was committed. Waste of resources.

Brett_Tabke




msg:4110438
 1:00 am on Apr 6, 2010 (gmt 0)

> not for this forum.

It is a story also about privacy and social media.

>robots

Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court. It was never accredited or adopted as a standard by any recognized standards body.

FB's terms of service describe robotic software as an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.

crobb305




msg:4110464
 2:51 am on Apr 6, 2010 (gmt 0)

Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.


The title of the thread is a little misleading if the suit was never filed. I am glad I saw your clarification when I was skimming the posts :)

Very interesting story, and it would have had interesting consequences if a precedent had been set.

graeme_p




msg:4110609
 9:40 am on Apr 6, 2010 (gmt 0)

Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court.


How many relevant cases have there been? Has a court actually ruled against "robots.txt said we could crawl" as a defence?

FB's terms of service describe robotic software as an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.

So how are crawlers to know what sites to crawl? They can hardly parse the TOS!
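graeme_p's question has a partial answer in practice: a well-behaved crawler reads robots.txt, which is machine-readable, before each fetch. Below is a minimal sketch of that check using Python's standard `urllib.robotparser` (`crawl_delay()` needs Python 3.6 or later). The robots.txt body, the URLs, the `SomeResearchBot` agent and the `CrawlDecision`/`decide` names are hypothetical. It also shows the limit the thread keeps running into: the format can say where a bot may go and how fast, but not what it may do with the data.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a social site. It can list paths a bot may not
# fetch and (via the nonstandard Crawl-delay line) a pace, but it has no field
# for terms of use such as "do not republish this data".
SITE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /messages/
Disallow: /search
"""


@dataclass
class CrawlDecision:
    allowed: bool                 # may this user agent fetch the URL?
    delay_seconds: Optional[int]  # Crawl-delay, or None if the file sets none


def decide(robots_txt: str, user_agent: str, url: str) -> CrawlDecision:
    """Check one URL against a robots.txt body before fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return CrawlDecision(
        allowed=parser.can_fetch(user_agent, url),
        delay_seconds=parser.crawl_delay(user_agent),
    )


if __name__ == "__main__":
    for url in (
        "https://www.example.com/profile/12345",
        "https://www.example.com/messages/42",
    ):
        print(url, decide(SITE_ROBOTS_TXT, "SomeResearchBot", url))
```

A crawler would call `decide()` for each URL and wait `delay_seconds` between requests when one is set. Crawl-delay is a widely used but nonstandard extension (Google, for one, ignores it), and nothing in the format can carry the usage restrictions that a TOS spells out in prose.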

dstiles




msg:4111100
 11:10 pm on Apr 6, 2010 (gmt 0)

Which is why it's time someone with clout forced a new "robots.txt" standard that includes a machine-readable version of "what YOU cannot do".

I looked at ACAP, which someone around here suggested, but I can't see it going anywhere without some big players behind it. If the W3C made it "legal" and there were then a legal case against something that ignored the TOS, then perhaps Google et al. would adopt it (it would probably be Google that ignored it...).

Until then, we're all stuffed. I can shout at scrapers and bots as much as I like but I can't afford to do anything about it - although I might try if there were a proper legal ruling.

shorebreak




msg:4111444
 2:46 pm on Apr 7, 2010 (gmt 0)

I'm a non-technical sales knucklehead. Please explain to me why y'all care about this topic. Is it because bots force you to incur costs you otherwise wouldn't have to bear?

tenerifejim




msg:4113167
 7:55 pm on Apr 9, 2010 (gmt 0)

FB's terms of service describe robotic software as an unacceptable way to connect


But as far as I can tell, it is not an unacceptable way to connect to the Google cache, which has all the same data. I may be wrong, though. Is there an exception made for Google in this TOS?

Also, isn't their entire API designed for this type of connection?

Maybe someone will try the same thing and see it out in court...

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved