Forum Moderators: Ocean10000 & incrediBILL

Facebook Sues Data Scraper

     
10:13 pm on Apr 4, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38047
votes: 11


Although not strictly a topic for this subforum, it seemed an important point of order that Facebook was essentially suing a guy for running a bot against public Facebook profiles.

[fastcompany.com...]

Warden gathered that data from public profiles using "crawling" software similar to what's commonly available on the Web; he was planning to release the set to select researchers, who proposed cross-referencing that data in all sorts of cool ways, trying to find links, for example, between income, employment, and social connections. (Does having more friends equal more cash? Is there a threshold, where too many friends means you're way too social?) As Warden was at pains to point out, the data is exceedingly public: You can still access it through Google's caches; and as Warden writes, "Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."
11:18 pm on Apr 4, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts:2415
votes: 24


At a guess, arguments similar to those made about people running scrapers against eBay will be used in this case. If Facebook management had any brains they'd hire this guy.

Regards...jmcc
11:19 pm on Apr 4, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 14, 2006
posts:172
votes: 0


I am confused. Does this mean that if FB wins, based on that precedent I could sue G for scraping my site and win? Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?
11:29 pm on Apr 4, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14622
votes: 88


Brett, I think it's PERFECT for this forum ;)

"Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."


He's obviously never read this forum or listened to my tirade at Pubcon.

If Facebook wins, it makes any SE that scrapes without explicit opt-in via robots.txt a target.

Wouldn't surprise me if Google provided legal help for the scraper just to make sure that doesn't happen.
11:44 pm on Apr 4, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:May 6, 2008
posts:2011
votes: 0


Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?

Pretty much this.
1:36 am on Apr 5, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.

Warden says that Facebook threatened legal action if he did not delete the data. He duly destroyed all the records, saying he did not have the funds to contest a lawsuit.

[newscientist.com...]
3:23 am on Apr 5, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 18, 2006
posts:121
votes: 0


But why did Warden have to do this?
6:41 am on Apr 5, 2010 (gmt 0)

Senior Member from LK 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 16, 2005
posts:2417
votes: 17


But why did Warden have to do this?

Because Facebook can afford the legal fees of fighting this, but he cannot.
7:06 am on Apr 5, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14622
votes: 88


It's a real shame because a solid ruling on this topic would possibly change the way people crawl the web.
1:38 pm on Apr 5, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 29, 2001
posts:1081
votes: 16


Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?


It's a form of "oligarchy" that usually results from inadequate or nonexistent industry regulation and controls.
8:50 pm on Apr 5, 2010 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 11, 2004
posts:582
votes: 0


Here is an interview with the guy behind the scraping.

[fastcompany.com...]

Isn't the content free and openly available? Shouldn't anyone be allowed to crawl it? As long as they don't reproduce it or sell it, I guess it is OK, no?
12:50 am on Apr 6, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:July 29, 2007
posts:1524
votes: 9


The content is public thus no crime was committed. Waste of resources.
1:00 am on Apr 6, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38047
votes: 11


> not for this forum.

It is a story also about privacy and social media.

>robots

Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court. It was never an accredited or adopted standard by any recognized standards body.

FB's terms of service describe robotic software as an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.
2:51 am on Apr 6, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 3, 2002
posts:2575
votes: 0


Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.


The title of the thread is a little misleading if the suit was never filed. I am glad I saw your clarification when I was skimming the posts :)

Very interesting story, and it would have had interesting consequences if a precedent had been set.
9:40 am on Apr 6, 2010 (gmt 0)

Senior Member from LK 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 16, 2005
posts:2417
votes: 17


Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court.


How many relevant cases have there been? Has a court actually ruled against "robots.txt said we could crawl" as a defence?

FB's terms of service describe robotic software as an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.

So how are crawlers to know what sites to crawl? They can hardly parse the TOS!
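
For context, the only machine-readable check a well-behaved crawler can make today looks roughly like the sketch below. It uses Python's standard `urllib.robotparser`; the rules, the bot name `ExampleBot`, and the `example.com` URLs are made up for illustration and are not Facebook's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; not any real site's rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler falls under the "*" rules: public paths yes, /private/ no.
generic_public = parser.can_fetch("SomeCrawler", "https://example.com/profile/1")
generic_private = parser.can_fetch("SomeCrawler", "https://example.com/private/1")

# ExampleBot is named explicitly and excluded from the whole site.
named_bot = parser.can_fetch("ExampleBot", "https://example.com/profile/1")

print(generic_public, generic_private, named_bot)
```

Nothing in this check reads a terms-of-service page: robots.txt only expresses path-level allow and deny rules per user agent, which is the gap being pointed out here.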
11:10 pm on Apr 6, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


Which is why it's time someone heavy forced a new standard of "robots.txt" that includes machine-readable versions of "what YOU cannot do".

I looked at the ACAP that someone around here suggested, but I can't see it going anywhere without some big guys behind it. If W3 made it "legal" and there was then a legal case against something that ignored the TOS, then perhaps Google et al. would adopt it (it would probably be Google that ignores it...).

Until then, we're all stuffed. I can shout at scrapers and bots as much as I like but I can't afford to do anything about it - although I might try if there were a proper legal ruling.
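
To make the "what YOU cannot do" idea concrete, here is a minimal sketch of what a crawler-side check might look like if such a file existed. The `Usage-disallow` field, the use names, and both helper functions are invented for this example; they are not part of robots.txt, ACAP, or any adopted standard.

```python
# Hypothetical policy file: "Usage-disallow" is an invented field,
# used here only to illustrate a machine-readable usage restriction.
POLICY_TXT = """\
User-agent: *
Usage-disallow: republish
Usage-disallow: resell
"""


def parse_usage_policy(text: str) -> dict[str, set[str]]:
    """Map each user agent to the set of uses the site forbids."""
    policy: dict[str, set[str]] = {}
    agent = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            agent = value
            policy.setdefault(agent, set())
        elif field.lower() == "usage-disallow" and agent is not None:
            policy[agent].add(value.lower())
    return policy


def is_use_allowed(policy: dict[str, set[str]], agent: str, use: str) -> bool:
    """A use is allowed unless it is forbidden for this agent or for '*'."""
    forbidden = policy.get(agent, set()) | policy.get("*", set())
    return use.lower() not in forbidden


policy = parse_usage_policy(POLICY_TXT)
print(is_use_allowed(policy, "ExampleBot", "index"))   # indexing is not listed
print(is_use_allowed(policy, "ExampleBot", "resell"))  # reselling is forbidden
```

Parsing is the easy part; as noted above, a format like this only matters if the big crawlers adopt it or a court gives it weight.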
2:46 pm on Apr 7, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 30, 2003
posts:998
votes: 0


I'm a non-technical sales knucklehead. Please explain to me why y'all care about this topic. Is it because bots force you to incur costs you otherwise wouldn't have to bear?
7:55 pm on Apr 9, 2010 (gmt 0)

Full Member

10+ Year Member

joined:Jan 3, 2004
posts:333
votes: 0


FB's terms of service describe robotic software as an unacceptable way to connect


But as far as I can tell, it is not an unacceptable way to connect to the Google cache, which has all the same data. Although I may be wrong. Is there an exception made for Google in this TOS?

Also, isn't their entire API designed for this type of connection?

Makes someone want to try the same thing and see it out in court...