
Forum Moderators: Ocean10000 & incrediBILL

Facebook Sues Data Scraper

   
10:13 pm on Apr 4, 2010 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



Although not specifically a topic suited to this subforum, it seemed an important point of order that Facebook was essentially suing a guy for running a bot against public Facebook profiles.

[fastcompany.com...]

Warden gathered that data from public profiles using "crawling" software similar to what's commonly available on the Web; he was planning to release the set to select researchers, who proposed cross-referencing that data in all sorts of cool ways, trying to find links, for example, between income, employment, and social connections. (Does having more friends equal more cash? Is there a threshold, where too many friends means you're way too social?) As Warden was at pains to point out, the data is exceedingly public: You can still access it through Google's caches; and as Warden writes, "Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."
11:18 pm on Apr 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



At a guess, arguments similar to those used against people running scrapers against eBay will be used in this case. If Facebook management had any brains they'd hire this guy.

Regards...jmcc
11:19 pm on Apr 4, 2010 (gmt 0)

5+ Year Member



I am confused. Does this mean that if FB wins, I could sue G for scraping my site and win based on that precedent? Or does it only mean that those with financial muscle get away with anything and everything, no matter whether they're right or wrong?
11:29 pm on Apr 4, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Brett, I think it's PERFECT for this forum ;)

"Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."


He's obviously never read this forum or listened to my tirade at Pubcon.

If Facebook wins, it makes any SE that scrapes without explicit opt-in via robots.txt a target.

It wouldn't surprise me if Google provided legal counsel for the scraper just to make sure that doesn't happen.
11:44 pm on Apr 4, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Or does it only mean that those with financial muscle get away with anything and everything, no matter whether they're right or wrong?

Pretty much this.
1:36 am on Apr 5, 2010 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.

Warden says that Facebook threatened legal action if he did not delete the data. He duly destroyed all the records, saying he did not have the funds to contest a lawsuit.

[newscientist.com...]
3:23 am on Apr 5, 2010 (gmt 0)

5+ Year Member



But why did Warden have to do this?
6:41 am on Apr 5, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



But why did Warden have to do this?

Because Facebook can afford the legal fees of fighting this, but he cannot.
7:06 am on Apr 5, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It's a real shame because a solid ruling on this topic would possibly change the way people crawl the web.
1:38 pm on Apr 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Or does it only mean that those with financial muscle get away with anything and everything, no matter whether they're right or wrong?


It's a form of "oligarchy" that usually results from inadequate, or a complete lack of, industry regulations and controls.
8:50 pm on Apr 5, 2010 (gmt 0)

10+ Year Member



Here is an interview with the guy behind the scraping.

[fastcompany.com...]

Isn't the content free and openly available? Shouldn't anyone be allowed to crawl it? As long as they don't reproduce it or sell it, I guess it is OK, no?
12:50 am on Apr 6, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



The content is public, thus no crime was committed. Waste of resources.
1:00 am on Apr 6, 2010 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



> not for this forum.

It is a story also about privacy and social media.

>robots

Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court. It was never an accredited or adopted standard by any recognized standards body.

FB's terms of service talk about robotic software being an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.
2:51 am on Apr 6, 2010 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.


The title of the thread is a little misleading if the suit was never filed. I am glad I saw your clarification when I was skimming the posts :)

Very interesting story, and it would have had interesting consequences if a precedent had been set.
9:40 am on Apr 6, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court.


How many relevant cases have there been? Has a court actually ruled against "robots.txt said we could crawl" as a defence?

FB's terms of service talk about robotic software being an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.

So how are crawlers to know what sites to crawl? They can hardly parse the TOS!
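
To make that concrete: the only thing a crawler can check mechanically today is robots.txt, and Python's standard library ships a parser for it. Below is a minimal sketch, not anyone's actual crawler. The bot names, the example.com URLs and the robots.txt rules are all made up for illustration and are not Facebook's real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /profiles/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ExampleBot (hypothetical) may fetch everything except /profiles/.
allowed_home = parser.can_fetch("ExampleBot", "https://example.com/about")
allowed_profile = parser.can_fetch("ExampleBot", "https://example.com/profiles/123")

# Any other bot falls under the "*" rule, which disallows the whole site.
allowed_other = parser.can_fetch("OtherBot", "https://example.com/about")

print(allowed_home, allowed_profile, allowed_other)
```

Nothing comparable exists for terms of service, which is the point: a crawler has no machine-readable way to learn what the TOS forbids.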
11:10 pm on Apr 6, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Which is why it's time someone heavy forced a new standard of "robots.txt" that includes machine-readable versions of "what YOU cannot do".

I looked at the ACAP that someone around here suggested but I can't see it going anywhere without some big guys behind it. If W3 made it "legal" and there was then a legal case against something that ignored the TOS then perhaps Google et al would adopt it (it would probably be Google that ignores it...).

Until then, we're all stuffed. I can shout at scrapers and bots as much as I like but I can't afford to do anything about it - although I might try if there were a proper legal ruling.
2:46 pm on Apr 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm a non-technical sales knucklehead. Please explain to me why y'all care about this topic. Is it because bots force you to incur costs you otherwise wouldn't have to bear?
7:55 pm on Apr 9, 2010 (gmt 0)

10+ Year Member



FB's terms of service talk about robotic software being an unacceptable way to connect


But as far as I can tell it is not an unacceptable way to connect to the Google cache, which has all the same data. Although I may be wrong. Is there an exception made for Google in the TOS?

Also, isn't their entire API designed for this type of connection?

Makes someone want to try the same thing and see it out in court...