

Search Engine Spider and User Agent Identification Forum

    
Facebook Sues Data Scraper
Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4109881 posted 10:13 pm on Apr 4, 2010 (gmt 0)

Although this is not strictly a topic for this subforum, it seemed an important point of order that Facebook was essentially suing a guy for running a bot against public Facebook profiles.

[fastcompany.com...]

Warden gathered that data from public profiles using "crawling" software similar to what's commonly available on the Web; he was planning to release the set to select researchers, who proposed cross-referencing that data in all sorts of cool ways, trying to find links, for example, between income, employment, and social connections. (Does having more friends equal more cash? Is there a threshold, where too many friends means you're way too social?) As Warden was at pains to point out, the data is exceedingly public: You can still access it through Google's caches; and as Warden writes, "Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."

 

jmccormac

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



 
Msg#: 4109881 posted 11:18 pm on Apr 4, 2010 (gmt 0)

At a guess, arguments similar to those used against people running scrapers against eBay will be used in this case. If Facebook management had any brains they'd hire this guy.

Regards...jmcc

Alcoholico

5+ Year Member



 
Msg#: 4109881 posted 11:19 pm on Apr 4, 2010 (gmt 0)

I am confused. Does this mean that if FB wins, based on that precedent I could sue G for scraping my site and win? Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4109881 posted 11:29 pm on Apr 4, 2010 (gmt 0)

Brett, I think it's PERFECT for this forum ;)

"Nobody ever alleged that my data gathering was outside the rules the Web has operated by since crawlers existed."


He's obviously never read this forum or listened to my tirade at Pubcon.

If Facebook wins, it makes any SE that scrapes without explicit opt-in via robots.txt a target.

Wouldn't surprise me if Google provided legal support for the scraper just to make sure that doesn't happen.

StoutFiles

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4109881 posted 11:44 pm on Apr 4, 2010 (gmt 0)

Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?

Pretty much this.

tedster

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4109881 posted 1:36 am on Apr 5, 2010 (gmt 0)

Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.

Warden says that Facebook threatened legal action if he did not delete the data. He duly destroyed all the records, saying he did not have the funds to contest a lawsuit.

[newscientist.com...]

Petrogold

5+ Year Member



 
Msg#: 4109881 posted 3:23 am on Apr 5, 2010 (gmt 0)

But why did Warden have to do this?

graeme_p

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4109881 posted 6:41 am on Apr 5, 2010 (gmt 0)

But why did Warden have to do this?

Because Facebook can afford the legal fees of fighting this, but he cannot.

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4109881 posted 7:06 am on Apr 5, 2010 (gmt 0)

It's a real shame because a solid ruling on this topic would possibly change the way people crawl the web.

Edge

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4109881 posted 1:38 pm on Apr 5, 2010 (gmt 0)

Or does it only mean that those with financial muscle get away with anything and everything, no matter if they're right or wrong?


It's a form of "oligarchy" that usually results from inadequate or completely absent industry regulations and controls.

Hugene

10+ Year Member



 
Msg#: 4109881 posted 8:50 pm on Apr 5, 2010 (gmt 0)

Here is an interview with the guy behind the scraping.

[fastcompany.com...]

Isn't the content free and openly available? Shouldn't anyone be allowed to crawl it? As long as they don't reproduce it or sell it, I guess it is OK, no?

JS_Harris

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4109881 posted 12:50 am on Apr 6, 2010 (gmt 0)

The content is public, thus no crime was committed. Waste of resources.

Brett_Tabke

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4109881 posted 1:00 am on Apr 6, 2010 (gmt 0)

> not for this forum.

It is a story also about privacy and social media.

>robots

Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court. It was never an accredited or adopted standard by any recognized standards body.

FB's terms of service talk about robotic software being an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.

crobb305

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4109881 posted 2:51 am on Apr 6, 2010 (gmt 0)

Looks like Facebook just threatened to sue, and Warden caved in to avoid the hassle and expense. They never brought suit, and so no legal precedent was created.


The title of the thread is a little misleading if the suit was never filed. I am glad I saw your clarification when I was skimming the posts :)

Very interesting story, and it would have had interesting consequences if a precedent had been set.

graeme_p

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4109881 posted 9:40 am on Apr 6, 2010 (gmt 0)

Robots.txt is no defense of anything anywhere. It has never ONCE been upheld in court.


How many relevant cases have there been? Has a court actually ruled against "robots.txt said we could crawl" as a defence?

FB's terms of service talk about robotic software being an unacceptable way to connect. If they win on those grounds, that actually is a win for site owners.

So how are crawlers to know what sites to crawl? They can hardly parse the TOS!
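
To make the contrast concrete: a robots.txt file is something a crawler can check mechanically, while a prose TOS is not. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the `example.com` URLs, and the `may_crawl` helper are all made up for illustration; none of them come from Facebook or from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /profiles/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/profiles/alice", "https://example.com/about"):
        print(url, may_crawl(EXAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

Nothing comparable exists for terms of service, which is the gap being pointed out here: the check above says only what robots.txt requests, not what a site's TOS or a court would allow.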

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4109881 posted 11:10 pm on Apr 6, 2010 (gmt 0)

Which is why it's time someone heavy forced a new standard of "robots.txt" that includes machine-readable versions of "what YOU cannot do".

I looked at ACAP, which someone around here suggested, but I can't see it going anywhere without some big guys behind it. If the W3C made it "legal" and there was then a legal case against something that ignored the TOS, then perhaps Google et al. would adopt it (it would probably be Google that ignored it...).

Until then, we're all stuffed. I can shout at scrapers and bots as much as I like but I can't afford to do anything about it - although I might try if there were a proper legal ruling.

shorebreak

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4109881 posted 2:46 pm on Apr 7, 2010 (gmt 0)

I'm a non-technical sales knucklehead. Please explain to me why y'all care about this topic. Is it because bots force you to incur costs you otherwise wouldn't have to bear?

tenerifejim

10+ Year Member



 
Msg#: 4109881 posted 7:55 pm on Apr 9, 2010 (gmt 0)

FB's terms of service talk about robotic software being an unacceptable way to connect


But as far as I can tell it is not an unacceptable way to connect to the Google cache, which has all the same data. Although I may be wrong. Is there an exception made for Google in the TOS?

Also, isn't their entire API designed for this type of connection?

Makes someone want to try the same thing and see it out in court...


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved