Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 78 message thread spans 3 pages (page 3 shown).
Digsby IM Enables Web Crawlers Control of Your PC & Bandwidth
Plura Processing and 80Legs to Leverage Digsby Network
incrediBILL




msg:3986024
 3:24 pm on Sep 8, 2009 (gmt 0)

This is slightly complicated so follow along and read the entire post to grasp the full impact of this situation.

Digsby

Let's start at the beginning with this company called Digsby that creates this cutesy IM tool that is so cute many will just have to install it.

The problem is that Digsby has something built in that lets your computer become part of a distributed computing network that runs on idle CPU time.

Do you read all that fine print?

Most people don't; they click NEXT, NEXT, NEXT and just install this thing.

Here's the fun part of the "Digsby Research Module":

[wiki.digsby.com...]
The module turns on after your computer has been completely idle for 5 minutes (no mouse or keyboard movement). It then turns off the instant you move your mouse or the press a key on the keyboard

Basically, if you install Digsby, they can hijack your CPU idle time for fun and profit including WEB CRAWLING!

Here's what they say right in their TOS:

[digsby.com...]
15. USAGE OF COMPUTER RESOURCES.

You agree to permit the Software to use the processing power of your computer when it is idle to run downloaded algorithms (mathematical equations) and code within a process. You understand that when the Software uses your computer, it likewise uses your CPU, bandwidth, and electrical power. The Software will use your computer to solve distributed computing problems, such as but not limited to, accelerating medical research projects, analyzing the stock market, searching the web, and finding the largest known prime number. This functionality is completely optional and you may disable it at any time.

Of course they like to wrap themselves in charitable terms such as cancer research, that must be a good thing, no?

Emphasis on stock market analysis and web search is mine, far cry from cancer research huh?

Some people really don't like Digsby:

[lifehacker.com...]
It Gets Even Worse: Your PC is Being Used Without Your Knowledge

You can debate the merits of bundled crapware, and brush away the despicable nature of preying on those lacking adequate tech skills, but did you realize that Digsby is also using your processor to make money?

Plura Processing

These guys are building out monetization methods for the Digsby network.

[pluraprocessing.wordpress.com...]
80legs is a good customer to talk about as an example because they’ve taken the compute power we give them, and they’ve built something pretty cool on top. 80legs is itself a startup, and they provide a Web-scale crawling and processing service.

Disclosure: Plura and 80legs share an investor, and 80legs has been of great help to us as a guinea pig

80legs

Lets you crawl up to 2 billion pages a day using the PCs of less than savvy computer owners.

[80legs.com...]
80legs runs on a 50,000-node grid computer. This means we have a whole lot of bandwidth and compute power for you to use. The system as a whole can crawl up to 2 billion pages per day. Our unique architecture gives us (and our users) inherent advantages when it comes to crawling the web.

Do the math here:

2B pages per day / 50K computers = 40K pages per computer per day!

Assuming the average web page is about 20KB these days, that's 800MB downloaded per PC per day. If the crawl includes images, Flash files and PDFs, going well over 1GB per PC per day is trivial.
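For anyone who wants to rerun the numbers, here is a quick sketch of the same arithmetic. The page count and node count are the figures 80legs quotes above; the 20KB average page size is the assumption from this post, and the constant names are made up for illustration.

```python
# Back-of-the-envelope check of the per-node crawl load.
PAGES_PER_DAY = 2_000_000_000   # "up to 2 billion pages per day" (80legs figure)
NODES = 50_000                  # "50,000-node grid computer" (80legs figure)
AVG_PAGE_KB = 20                # assumed average HTML page size, per the post

pages_per_node = PAGES_PER_DAY // NODES
mb_per_node = pages_per_node * AVG_PAGE_KB / 1000   # decimal megabytes

print(f"{pages_per_node:,} pages per node per day")
print(f"{mb_per_node:,.0f} MB per node per day")
```

Change AVG_PAGE_KB to see how quickly the daily total grows once heavier content is included.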

Potential Consumer Impact

Most cable companies now have a fixed cap on your usage, and a wireless broadband card may come with a 5GB cap instead of unlimited data, so people are going to be paying for this usage.

Rogers Cable in Canada, for instance, has a 60GB cap, but you can order a lower bandwidth plan for Grandma called "Ultra Lite" with a 2GB monthly cap and $5.00 per additional GB. Imagine when Grandma, someone who probably has a very idle computer, installs Digsby and faces a potential $150 excess bandwidth bill the next month! Grandma will definitely need her blood pressure medicine increased.
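A rough sketch of how that bill adds up, using the plan figures quoted above (2GB cap, $5.00 per extra GB). The 30-day month, the assumption that the PC crawls at the full rate every day, and the function name are all assumptions made for illustration, not anything Rogers or Digsby publishes.

```python
def overage_bill(daily_gb, cap_gb=2.0, rate_per_gb=5.00, days=30):
    """Estimate a monthly overage charge on a capped plan.

    Assumes the same usage every day and billing per GB over the cap.
    """
    used_gb = daily_gb * days
    over_gb = max(0.0, used_gb - cap_gb)
    return over_gb * rate_per_gb

# 0.8 GB/day is the HTML-only estimate; 1.0 GB/day includes heavier files.
for daily in (0.8, 1.0):
    print(f"{daily} GB/day -> ${overage_bill(daily):.2f} over the cap")
```

Under these assumptions the charge comes out at roughly $110 to $140 a month, the same ballpark as the $150 mentioned above.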

Potential Webmaster Concerns

Home computer users with Digsby installed may suddenly find their access to many websites restricted. Bot blocking software may already be temporarily suspending site access for the PCs of these hapless users, and if 80legs is successful, the bot blocking battles will shift from data centers to actual home PCs, a massive transition in mind share in the bot blocking world.

This isn't just theory, it's already happened on some of my own sites. A couple of visitors wrote wanting to know why they were being restricted and I sent them a log file of a high speed crawl of 100s of pages and they denied any knowledge of this activity. While we don't know the source of this crawl yet, this is an example of what you can potentially expect moving forward if you have any anti-DOS software running on your site and 80legs comes knocking.
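For readers who haven't run this kind of software, here is a minimal sketch of how a high speed crawl can be flagged per IP. It is not the software running on my sites; the class name, thresholds and sliding-window approach are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class CrawlRateMonitor:
    """Flag an IP that makes more than max_hits requests within window seconds."""

    def __init__(self, max_hits=30, window=10.0):
        self.max_hits = max_hits
        self.window = window
        self._hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def record(self, ip, timestamp):
        """Record one request; return True if this IP is now over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and timestamp - hits[0] > self.window:
            hits.popleft()
        return len(hits) > self.max_hits


monitor = CrawlRateMonitor(max_hits=3, window=10.0)
for t in (0, 1, 2, 3):
    print(t, monitor.record("192.0.2.1", t))   # 192.0.2.1 is a documentation IP
```

The catch described in this post is that a check like this cannot tell a home user from the crawler running on that user's PC, since both share one IP.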

More importantly, stealth crawling will have reached a new pinnacle of unlimited penetration never before thought possible thanks to 80legs and Digsby's software.

If the experience with Amazon Web Services [webmasterworld.com] can be used as a guideline, I can foresee that collecting and distributing lists of Digsby's and 80legs' customers for webmasters to block may be in the near future.

Guess we'll have to wait and see what happens.

 

wilderness




msg:3988009
 3:50 pm on Sep 11, 2009 (gmt 0)

and one can safely type the names of nasty software

My reason for adding the obscurity was that one of the domains has been re-assigned and is currently active.

shiondev




msg:3988017
 4:11 pm on Sep 11, 2009 (gmt 0)

..and the actual initiation of your policy hasn't happened yet ..

Actually, Plura has sent out a notice to their affiliates that notifies them of the change in the TOS. As I mentioned before, Digsby is working on a new installer. I don't know enough about the goings-on to know when that will be coming, but I have no reason to believe it won't be soon.

your installations will trip all the AV's

I guess it's necessary to repeat: Plura isn't an installed application.

On a side note.. while Plura may not be curing cancer right now (hopefully it will help one day), it is helping fight poverty. Some of its affiliates are charityware applications, which allow converting CPU cycles into charitable donations. The biggest one right now is Superdonate.

[edited by: incrediBILL at 4:53 pm (utc) on Sep. 11, 2009]
[edit reason] removed URL, see TOS #13 [webmasterworld.com...] [/edit]

Leosghost




msg:3988038
 4:48 pm on Sep 11, 2009 (gmt 0)

Another plura engined botnet..and you have to install it ( or write to them ) to see where the money goes and how much of it goes to charity ..and again it mentions medical research ( cue violins )..but slithers around saying which, for whom etc ..

I guess it's necessary to repeat: Plura isn't an installed application.

Sure seems to need that some "cough" "aff" gets you to install something devious and weasely worded to let it run though ..

[edited by: incrediBILL at 5:19 pm (utc) on Sep. 11, 2009]
[edit reason] clean up [/edit]

incrediBILL




msg:3988044
 5:00 pm on Sep 11, 2009 (gmt 0)

My real concern is this whole network of sites could easily become a national security problem.

If someone hacks the code they distribute and figures out how to make each Digsby PC generate an HTTP request, finding the IPs of most of the Digsby network will be child's play, thanks to 80legs broadcasting those IPs when it crawls.

Therefore, if someone can figure out how to breach their protocol and bring the Digsby botnet under their control, they could easily use 50K to 200K PCs to bring down some serious financial services or worse.

What's even more troublesome is that the typical user who would install Digsby in the first place usually isn't too discriminating about what they install or too savvy about their PC, and probably has an infected PC already, so the threat exists even if it's not the Digsby network itself causing the problem.

If another botnet deploys code that can detect machines with Digsby installed, it could fake taking over the 80legs crawler and launch an attack spoofing the 80legs user agent as the source of the attack.

I sure wouldn't want to be the one grilled after such a botnet spoof attack and try to explain to the FBI, NSA and DHS that it wasn't me!

[edited by: incrediBILL at 6:00 pm (utc) on Sep. 11, 2009]

shiondev




msg:3988068
 5:27 pm on Sep 11, 2009 (gmt 0)

Woohoo, just got word that Digsby plans on updating their installer on the 17th of this month.

The new installer will notify the user about Plura during the install process and let user specify bandwidth % to use (as well as CPU %).

incrediBILL




msg:3988089
 6:00 pm on Sep 11, 2009 (gmt 0)

The new installer will notify the user about Plura during the install process and let user specify bandwidth % to use (as well as CPU %).

That's a bit more legitimacy for sure.

However, what about the 50K - 200K machines already in the network?

shiondev




msg:3988123
 6:47 pm on Sep 11, 2009 (gmt 0)

I know Digsby has a way of communicating directly with all their users; perhaps they will use that feature to notify them.

tangor




msg:3988257
 1:50 am on Sep 12, 2009 (gmt 0)

The more I read this the more ticked I become:

1. 80legs has customers who want crawling done.
2. Plura has affiliates
3. Somewhere in the mix CPU cycles are converted to donations

Money is made in all three above.

What do the users get? Higher electrical bills, potential for bandwidth costs
What do the webmasters get? Nada benefit of new visitors, higher electrical bills, potential for bandwidth costs.

No money is made AND there are COSTS INCURRED for no benefits.

Gotta love these "redistribute the wealth" programs. This one makes sure everybody else bears the work and costs instead of those who benefit from it.

VOLUNTEERING systems and resources is one thing.
VOLUNTEERING systems and resources for another's profit is something else.
OBTAINING use of systems and resources via "freebie" software is one thing (and it's not "free")
OBTAINING use of systems and resources without providing benefit to the webmasters is despicable.

The "clarity" of 80legs' business is about as clear as Mississippi mud to me, and that, along with all the above, will be banned.

I don't like bot nets. I particularly dislike commercial bot nets. Especially those which do not directly benefit me as a webmaster in some form or another.

incrediBILL




msg:3997306
 9:22 pm on Sep 28, 2009 (gmt 0)

I'm seeing 80legs (failed) attempts to start crawling my sites coming from all the major ISPs including Charter, Qwest, Comcast, Comcastbusiness, and Verizon.

Let the chaos commence.

wilderness




msg:3999029
 1:37 am on Oct 1, 2009 (gmt 0)

Thirty-four (34) IPs in fourteen hours today, and on only one website.

All 403s.

The last eleven IPs all requested the same page within the most liberal directory on my websites and still ate 403s.

Pfui




msg:3999287
 2:29 pm on Oct 1, 2009 (gmt 0)

Same here as to sudden semi-amuck activity from all over; still using the badly-formatted:

Mozilla/5.0 (compatible; 008/0.83; [80legs.com...] Gecko/2008032620

At least asking for robots.txt at this end.

dstiles




msg:3999498
 9:23 pm on Oct 1, 2009 (gmt 0)

Robots.txt or not, it's wandering around my sites on prosthetics since I've broken all its legs. One hit and it's chopped off at the knees and its IP killed dead. Shame: I'd probably quite like the IPs if I met them socially. :)

keyplyr




msg:3999570
 12:51 am on Oct 2, 2009 (gmt 0)


Hmmm, interesting. I just disallowed their UA in robots.txt and they come, read it, obey and leave. No problems yet (that I've caught anyway.)

User-agent: 008
Disallow: /

71.62.229.** - - [30/Sep/2009:09:04:33 -0700] "GET /robots.txt HTTP/1.1" 200 4219 "-" "Mozilla/5.0 (compatible; 008/0.83; [80legs.com...] Gecko/2008032620"
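If you want to sanity-check a rule like that before deploying it, here is a small sketch using Python's standard urllib.robotparser. It only confirms what the rule says; whether the crawler actually obeys is up to the crawler. "ExampleBot" is a made-up agent name for comparison.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from this post.
ROBOTS_TXT = """\
User-agent: 008
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The parser matches on the product token ("008"), not the full UA string.
print("008 allowed:", parser.can_fetch("008", "/index.html"))
print("ExampleBot allowed:", parser.can_fetch("ExampleBot", "/index.html"))
```

The rule shuts out the 008 token site-wide and leaves other agents unaffected, since there is no catch-all entry.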

Pfui




msg:4005689
 5:43 pm on Oct 12, 2009 (gmt 0)

(Have we lost shiondev?)

This just in: A new, apparently digsby-related, robots.txt-ignoring thing:

12.200.197.nn
digsby-asynchttp/0.1

robots.txt? NO

All it took, all it requested, was favicon.ico. WHOIS says the Host IP (Ruralinet) is in Miami, OK.
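For anyone tagging these in their logs, here is a rough sketch that recognizes the two user agents reported in this thread. The patterns are guesses based only on the strings quoted here, the function name is made up, and either UA could change or be spoofed at any time.

```python
import re

# Patterns derived from the user-agent strings quoted in this thread.
CRAWLER_PATTERNS = {
    "80legs": re.compile(r"\b008/\d+(\.\d+)*"),
    "digsby-asynchttp": re.compile(r"^digsby-asynchttp/", re.IGNORECASE),
}


def classify_user_agent(user_agent):
    """Return the crawler name if the UA matches a known pattern, else None."""
    for name, pattern in CRAWLER_PATTERNS.items():
        if pattern.search(user_agent):
            return name
    return None


samples = [
    "Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/spider.html;) Gecko/2008032620",
    "digsby-asynchttp/0.1",
    "Mozilla/5.0 (Windows NT 6.1)",
]
for ua in samples:
    print(classify_user_agent(ua), "<-", ua)
```

Matching on the UA only catches the honest requests, so treat it as a logging aid rather than a defense.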

Pfui




msg:4016506
 8:09 pm on Oct 30, 2009 (gmt 0)

75-92-228-nn.sea.clearwire-dns.net
digsby-asynchttp/0.1

robots.txt? NO

Again just hit favicon.ico.

Hmm. This appears to be the so-called "favicon fetcher" referred to by a "Digsby Developer" named "mike" on their Digsby Forum almost exactly a year ago. (2008-11-06; Google the UA -- top result; scroll down.)

So they're fetching favicons -- why?

Pfui




msg:4022924
 5:44 am on Nov 11, 2009 (gmt 0)

One site, ~90 mins, 20 robots.txt hits by:

Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/spider.html;) Gecko/2008032620

.available.above.net
19:22:46
19:24:55
19:27:06
19:38:44
19:39:03
19:39:50
19:41:13
19:41:42
19:42:20
19:48:10
19:58:36

.hsd1.ma.comcast.net
19:33:56

.sttlwa.fios.verizon.net
19:44:58

.hsd1.il.comcast.net
19:56:24
19:56:25

.hsd1.co.comcast.net
20:05:26

.hsd1.ca.comcast.net
20:13:58

.hsd1.ca.comcast.net
20:22:24

.hsd1.pa.comcast.net
20:30:14

.hsd1.nj.comcast.net
20:38:51

I hate botnets.
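A quick sketch for totaling the listing above by network. The counts are transcribed by hand from the timestamps in this post; grouping on the last two DNS labels is a simplification chosen for illustration, and the names are made up.

```python
from collections import Counter

# (hostname suffix, robots.txt hits) transcribed from the listing above.
HITS = [
    (".available.above.net", 11),
    (".hsd1.ma.comcast.net", 1),
    (".sttlwa.fios.verizon.net", 1),
    (".hsd1.il.comcast.net", 2),
    (".hsd1.co.comcast.net", 1),
    (".hsd1.ca.comcast.net", 1),
    (".hsd1.ca.comcast.net", 1),
    (".hsd1.pa.comcast.net", 1),
    (".hsd1.nj.comcast.net", 1),
]


def network_of(host_suffix):
    """Return the last two DNS labels, e.g. 'comcast.net'."""
    return ".".join(host_suffix.strip(".").split(".")[-2:])


by_network = Counter()
for suffix, count in HITS:
    by_network[network_of(suffix)] += count

for network, count in by_network.most_common():
    print(f"{network:15} {count}")
print("total", sum(by_network.values()))
```

That splits the 20 hits into one hosting network and a scatter of residential ISP addresses.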

Pfui




msg:4055752
 6:42 am on Jan 7, 2010 (gmt 0)

Same old, same old hit yet another site today:

12.200.197.9*
digsby-asynchttp/0.1

robots.txt? NO
URI: favicon.ico

Pfui




msg:4084287
 1:58 pm on Feb 21, 2010 (gmt 0)

See also: 80legs [webmasterworld.com]
