


PRCrawler Emerges From Stealth Mode as Kindsight

ISPs to insert their own ads under guise of security software?

     
11:46 pm on Jun 19, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


Here's an emerging story that should concern webmasters and cause the alarms to sound IMO.

When I went to check on the status of Project Rialto today, which runs the PRCrawler, their site proclaimed they are now emerging as Kindsight.

Kindsight is a value-added services provider specializing in network-based security solutions for residential internet use. We deliver an always-on, always-up-to-date security service embedded in internet service providers’ networks to address what we call the “flawed malware defense cycle”.

Sounds good, right?

So why do they crawl?

OK, I won't keep you in suspense, it appears they crawl to "data mine" to find out what your site is about so they can target ads to your, um their, customers!

The Kindsight service, as with other free on-line applications such as search engines and map functions, is funded through an advertising mechanism but without the use of cookies, pop-ups or spam. Instead, we deliver ads on sites that are of interest to the subscriber base.

If they don't use pop-ups, how exactly are they going to show their customers ads?

Injecting interstitials?

Replacing ads in our pages?

Obviously we'll have to wait and see how this all works but the combination of a data mining crawler and injected ads at the local or ISP level is quite alarming and almost sounds like another incarnation of Phorm [webmasterworld.com]! :o

I can't tell others how to handle this but it's blocked on my server.

They appear to use Amazon Web Services (amazonaws.com) to crawl and this is their user agent:
"PRCrawler/Nutch-0.9 (data mining development project; crawler@projectrialto.com)"

This situation is another example of the "entitlement mentality" many of us talk about, where people think they can do whatever they want with your websites. I'm sorry, but if your service is that good, CHARGE for it; I'd potentially pay for it. Just leave my website alone.
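For anyone who wants to block it the same way, here's a minimal sketch of what that could look like as a Python/WSGI middleware. The UA substring and the amazonaws.com check come from the details above; the class name and everything else is purely illustrative, not a drop-in rule for any particular server:

# Sketch of a WSGI middleware that refuses requests from PRCrawler.
# The UA substring comes from the string quoted above; the amazonaws.com
# reverse-DNS check reflects the observation that it crawls from AWS.
import socket

BLOCKED_UA_SUBSTRINGS = ("PRCrawler", "projectrialto.com")

class BotBlocker:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        addr = environ.get("REMOTE_ADDR", "")
        blocked = any(s in ua for s in BLOCKED_UA_SUBSTRINGS)
        if not blocked and addr:
            try:
                host = socket.gethostbyaddr(addr)[0]
                blocked = host.endswith(".amazonaws.com")
            except OSError:
                pass  # no reverse DNS record; fall back to the UA check
        if blocked:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)

Doing a reverse DNS lookup on every request is slow, so in practice you'd cache the result or push the whole check down into your firewall or rewrite rules instead.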

12:31 pm on June 20, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 17, 2004
posts:1354
votes: 0


Good thing they talk about a 'subscriber base', which would imply the system is opt-in. I'm guessing the spider is crawling pages in support of some form of companion spyware.

It's about time somebody instigated a massive class-action lawsuit on behalf of professional webmasters. This sort of behaviour has to be exterminated from the roots up before it gets any foothold and becomes commonplace.

2:07 pm on June 20, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:877
votes: 0


injected ads at the local or ISP level is quite alarming and almost sounds like another incarnation of Phorm!

This is my number one concern; however, the only confirmed case I've heard of this actually occurring was the Texas-based ISP that "accidentally" enabled it using NebuAd. Was it ever confirmed that Phorm was replacing ads during their 2006 trial?

The other concern I have, as someone else pointed out in a previous thread, is the indexing of private pages that are only accessible because the crawler is piggybacking on the user's login credentials.

Other than that I think this is mostly a consumer privacy issue.

4:27 pm on June 20, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


Other than that I think this is mostly a consumer privacy issue.

When they plan to advertise alongside or inside any website that doesn't belong to them, I think it's also a webmaster issue, because it's potentially money being redirected away from the content owner.

Such practices will spell the beginning of the end for small advertising-based sites and put a dent in larger sites as well.

4:30 pm on June 20, 2008 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 27, 2001
posts: 12169
votes: 56


Such practices will spell the beginning of the end for small advertising-based sites and put a dent in larger sites as well.

How does one combat this? What proactive approach should a Professional Webmaster take? And also, what reactive approach would need to be taken?

5:14 pm on June 20, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:877
votes: 0


As I said, ad injection or replacement is my number one concern. As far as that is concerned, any ISP injecting or replacing ads that appear within the same browser window will be banned from my server, including the few clients I'm hosting. They can take their business elsewhere if they don't like it.

As far as a proactive approach goes, this was posted in a previous thread: Detecting In-Flight Page Changes with Web Tripwires

[cs.washington.edu...]
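For anyone who hasn't read it, the tripwire idea is roughly this: the server remembers what it sent, the page carries a small script that checks what the browser actually received, and any mismatch gets reported back. A very rough server-side sketch in Python of the "remember and compare" half (the meta tag name, the MD5 choice, and the report handler are all made up for illustration; the in-page check itself has to be JavaScript and isn't shown):

# Sketch: embed a digest of the page as served, and accept reports from a
# client-side script that hashes what the browser actually received.
import hashlib

def add_tripwire(html: str) -> str:
    # Hash the page as served and tuck the digest into a meta tag.
    # Note: the client-side script must exclude this tag from what it
    # hashes, or the digests will never match.
    digest = hashlib.md5(html.encode("utf-8")).hexdigest()
    tag = '<meta name="tripwire-md5" content="%s">' % digest
    return html.replace("</head>", tag + "\n</head>", 1)

def handle_tripwire_report(expected_md5: str, observed_md5: str, client_ip: str) -> None:
    # Called by a hypothetical /tripwire-report endpoint when the in-page
    # script finds a digest that doesn't match the embedded meta tag.
    if expected_md5 != observed_md5:
        print("possible in-flight modification of the page seen by", client_ip)

As I understand it, the paper's own implementation encodes the expected page inside the script rather than just hashing it, but the overall shape is the same.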

5:25 pm on June 20, 2008 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 27, 2001
posts: 12169
votes: 56


Detecting In-Flight Page Changes with Web Tripwires

thecoalman, thank you for that link! This is the first time I've seen discussion on Web Tripwires. I'm going to be discussing this with my programming team over the weekend. I like it! Not only from the perspective of this topic, but from a basic "monitoring" standpoint too.

We have detected that this page has been modified in flight. For more information, click here.

Oh, you just have to dig the terminology they use too. :)

5:35 pm on June 20, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:877
votes: 0


Not sure who posted it originally, I just bookmarked it. ;)

Truthfully I don't think it's needed unless you're trying to determine if the page was modified client side [webmasterworld.com]. If an ISP starts doing this, you know it's not going to take more than a few minutes to make the front page here. :)

In the event they do start injecting or replacing ads, the best solution IMO would be a blacklist. If millions of sites used it, the ISPs would have no choice but to drop it.
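A blacklist like that could be as simple as a shared list of the offending ISPs' address ranges that each site checks visitors against. A minimal sketch in Python; the ranges below are documentation placeholders, not real ad-injecting ISPs:

# Sketch: flag visitors whose IP falls inside a published blacklist of
# ad-injecting ISP ranges. The ranges here are placeholders only.
import ipaddress

AD_INJECTING_ISP_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),  # placeholder (TEST-NET-2)
    ipaddress.ip_network("203.0.113.0/24"),   # placeholder (TEST-NET-3)
]

def is_from_blacklisted_isp(remote_addr: str) -> bool:
    ip = ipaddress.ip_address(remote_addr)
    return any(ip in net for net in AD_INJECTING_ISP_RANGES)

What a site does with a hit is its own choice: block outright, or serve a warning page telling the visitor their ISP is tampering with pages.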

6:03 pm on June 20, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


Changing your site to 100% HTTPS (SSL) will easily stop in-flight page changes, unless it's a browser plug-in doing the changes.

However, note that your site performance will suffer and server CPU usage will increase quite a bit if you have a lot of visitors, so it might require more hardware just to maintain your current grade of service.

[edited by: incrediBILL at 6:04 pm (utc) on June 20, 2008]

6:56 pm on June 20, 2008 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2005
posts:1735
votes: 19


As far as that is concerned, any ISP injecting or replacing ads that appear within the same browser window will be banned from my server, including the few clients I'm hosting.

As long as a crawler doesn't provide any *benefits* to the publisher, such as traffic, I don't see why anyone would want to grant it access just so it can eat more bandwidth and make money off it. I'm also thinking of trademark/copyright checkers, etc.

9:03 pm on June 20, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 17, 2004
posts:1354
votes: 0


How does one combat this? What proactive approach should a Professional Webmaster take? And also, what reactive approach would need to be taken?

SSL, and/or an MD5 hash comparison between what the server sent and what the client received.
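One practical version of the MD5 route is an out-of-band check: fetch your own page from a machine behind the connection you suspect, hash what comes back, and compare it with a hash of the canonical copy on the server. A quick sketch in Python (the URL and file name are placeholders; a dynamic page would need to be normalised first or the hashes will never match):

# Sketch: compare the MD5 of a page as fetched through a given connection
# with the MD5 of the canonical copy kept on the server.
import hashlib
import urllib.request

def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def page_was_modified(url: str, canonical_html: bytes) -> bool:
    with urllib.request.urlopen(url) as resp:
        fetched = resp.read()
    return md5_of(fetched) != md5_of(canonical_html)

# Usage (placeholders):
# with open("canonical.html", "rb") as f:
#     print(page_was_modified("http://example.com/", f.read()))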

2:05 am on June 21, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


As long as a crawler doesn't provide any *benefits* to the publisher, such as traffic, I don't see why anyone would want to grant it access just so it can eat more bandwidth and make money off it. I'm also thinking of trademark/copyright checkers, etc.

There are a multitude of bots that are utilized for 3rd-party use and provide no benefit to us as webmasters (i.e., from our admittedly narrow, webmaster-focused view).
Three key examples come to mind:
1) IBM Almaden, which for a long time was harvesting materials under the guise of research, providing a dummy robots reference in their UA, while actually utilizing the harvested materials in a 3rd-party paid intranet.
2) 131.107. has been doing the same basic thing since at least 2003 (there are some sparse earlier references).
3) Many universities crawl under the guise of research without providing truthful details of their funding grants, when in fact the end output is basically turned over to the grant source, while "we" are left to believe the pipe dream that one of these "may be the next Google."

There are many more abuses that many websites and webmasters are tolerating (at least those that heed the contents of their visitor logs), while millions more websites and webmasters couldn't give two squats.
Some websites even use the inflated traffic numbers to raise their advertising fees.

Don