
Search Engine Spider and User Agent Identification Forum

    
PRCrawler Emerges From Stealth Mode as Kindsight
ISPs to insert their own ads under guise of security software?
incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 3679120 posted 11:46 pm on Jun 19, 2008 (gmt 0)

Here's an emerging story that should concern webmasters and cause the alarms to sound IMO.

When I went to check on the status of Project Rialto today, which runs the PRCrawler, their site proclaimed they are now emerging as Kindsight.

Kindsight is a value-added services provider specializing in network-based security solutions for residential internet use. We deliver an always-on, always-up-to-date security service embedded in internet service providers’ networks to address what we call the “flawed malware defense cycle”.

Sounds good, right?

So why do they crawl?

OK, I won't keep you in suspense, it appears they crawl to "data mine" to find out what your site is about so they can target ads to your, um their, customers!

The Kindsight service, as with other free on-line applications such as search engines and map functions, is funded through an advertising mechanism but without the use of cookies, pop-ups or spam. Instead, we deliver ads on sites that are of interest to the subscriber base.

If they don't use pop-ups, how exactly are they going to show their customers ads?

Injecting interstitials?

Replacing ads in our pages?

Obviously we'll have to wait and see how this all works but the combination of a data mining crawler and injected ads at the local or ISP level is quite alarming and almost sounds like another incarnation of Phorm [webmasterworld.com]! :o

I can't tell others how to handle this but it's blocked on my server.

They appear to use Amazon Web Services (amazonaws.com) to crawl and this is their user agent:
"PRCrawler/Nutch-0.9 (data mining development project; crawler@projectrialto.com)"
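For anyone who wants to block it too, here is a minimal sketch of a user-agent check, assuming your server or application lets you inspect the User-Agent header before serving a page. The names `BLOCKED_AGENT_TOKENS` and `is_blocked_agent` are hypothetical and only for illustration; a crawler can change its user agent at any time, so treat this as a first filter rather than a complete defense.

```python
# Hypothetical helper: the token comes from the user agent quoted above.
BLOCKED_AGENT_TOKENS = ("prcrawler",)


def is_blocked_agent(user_agent: str) -> bool:
    """Return True when the User-Agent header contains a blocked token."""
    # Compare case-insensitively and tolerate a missing header.
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)
```

A request handler would call `is_blocked_agent()` with the incoming header and answer with a 403 when it returns True.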

This situation is another example of the "entitlement mentality" many of us talk about, where people think they can do whatever they want with your websites. I'm sorry, but if your service is that good, CHARGE for it. I'd potentially pay for it; just leave my website alone.

 

johnnie

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3679120 posted 12:31 pm on Jun 20, 2008 (gmt 0)

Good thing they talk about a 'subscriber base', which would imply the system is opt-in. I'm guessing the spider is crawling pages in favor of some form of companion spyware.

It's about time somebody instigated a massive class-action lawsuit on behalf of professional webmasters. This sort of behaviour has to be exterminated from the roots up before it gets any foothold and becomes commonplace.

thecoalman

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3679120 posted 2:07 pm on Jun 20, 2008 (gmt 0)

injected ads at the local or ISP level is quite alarming and almost sounds like another incarnation of Phorm!

This is my number one concern; however, the only confirmed case of this actually occurring that I've heard of was the Texas-based ISP that "accidentally" enabled it using NebuAd. Was it ever confirmed that Phorm was replacing ads during their 2006 trial?

The other concern I have, as someone else pointed out in a previous thread, is the indexing of private pages that are only accessible because they are piggybacking on the user's login credentials.

Other than that I think this is mostly a consumer privacy issue.

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 3679120 posted 4:27 pm on Jun 20, 2008 (gmt 0)

Other than that I think this is mostly a consumer privacy issue.

When they plan to advertise alongside or inside any website that doesn't belong to them, I think it's also a webmaster issue, because it's potentially money being redirected away from the content owner.

Such practices will spell the beginning of the end for small advertising-based sites and put a dent in larger sites as well.

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3679120 posted 4:30 pm on Jun 20, 2008 (gmt 0)

Such practices will spell the beginning of the end for small advertising-based sites and put a dent in larger sites as well.

How does one combat this? What proactive approach should a Professional Webmaster take? And also, what reactive approach would need to be taken?

thecoalman

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3679120 posted 5:14 pm on Jun 20, 2008 (gmt 0)

As I said, ad injection or replacement is my number one concern. As far as that is concerned, any ISP injecting or replacing ads that appear within the same browser window will be banned from my server, including the few clients I'm hosting. They can take their business elsewhere if they don't like it.

As for a proactive approach, this was posted in a previous thread: Detecting In-Flight Page Changes with Web Tripwires

[cs.washington.edu...]

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3679120 posted 5:25 pm on Jun 20, 2008 (gmt 0)

Detecting In-Flight Page Changes with Web Tripwires

thecoalman, thank you for that link! This is the first time I've seen discussion on Web Tripwires. I'm going to be discussing this with my programming team over the weekend. I like it! Not only from the perspective of this topic, but from a basic "monitoring" standpoint too.

We have detected that this page has been modified in flight. For more information, click here.

Oh, you just have to dig the terminology they use too. :)

thecoalman

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3679120 posted 5:35 pm on Jun 20, 2008 (gmt 0)

Not sure who posted it originally, I just bookmarked it. ;)

Truthfully, I don't think it's needed unless you're trying to determine if the page was modified client side [webmasterworld.com]. If an ISP starts doing this, you know it's not going to take more than a few minutes to make the front page here. :)

In the event they do start injecting or replacing ads, the best solution IMO would be a blacklist. If millions of sites used it, the ISPs would have no choice but to drop it.
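A rough sketch of how such a blacklist check might look, assuming the list is kept as network ranges and the server can see the visitor's address. The ranges below are reserved documentation networks standing in for real ISP ranges, and `BLACKLIST` and `is_blacklisted` are hypothetical names, not an existing shared list.

```python
import ipaddress

# Hypothetical blacklist: reserved documentation ranges, not real ISP networks.
BLACKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blacklisted(remote_addr: str) -> bool:
    """Return True when the visitor address falls inside a blacklisted range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: this sketch treats it as not listed.
        return False
    return any(addr in network for network in BLACKLIST)
```

A shared list would only need to replace the contents of `BLACKLIST`; the check itself stays the same.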

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors of the Month



 
Msg#: 3679120 posted 6:03 pm on Jun 20, 2008 (gmt 0)

Changing your site to 100% HTTPS (SSL) will easily stop in-flight page changes, unless it's a browser plug-in doing the changes.

However, note that your site performance will suffer and server CPU usage will increase quite a bit if you have a lot of visitors, so it might require more hardware just to maintain your current grade of service.

[edited by: incrediBILL at 6:04 pm (utc) on June 20, 2008]

koan

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3679120 posted 6:56 pm on Jun 20, 2008 (gmt 0)

As far as that is concerned, any ISP injecting or replacing ads that appear within the same browser window will be banned from my server, including the few clients I'm hosting.

As long as some crawler doesn't provide any *benefits* to the publisher, such as traffic, I don't see why anyone would want to grant them access just so it can eat more bandwidth and make money off it. I'm also thinking of trademark/copyright checker, etc.

johnnie

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3679120 posted 9:03 pm on Jun 20, 2008 (gmt 0)

How does one combat this? What proactive approach should a Professional Webmaster take? And also, what reactive approach would need to be taken?

SSL and / or MD5 hash comparison of client and server.
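The hash comparison could be sketched roughly like this, assuming you have some way to get the bytes the client actually received back to the server (for example, a script on the page that reports them). The function names are hypothetical. MD5 is used because it is what was suggested here; it is not collision resistant, so `hashlib.sha256` would be the safer swap against a deliberate attacker.

```python
import hashlib


def page_digest(body: bytes) -> str:
    """Return the MD5 hex digest of a page body."""
    return hashlib.md5(body).hexdigest()


def modified_in_flight(served: bytes, received: bytes) -> bool:
    """Return True when the received page differs from what the server sent."""
    return page_digest(served) != page_digest(received)
```

Any injected script or swapped ad changes the bytes and therefore the digest, but so does harmless rewriting by a proxy or browser plug-in, so a mismatch is a reason to look closer rather than proof of ad injection.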

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors of the Month



 
Msg#: 3679120 posted 2:05 am on Jun 21, 2008 (gmt 0)

As long as some crawler doesn't provide any *benefits* to the publisher, such as traffic, I don't see why anyone would want to grant them access just so it can eat more bandwidth and make money off it. I'm also thinking of trademark/copyright checker, etc.

There are a multitude of bots utilized for 3rd party purposes which provide no benefit (i.e., in our focused short-sight) to us as webmasters.
Three key examples that come to mind:
1) IBM Almaden, which for a long time was harvesting materials under the guise of research while providing a dummy robots reference in their UA and utilizing the harvested materials in a 3rd party paid intranet.
2) 131.107. has been doing the same basic thing since at least 2003 (there are some sparse earlier references).
3) Many universities crawl under the guise of research without providing truthful details of their funded grants, when in fact the end output is basically being turned over to the grant resource, while "we" are left to believe the pretentious pipe dream that one of these "may be the next Google."

There are many more abuses that many websites and webmasters are tolerating (at least those who heed the contents of their visitor logs), while there are millions more websites and webmasters that couldn't give two squats.
Some websites even utilize the phony numbers to raise their advertising fees.

Don


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved