|PRCrawler Emerges From Stealth Mode as Kindsight|
ISPs to insert their own ads under guise of security software?
Here's an emerging story that should concern webmasters and cause the alarms to sound IMO.
When I went to check on the status of Project Rialto today, which runs the PRCrawler, their site proclaimed they are now emerging as Kindsight.
|Kindsight is a value-added services provider specializing in network-based security solutions for residential internet use. We deliver an always-on, always-up-to-date security service embedded in internet service providers’ networks to address what we call the “flawed malware defense cycle”. |
Sounds good, right?
So why do they crawl?
OK, I won't keep you in suspense, it appears they crawl to "data mine" to find out what your site is about so they can target ads to your, um their, customers!
If they don't use pop-ups, how exactly are they going to show their customers ads?
Replacing ads in our pages?
Obviously we'll have to wait and see how this all works but the combination of a data mining crawler and injected ads at the local or ISP level is quite alarming and almost sounds like another incarnation of Phorm [webmasterworld.com]! :o
I can't tell others how to handle this but it's blocked on my server.
They appear to use Amazon Web Services (amazonaws.com) to crawl and this is their user agent:
"PRCrawler/Nutch-0.9 (data mining development project; firstname.lastname@example.org)"
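For anyone who wants to block it the same way, here is a minimal sketch of a user agent check. It assumes the crawler keeps sending the "PRCrawler" token quoted above, which it may not, and the function name is purely illustrative. Wiring it into your server or application is left to you.

```python
import re

# Assumption: the "PRCrawler" token comes from the user agent quoted above;
# the crawler could change or drop it at any time.
BLOCKED_AGENT = re.compile(r"PRCrawler", re.IGNORECASE)


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header contains the PRCrawler token."""
    # A missing or empty header is treated as "not this crawler".
    return bool(user_agent) and BLOCKED_AGENT.search(user_agent) is not None
```

A request handler could return a 403 whenever this check is true. Since a user agent is trivially spoofed, this only stops a crawler that identifies itself honestly.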
This situation is another example of the "entitlement mentality" many of us talk about, where people think they can do whatever they want with your websites. I'm sorry, but if your service is that good, CHARGE for it. I'd potentially pay for it; just leave my website alone.
Good thing they talk about a 'subscriber base', which would imply the system is opt-in. I'm guessing the spider is crawling pages in favor of some form of companion spyware.
It's about time somebody instigated a massive class-action lawsuit on behalf of professional webmasters. This sort of behaviour has to be exterminated from the roots up before it gets any foothold and becomes commonplace.
|injected ads at the local or ISP level is quite alarming and almost sounds like another incarnation of Phorm! |
This is my number one concern. However, the only confirmed case I've heard of where this actually occurred was the Texas-based ISP that "accidentally" enabled it using NebuAd. Was it ever confirmed that Phorm was replacing ads during their 2006 trial?
The other concern I have, as someone else pointed out in a previous thread, is the indexing of private pages that are only accessible because they are piggybacking on the user's login credentials.
Other than that I think this is mostly a consumer privacy issue.
|Other than that I think this is mostly a consumer privacy issue. |
When they plan to advertise alongside or inside any website that doesn't belong to them, I think it's also a webmaster issue because it's potentially money being redirected away from the content owner.
Such practices will spell the beginning of the end for small advertising-based sites and put a dent in larger sites as well.
|Such practices will spell the beginning of the end for small advertising-based sites and put a dent in larger sites as well. |
How does one combat this? What proactive approach should a Professional Webmaster take? And also, what reactive approach would need to be taken?
As I said, ad injection or replacement is my number one concern. As far as that is concerned, any ISP injecting or replacing ads that appear within the same browser window will be banned from my server, including the few clients I'm hosting. They can take their business elsewhere if they don't like it.
As far as a proactive approach this was posted in a previous thread: Detecting In-Flight Page Changes with Web Tripwires
thecoalman, thank you for that link! This is the first time I've seen discussion on Web Tripwires. I'm going to be discussing this with my programming team over the weekend. I like it! Not only from the perspective of this topic, but from a basic "monitoring" standpoint too.
|We have detected that this page has been modified in flight. For more information, click here. |
Oh, you just have to dig the terminology they use too. :)
Not sure who posted it originally, I just bookmarked it. ;)
Truthfully, I don't think it's needed unless you're trying to determine if the page was modified client side [webmasterworld.com]. If an ISP starts doing this, you know it's not going to take more than a few minutes to make the front page here. :)
In the event they do start injecting or replacing ads, the best solution IMO would be a blacklist. If millions of sites used it, the ISPs would have no choice but to drop it.
Changing your site to 100% HTTPS (SSL) will easily stop in-flight page changes, unless it's a browser plug-in doing the changes.
However, note that your site performance will suffer and server CPU usage will increase quite a bit if you have a lot of visitors so it might require more hardware just to maintain your current grade of service.
[edited by: incrediBILL at 6:04 pm (utc) on June 20, 2008]
|As far as that is concerned, any ISP injecting or replacing ads that appear within the same browser window will be banned from my server, including the few clients I'm hosting. |
As long as some crawler doesn't provide any *benefits* to the publisher, such as traffic, I don't see why anyone would want to grant them access just so it can eat more bandwidth and make money off it. I'm also thinking of trademark/copyright checker, etc.
|How does one combat this? What proactive approach should a Professional Webmaster take? And also, what reactive approach would need to be taken? |
SSL and / or MD5 hash comparison of client and server.
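A minimal sketch of the hash comparison idea, under some loud assumptions: the function names are made up for illustration, and how the client's copy (or its digest) gets reported back to the server is not covered here. In practice that would be a script in the page, as in the Web Tripwires approach mentioned earlier in the thread. MD5 is used only because it was named above; it works as a change detector, not as protection against someone who can also rewrite the checking script.

```python
import hashlib


def page_digest(body: bytes) -> str:
    """Return the hex MD5 digest of a page body."""
    return hashlib.md5(body).hexdigest()


def was_modified_in_flight(served_body: bytes, received_body: bytes) -> bool:
    """Compare what the server sent with what the client says it received.

    Assumption: received_body has somehow been reported back by the client;
    that reporting channel is hypothetical and not shown here.
    """
    return page_digest(served_body) != page_digest(received_body)
```

For example, passing the original HTML and a copy with an extra injected script tag would flag the page as modified, while two identical copies would not.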
|As long as some crawler doesn't provide any *benefits* to the publisher, such as traffic, I don't see why anyone would want to grant them access just so it can eat more bandwidth and make money off it. I'm also thinking of trademark/copyright checker, etc. |
There are a multitude of bots that are utilized for 3rd party use which provide no benefit (i.e., our focused short-sight) to us as webmasters.
Three key examples that come to mind:
1) IBM Almaden which for a long time was harvesting materials under the guise of research, meanwhile, providing a dummy robots reference in their UA, while they were utilizing the harvested materials in a 3rd party paid intranet.
2) 131.107. has been doing the same basic thing since at least 2003 (there are some sparse earlier references).
3) Many universities crawl under the guise of research without providing truthful details of their funded grants, when in fact the end output is basically being turned over to the grant resource, while "we" are left to believe the pretenseful pipe dream that one of these "may be the next Google."
There are many more abuses that many websites and webmasters are tolerating (at least those that heed the contents of their visitor logs), while there are millions more websites and webmasters that could give two squats.
Some websites even utilize the phony numbers to raise their advertising fees.