From their website (returned by a Google search):
Convera.com is a leading provider of enterprise search and categorization solutions delivering categorization, dynamic classification, federated search, taxonomy development, taxonomy management and portal integrations for commercial and government customers around the world.
It appears to me as if they are spidering both for resale of their services and data to third parties?
What do I know!
I've also searched the 'Net (read: Google search) looking for info, but didn't find anything obviously definitive about them.
It was probably someone doing business intelligence gathering... it's always good to know what the competition is up to.
From one of their partner sites:
"applications including enterprise portals, knowledge management, intelligence gathering, profiling, corporate policy compliance, regulatory compliance, customer service"
|It appears to me as if they are spidering both for resale of their services and data to third parties? |
Definitely - they seem to be doing the same thing as Google, Yahoo, MSN and many others.
|Definitely - they seem to be doing the same thing as Google, Yahoo, MSN and many others. |
First, NONE of the three you mentioned above are spidering with the primary aim of selling what they gather to a third party.
The aforementioned three offer public search engines built on the data they gather. Convera offers no such free tool.
Second, Convera is no longer spidering my sites; my ban took effect on Monday the 18th.
|Second, Convera is no longer spidering my sites; my ban took effect on Monday the 18th. |
I think you should give them a call and point out that the value of the publicly available data they have spidered has now dropped dramatically -- please feel free to do just that, as otherwise they might not even notice that loss of value ;)
And please, there's no need for insults - I merely pointed out that businesses such as Google, MSN and Yahoo exist for the sole purpose of "spidering both for resale of their services and data to third parties" - these, and not the provision of free services such as public search, are the stated objectives and legal obligations to the shareholders of the respective organisations.
So, in other words, we don't have a consensus about whether or not these folks should be added to our "Bad Robots" .htaccess lists?
Any other opinions, folks?
|we don't have a consensus |
You're relatively new here.
As a rule, this forum has not, as a whole, either gathered or determined a consensus of good or bad.
Rather, it has been a tradition that each webmaster personally determines which bots are beneficial or detrimental to their own site(s).
The forum has in the past accepted some fairly off-topic conversations within a thread - discussions that were not part of the initial thread topic, but rather solutions and/or preventions that grew out of that same thread.
My own sites are quite unique. Many of the pages hold content which is not found anywhere else on the internet. As a result, the content of my pages is more crucial to a new or smaller bot than any benefit or visitors that bot might provide to my sites.
My .htaccess denials (of APNIC, RIPE and LACNIC ranges) are personal choices, although I've had communication with other webmasters who deny access to specific regions.
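For anyone wondering what that looks like in practice, here's a minimal sketch of a regional deny in .htaccess (Apache 1.3/2.x mod_access syntax). The CIDR ranges below are illustrative examples only and would need to be verified against the registries' current allocations before use:

# illustrative ranges only - check current APNIC/RIPE/LACNIC allocations
Order Allow,Deny
Allow from all
Deny from 61.0.0.0/8
Deny from 62.0.0.0/8
Deny from 200.0.0.0/8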
I had them all over my site last night, coming from 188.8.131.52 and 184.108.40.206.
My site has a lot of audio streams (we sell music) and this spider keeps hitting them all; my bandwidth is going crazy.
|my bandwidth is going crazy |
SetEnvIf User-Agent ^Convera keep_out
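That SetEnvIf line by itself only tags matching requests with an environment variable; to actually send the 403 you'd follow it with a deny directive, something like this (Apache 1.3/2.x mod_access syntax):

Order Allow,Deny
Allow from all
Deny from env=keep_out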
Thanks for your input folks!
I'd like to vote for adding them to the bad bots list. We've caught them harvesting research from our professors and reselling it on their web site, in violation of the copyright under which the research is published. In addition, they're not obeying robots.txt. Our robots.txt clearly disallows them, yet they spidered nearly 3000 pages from my site just yesterday.
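For reference, the disallow entry would look something like the following - assuming their crawler identifies itself with a token like ConveraCrawler (check your raw logs for the exact User-agent string):

User-agent: ConveraCrawler
Disallow: /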
Are we taking a vote some four years after the fact ;)
Each webmaster must decide what is beneficial or detrimental to their own website(s).
I find it more final to 403 them with an IP deny than to robots.txt them. More satisfying :)
Ban them at the IP level and at the UA level.
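Putting both together in .htaccess might look like this; the range below is just a documentation placeholder (192.0.2.0/24), to be replaced with whatever addresses actually show up in your logs:

SetEnvIf User-Agent ^Convera keep_out
Order Allow,Deny
Allow from all
Deny from env=keep_out
Deny from 192.0.2.0/24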