| 9:57 pm on Oct 24, 2004 (gmt 0)|
From their website (returned by a Google search):
Convera.com is a leading provider of enterprise search and categorization solutions delivering categorization, dynamic classification, federated search, taxonomy development, taxonomy management and portal integrations for commercial and government customers around the world.
It appears to me as if they are spidering both for resale of their services and data to third parties?
What do I know!
| 10:04 pm on Oct 24, 2004 (gmt 0)|
I've also searched the 'Net (read: Google search) looking for info, but didn't find anything obviously definitive about them.
| 10:41 pm on Oct 24, 2004 (gmt 0)|
It was probably someone doing business intelligence gathering. It's always good to know what the competition is up to.
From one of their partner sites:
"applications including enterprise portals, knowledge management, intelligence gathering, profiling, corporate policy compliance, regulatory compliance, customer service"
| 11:09 pm on Oct 24, 2004 (gmt 0)|
|It appears to me as if they are spidering both for resale of their services and data to third parties? |
Definitely - they seem to be doing the same thing as Google, Yahoo, MSN and many others.
| 1:32 am on Oct 25, 2004 (gmt 0)|
|Definitely - they seem to be doing the same thing as Google, Yahoo, MSN and many others. |
First, NONE of the three you mentioned above are spidering with the primary aim of selling what they gather to a third party.
The aforementioned three offer public search engines of the data they gather. Convera offers no such free tool.
Second, Convera is no longer spidering my sites, effective Monday the 18th.
[edited by: volatilegx at 3:11 pm (utc) on Oct. 25, 2004]
[edit reason] let's keep flames out of this [/edit]
| 2:03 am on Oct 25, 2004 (gmt 0)|
|Second, Convera is no longer spidering my sites, effective Monday the 18th. |
I think you should give them a call and point out that the value of the publicly available data that they have spidered has now dropped dramatically -- please feel free to do just that, as otherwise they might not even notice that loss of value ;)
And please, no need to use insults - I merely pointed out that businesses such as Google, MSN and Yahoo exist for the sole purpose of "spidering both for resale of their services and data to third parties" - these, and not the provision of free services such as public search, are the stated objectives and legal obligations to shareholders of the respective organisations.
| 1:36 pm on Oct 25, 2004 (gmt 0)|
So, in other words, we don't have a consensus about whether or not these folks should be added to our "Bad Robots" .htaccess lists?
Any other opinions, folks?
| 10:52 pm on Oct 25, 2004 (gmt 0)|
|we don't have a consensus |
You're relatively new here.
As a rule, this forum has not, as a whole, either gathered or determined a consensus of good or bad.
Rather, it has been a tradition that each webmaster determines personally which bots are either beneficial or detrimental to their own site(s).
The forum has in the past accepted some very off-topic conversations in threads that were not part of the initial thread topic, but rather solutions and/or preventions that came out of the same thread.
My own sites are quite unique. Many of the pages hold content which is not found anywhere else on the internet. As a result, the content of my pages is worth more to a new or smaller bot than any benefits or visitors that bot might bring to my sites.
My htaccess denials (of APNIC, RIPE and LACNIC) are personal choices, although I've had communication with other webmasters who deny access to specific regions.
| 11:24 pm on Oct 25, 2004 (gmt 0)|
| 9:51 am on Oct 26, 2004 (gmt 0)|
I had them all over my site last night, using 22.214.171.124 and 126.96.36.199
My site has a lot of audio streams (we sell music) and this spider keeps hitting them all; my bandwidth is going crazy.
| 10:44 am on Oct 26, 2004 (gmt 0)|
|my bandwidth is going crazy |
SetEnvIf User-Agent ^Convera keep_out
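Note that the SetEnvIf line on its own only tags matching requests with an environment variable; a deny rule is still needed to actually turn them away. A minimal sketch, assuming the older Apache access directives (Order/Allow/Deny, as in 1.3/2.0/2.2), that .htaccess overrides are allowed on your host, and that the crawler's User-Agent string really does begin with "Convera" (check your own logs):

```apache
# Tag any request whose User-Agent starts with "Convera"
SetEnvIf User-Agent ^Convera keep_out

# Allow everyone by default, then refuse tagged requests with a 403
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

With Order Allow,Deny, a request that matches both an Allow and a Deny line is denied, so the tagged crawler is blocked while normal visitors are unaffected.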
| 2:15 pm on Oct 26, 2004 (gmt 0)|
Thanks for your input folks!
| 4:58 pm on Mar 20, 2008 (gmt 0)|
I'd like to vote for adding them to the bad bots list. We've caught them harvesting and reselling research from our professors on their web site, in violation of the copyright under which the research is published. In addition, they're not obeying robots.txt. Our robots.txt clearly disallows them, yet they spidered nearly 3000 pages from my site just yesterday.
| 5:03 pm on Mar 20, 2008 (gmt 0)|
Are we taking a vote some four years after the fact? ;)
Each webmaster must decide what is beneficial or detrimental to their own website(s).
| 8:38 pm on Mar 24, 2008 (gmt 0)|
I find it more final to 403 IP-deny them than to robots.txt them. More satisfying :)
| 1:05 pm on Mar 25, 2008 (gmt 0)|
Ban them at the IP level and at the UA level.
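A rough sketch of doing both in one place, assuming Apache 2.4 or later (the `Require` syntax) and that .htaccess overrides are enabled. The address range is a documentation placeholder (192.0.2.0/24), not Convera's actual range; substitute whatever addresses show up in your own logs:

```apache
# UA level: tag requests whose User-Agent starts with "Convera"
SetEnvIf User-Agent ^Convera keep_out

<RequireAll>
    # Everyone is allowed unless one of the negated rules below matches
    Require all granted
    # UA-level ban
    Require not env keep_out
    # IP-level ban (placeholder range, replace with the real addresses)
    Require not ip 192.0.2.0/24
</RequireAll>
```

The two bans cover each other's gaps: the IP rule still applies if the bot changes its User-Agent, and the UA rule still applies if it moves to new addresses.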