Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
ConveraCrawler
Is ConveraCrawler a malicious robot?
mattie




msg:405809
 4:02 pm on Oct 24, 2004 (gmt 0)

I've recently noticed the robot ConveraCrawler/0.2 on my Web sites.

Does anyone have any information about this bot and who runs it? Is it malicious?

 

wilderness




msg:405810
 9:57 pm on Oct 24, 2004 (gmt 0)

From their website (found via a Google search):

Convera.com is a leading provider of enterprise search and categorization solutions delivering categorization, dynamic classification, federated search, taxonomy development, taxonomy management and portal integrations for commercial and government customers around the world.

It appears to me as if they are spidering both for resale of their services and data to third parties?
What do I know!

mattie




msg:405811
 10:04 pm on Oct 24, 2004 (gmt 0)

I've also searched the 'Net (read: Google search) looking for info, but didn't find anything obviously definitive about them.

fiestagirl




msg:405812
 10:41 pm on Oct 24, 2004 (gmt 0)

It was probably someone doing business intelligence gathering. It's always good to know what the competition is up to.

From one of their partner sites:
"applications including enterprise portals, knowledge management, intelligence gathering, profiling, corporate policy compliance, regulatory compliance, customer service"

Lord Majestic




msg:405813
 11:09 pm on Oct 24, 2004 (gmt 0)

It appears to me as if they are spidering both for resale of their services and data to third parties?

Definitely - they seem to be doing the same thing as Google, Yahoo, MSN and many others.

wilderness




msg:405814
 1:32 am on Oct 25, 2004 (gmt 0)

Definitely - they seem to be doing the same thing as Google, Yahoo, MSN and many others.

<snip>

First, NONE of the three you mentioned above is spidering with the primary aim of selling what it gathers to a third party.
The aforementioned three offer public search engines of the data they gather. Convera offers no such free tool.

Second, Convera is not spidering my sites, which was effective on Monday the 18th.

[edited by: volatilegx at 3:11 pm (utc) on Oct. 25, 2004]
[edit reason] let's keep flames out of this [/edit]

Lord Majestic




msg:405815
 2:03 am on Oct 25, 2004 (gmt 0)

Second, Convera is not spidering my sites, which was effective on Monday the 18th.

I think you should give them a call and point out that the value of the publicly available data that they have spidered has now dropped dramatically -- please feel free to do just that, as otherwise they might not even notice that loss of value ;)

And please no need to use insults - I merely pointed out that businesses such as Google, MSN and Yahoo exist for the sole purpose of "spidering both for resale of their services and data to third parties" - these, and not the provision of free services such as public search, are the stated objectives and legal obligations to shareholders of respective organisations.

mattie




msg:405816
 1:36 pm on Oct 25, 2004 (gmt 0)

*Ahem*

So, in other words, we don't have a consensus about whether or not these folks should be added to our "Bad Robots" .htaccess lists?

Any other opinions, folks?

wilderness




msg:405817
 10:52 pm on Oct 25, 2004 (gmt 0)

we don't have a consensus

mattie,
you're relatively new here.

As a rule, this forum as a whole has neither gathered nor determined a consensus of good or bad.

Rather, it has been a tradition that each webmaster determines personally which bots are either beneficial or detrimental to their own site(s).

The forum has in the past accepted some very off-topic conversations in threads that were not part of the initial thread topic, but rather solutions and/or preventions that emerged as a result of the same thread.

My own sites are quite unique. Many of the pages hold content which is not found any other place on the internet. As a result, the content of my pages is more crucial to a new or smaller bot than any benefits or visitors that bot might provide to my sites.

My htaccess denials (of APNIC, RIPE and LACNIC) are personal choices, although I've been in touch with other webmasters who deny access to specific regions.

Consensus? Hardly.

wilderness




msg:405818
 11:24 pm on Oct 25, 2004 (gmt 0)

Previously:
216.143.234.135

Today:
63.241.61.80

herewego




msg:405819
 9:51 am on Oct 26, 2004 (gmt 0)

I had them all over my site last night, using 204.50.234.94 and 203.97.122.105.
My site has a lot of audio streams (we sell music) and this spider keeps hitting them all. My bandwidth is going crazy.

wilderness




msg:405820
 10:44 am on Oct 26, 2004 (gmt 0)

my bandwidth is going crazy

SetEnvIf User-Agent ^Convera keep_out
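
Note that the SetEnvIf line above only tags matching requests with the keep_out variable; on its own it blocks nothing. A minimal sketch of a complete .htaccess block, assuming Apache 2.2-style access control (mod_setenvif plus the Order/Allow/Deny directives) and assuming the host permits these overrides (AllowOverride FileInfo Limit), might look like this:

```apache
# Tag any request whose User-Agent begins with "Convera" (e.g. ConveraCrawler/0.2).
# keep_out is just the variable name used earlier in this thread.
SetEnvIf User-Agent ^Convera keep_out

# Allow everyone by default, then refuse tagged requests with a 403.
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

With Order Allow,Deny the Deny rules are evaluated last, so tagged requests are refused even though Allow from all also matches them. SetEnvIfNoCase can be swapped in for SetEnvIf if the bot ever changes the capitalisation of its name.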

mattie




msg:405821
 2:15 pm on Oct 26, 2004 (gmt 0)

Thanks for your input folks!

silentlamb




msg:3606506
 4:58 pm on Mar 20, 2008 (gmt 0)

I'd like to vote for adding them to the bad bots list. We've caught them harvesting and reselling research from our professors on their web site, in violation of the copyright under which the research is published. In addition, they're not obeying the robots.txt. Our robots.txt clearly disallows them, yet they spidered nearly 3000 pages from my site just yesterday.

wilderness




msg:3606514
 5:03 pm on Mar 20, 2008 (gmt 0)

Are we taking a vote some four years after the fact? ;)

Each webmaster must decide what is beneficial or detrimental to their own website(s).

Don

Megaclinium




msg:3609477
 8:38 pm on Mar 24, 2008 (gmt 0)

I find it more final to IP-deny them with a 403 than to robots.txt them. More satisfying :)

jmccormac




msg:3609944
 1:05 pm on Mar 25, 2008 (gmt 0)

Ban them at the IP level and at the UA level.

Regards...jmcc
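
Both levels can live in one .htaccess. A rough sketch, assuming Apache 2.2-style directives; the addresses are simply the ones reported earlier in this thread in October 2004 and are very likely stale, so treat them as placeholders and check your own logs first:

```apache
# UA level: tag requests whose User-Agent begins with "Convera"
# (case-insensitive, in case the capitalisation changes).
SetEnvIfNoCase User-Agent ^Convera keep_out

Order Allow,Deny
Allow from all

# UA level: refuse tagged requests.
Deny from env=keep_out

# IP level: addresses posted in this thread (2004); verify before using.
Deny from 216.143.234.135
Deny from 63.241.61.80
Deny from 204.50.234.94
Deny from 203.97.122.105
```

The IP lines still catch the crawler if it changes its User-Agent string, and the UA line still catches it if it moves to a new address.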

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved