
Search Engine Spider and User Agent Identification Forum

    
ConveraCrawler
Is ConveraCrawler a malicious robot?
mattie

10+ Year Member



 
Msg#: 2582 posted 4:02 pm on Oct 24, 2004 (gmt 0)

I've recently noticed the robot ConveraCrawler/0.2 on my Web sites.

Does anyone have any information about this bot and who runs it? Is it malicious?

 

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2582 posted 9:57 pm on Oct 24, 2004 (gmt 0)

From their website (found via a Google search):

Convera.com is a leading provider of enterprise search and categorization solutions delivering categorization, dynamic classification, federated search, taxonomy development, taxonomy management and portal integrations for commercial and government customers around the world.

It appears to me as if they are spidering both for resale of their services and data to third parties?
What do I know!

mattie

10+ Year Member



 
Msg#: 2582 posted 10:04 pm on Oct 24, 2004 (gmt 0)

I've also searched the 'Net (read: Google search) looking for info, but didn't find anything obviously definitive about them.

fiestagirl

10+ Year Member



 
Msg#: 2582 posted 10:41 pm on Oct 24, 2004 (gmt 0)

It was probably someone doing business intelligence gathering. It's always good to know what the competition is up to.

From one of their partner sites:
"applications including enterprise portals, knowledge management, intelligence gathering, profiling, corporate policy compliance, regulatory compliance, customer service"

Lord Majestic

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 2582 posted 11:09 pm on Oct 24, 2004 (gmt 0)

It appears to me as if they are spidering both for resale of their services and data to third parties?

Definitely - they seem to be doing the same thing as Google, Yahoo, MSN and many others.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2582 posted 1:32 am on Oct 25, 2004 (gmt 0)

Definitely - they seem to be doing the same thing as Google, Yahoo, MSN and many others.

<snip>

First, NONE of the three you mentioned above are spidering with the primary intent of selling what they gather to a third party.
The aforementioned three offer public search engines of the data they gather. Convera offers no such free tool.

Second, Convera is not spidering my sites, effective Monday the 18th.

[edited by: volatilegx at 3:11 pm (utc) on Oct. 25, 2004]
[edit reason] let's keep flames out of this [/edit]

Lord Majestic

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 2582 posted 2:03 am on Oct 25, 2004 (gmt 0)

Second, Convera is not spidering my sites, effective Monday the 18th.

I think you should give them a call and point out that the value of the publicly available data they have spidered has now dropped dramatically -- please feel free to do just that, as otherwise they might not even notice that loss of value ;)

And please, there is no need for insults - I merely pointed out that businesses such as Google, MSN and Yahoo exist for the sole purpose of "spidering both for resale of their services and data to third parties" - these, and not the provision of free services such as public search, are the stated objectives and legal obligations to shareholders of the respective organisations.

mattie

10+ Year Member



 
Msg#: 2582 posted 1:36 pm on Oct 25, 2004 (gmt 0)

*Ahem*

So, in other words, we don't have a consensus about whether or not these folks should be added to our "Bad Robots" .htaccess lists?

Any other opinions, folks?

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2582 posted 10:52 pm on Oct 25, 2004 (gmt 0)

we don't have a consensus

mattie,
you're relatively new here.

As a rule, this forum has not, as a whole, either gathered or determined a consensus of good or bad.

Rather, it has been a tradition that each webmaster determines personally which bots are either beneficial or detrimental to their own site(s).

The forum has in the past accepted some very off-topic conversations in threads that were not part of the initial thread topic, but rather solutions and/or preventions that emerged as a result of the same thread.

My own sites are quite unique. Many of the pages hold content which is not found any other place on the internet. As a result, the content of my pages is more crucial to a new or smaller bot than any benefits or visitors that bot might provide to my sites.

My htaccess denials (of APNIC, RIPE and LACNIC) are personal choices, although I've had communication with other webmasters who deny access to specific regions.

Consensus? Hardly.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2582 posted 11:24 pm on Oct 25, 2004 (gmt 0)

Previously:
216.143.234.135

Today:
63.241.61.80

herewego

10+ Year Member



 
Msg#: 2582 posted 9:51 am on Oct 26, 2004 (gmt 0)

I had them all over my site last night, using 204.50.234.94 and 203.97.122.105.
My site has a lot of audio streams (we sell music) and this spider keeps hitting them all; my bandwidth is going crazy.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2582 posted 10:44 am on Oct 26, 2004 (gmt 0)

my bandwidth is going crazy

SetEnvIf User-Agent ^Convera keep_out
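
On its own, the SetEnvIf line only tags matching requests with the keep_out variable; a deny rule still has to act on that tag. A minimal sketch of a complete .htaccess block, assuming Apache 2.2-style access directives (Order/Allow/Deny) and a host that permits these overrides:

```apache
# Tag any request whose User-Agent begins with "Convera"
SetEnvIf User-Agent ^Convera keep_out

# Allow everyone, then refuse requests carrying the keep_out tag (403 Forbidden)
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

With Order Allow,Deny, a request that matches both an Allow and a Deny rule is refused, so tagged requests get a 403 while everything else passes. On Apache 2.4 these directives are only available through mod_access_compat.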

mattie

10+ Year Member



 
Msg#: 2582 posted 2:15 pm on Oct 26, 2004 (gmt 0)

Thanks for your input folks!

silentlamb

5+ Year Member



 
Msg#: 2582 posted 4:58 pm on Mar 20, 2008 (gmt 0)

I'd like to vote for adding them to the bad bots list. We've caught them harvesting and reselling research from our professors on their web site, in violation of the copyright under which the research is published. In addition, they're not obeying robots.txt. Our robots.txt clearly disallows them, yet they spidered nearly 3000 pages from my site just yesterday.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 2582 posted 5:03 pm on Mar 20, 2008 (gmt 0)

Are we taking a vote some four years after the fact ;)

Each webmaster must decide what is beneficial or detrimental to their own website(s).

Don

Megaclinium

5+ Year Member



 
Msg#: 2582 posted 8:38 pm on Mar 24, 2008 (gmt 0)

I find it more final to deny them by IP with a 403 than to list them in robots.txt. More satisfying :)

jmccormac

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



 
Msg#: 2582 posted 1:05 pm on Mar 25, 2008 (gmt 0)

Ban them at the IP level and at the UA level.

Regards...jmcc
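
A rough sketch of what banning at both levels could look like in one .htaccess block, again assuming Apache 2.2-style Order/Allow/Deny directives. The addresses are simply the ones reported earlier in this thread in 2004; they are not a verified or current list, so check them against your own logs before using them.

```apache
# UA level: tag any request whose User-Agent begins with "Convera"
SetEnvIf User-Agent ^Convera keep_out

Order Allow,Deny
Allow from all

# UA level: refuse tagged requests
Deny from env=keep_out

# IP level: addresses reported in this thread (2004), possibly stale
Deny from 216.143.234.135
Deny from 63.241.61.80
Deny from 204.50.234.94
Deny from 203.97.122.105
```

The UA rule catches the crawler when it moves to a new address, and the IP rules still apply if it changes its User-Agent string while staying on a listed address.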

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved