Convera Crawler is Posting links. - Crawler, Spider, and User Agent ID forum at WebmasterWorld - WebmasterWorld



Jabba

7:55 pm on Jun 15, 2005 (gmt 0)

I have a php-nuke site and I was tracking down the source of someone that had posted a huge number of links in the comments area of my review section. Every review has the exact same links in the comments section.
I deleted most but here are 2 examples:
[****.***...]
[****.***...]
Looking through my security logs (the Protector System), I was able to determine that this crawler was the only visitor on my site at that specific time that could have posted those links.
UNITED STATES
Last here: 2005.06.13 06:53:57
Ip: 63.241.***.***
Isp/Host: 8-9745.san2.***.***
Last Referer: Direct Hit
Total Hits: 301
Was last on: /modules.php?name=Stories_Archive&sa=show_all
Agent info: ConveraCrawler/0.8 (+http://www.authoritativeweb.com/crawl)

Other visitors that were on the site around that time only have one or two hits and they do not include my reviews section. Notice the 301 hits. Among those 301 hits are links like this:
[****.***...]

Has anyone seen this type of activity before from a bot or crawler?

[edited by: volatilegx at 9:16 pm (utc) on June 15, 2005]
[edit reason] removed specifics [/edit]

volatilegx

9:19 pm on Jun 15, 2005 (gmt 0)

Hi Jabba and welcome to WebmasterWorld :)

I haven't seen this type of behaviour from the Convera Crawler before, but this is very interesting.

Jabba

10:04 pm on Jun 15, 2005 (gmt 0)

volatilegx wrote: "Hi Jabba and welcome to WebmasterWorld :) I haven't seen this type of behaviour from the Convera Crawler before, but this is very interesting."

Well thanks for the welcome.
As I said in my PM to you, I am a lurker but I thought this would be very worthy of posting.
Frankly, I've never seen a bot/crawler visit the places this one did. From the evidence I have, I have no doubt that this crawler was responsible for posting those links.
This crawler seemed to be on the hunt for places where "anonymous" could post a comment as it hit Reviews, Stories, News, Sections, Topics and Forums only. The comments section for my reviews and one forum category is the only place that allows anonymous comments.
Sorry I didn't read the TOS. I assumed posting the evidentiary links would be permitted.

If more info is needed I will try to provide it.

wilderness

12:04 am on Jun 16, 2005 (gmt 0)

I have the backbone's (CERFnet) range denied.

Don
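
Denying a provider's range comes down to a CIDR membership test on the visitor's IP. Here is a minimal Python sketch of that test. The network below is a placeholder from the reserved documentation ranges, not the actual CERFnet range (the thread does not give it), and `DENIED_NETWORKS` and `is_denied` are hypothetical names.

```python
import ipaddress

# Placeholder network (a reserved documentation range), NOT the real
# CERFnet range, which is not stated in this thread.
DENIED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_denied(ip: str) -> bool:
    """Return True if the address falls inside any denied network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENIED_NETWORKS)
```

In practice the same rule would normally live in the web server or firewall configuration; the function only illustrates what "range denied" means for a single request.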

Jabba

8:15 pm on Jun 16, 2005 (gmt 0)

I spoke with an admin contact for this crawler and he sounded genuinely interested in finding out what may have caused this crawler to post those links.
Maybe he'll post a comment as I pointed him to this thread.

Jabba

7:22 pm on Jun 17, 2005 (gmt 0)

Digging a little deeper into this issue it seems that their crawler may have been hijacked by an IP addy in Belarus which I found in my server logs.
Their crawler was logged by Protector but the Belarus IP was not.
My server-side logs show ConveraCrawler GET my robots.txt, and then immediately the Belarus IP began to POST the links.

Trojan or virus? Don't know but I do know these guys are new and they initially brought up the suggestion that they could have a malicious script embedded in the crawler code.
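
The pattern described above (a robots.txt fetch by the crawler, followed right away by POSTs from a different IP) can be searched for in an access log. This is a rough Python sketch, assuming the server writes Apache-style combined log lines. The function names and the 60-second window are illustrative choices, not anything taken from the poster's setup.

```python
import re
from datetime import datetime, timedelta

# Assumed format: Apache "combined" log lines. Adjust if the server differs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def posts_after_robots_fetch(lines, agent_substring, window_seconds=60):
    """Return POST records from a *different* IP that occur within
    window_seconds after the named agent fetched /robots.txt."""
    window = timedelta(seconds=window_seconds)
    records = [r for r in map(parse_line, lines) if r is not None]
    fetches = [
        r for r in records
        if r["method"] == "GET"
        and r["path"] == "/robots.txt"
        and agent_substring in r["agent"]
    ]
    suspicious = []
    for r in records:
        if r["method"] != "POST":
            continue
        if any(
            f["ip"] != r["ip"] and timedelta(0) <= r["ts"] - f["ts"] <= window
            for f in fetches
        ):
            suspicious.append(r)
    return suspicious
```

Calling `posts_after_robots_fetch(open("access.log"), "ConveraCrawler")` on a log (the file name is hypothetical) would list the POSTs worth a closer look. A match only shows that the two events were close in time; it does not prove the two hosts are connected.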

jmccormac

11:09 pm on Jun 17, 2005 (gmt 0)

I've just been paged to come into the office at 23:30 on a Friday evening because Convera was hammering my directory and was not obeying the robots.txt exclusion. The IP in question is a Convera IP and it is going in the permanent deepsix list.

Looks like just another maggot until proven otherwise. I've emailed the contact address for an explanation. But I don't buy that hijack theory.

Regards...jmcc
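
Whether a crawler ignored robots.txt can be checked by replaying the paths it requested against the exclusion rules. Below is a small Python sketch using the standard library's `urllib.robotparser`. The helper name `violations` and the paths and rules in the usage note are made up for illustration; they are not taken from the site discussed here.

```python
from urllib.robotparser import RobotFileParser


def violations(robots_txt, user_agent, requested_paths):
    """Return the requested paths that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return [
        path for path in requested_paths
        if not parser.can_fetch(user_agent, path)
    ]
```

For example, with rules that disallow `/directory/` for all agents, `violations(rules, "ConveraCrawler", ["/directory/page1", "/index.html"])` would report only the first path. The requested paths would come from the server's access log, filtered by the crawler's IP or user agent.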

wilderness

12:14 am on Jun 18, 2005 (gmt 0)

[convera.com...]

Funny thing?
I don't see any search option offered. Rather, there are mentions of data retrieval and third-party product references using websites as their resources.

jmccormac

2:38 am on Jun 18, 2005 (gmt 0)

wilderness wrote: "Funny thing? I don't see any search option offered. Rather, there are mentions of data retrieval and third-party product references using websites as their resources."

Or in other words, a more organised webscraper? :)

Regards...jmcc

wilderness

4:46 am on Jun 18, 2005 (gmt 0)

jmcc,
There are actually many sources across the internet collecting data to sell or administer to third parties.

My gripe is "the why" of allowing them to use private resources (websites and bandwidth) without expecting compensation from either the so-called scraper or its customers.

After all they are collecting the data to be utilized in a non-internet capacity. More like an intranet.

I feel the same way about universities. And I do realize that much research (such as Google and other projects) begins at universities. However, they have valid resources in the way of grants, with paid staff (professors) and students doing the majority of the work to further their careers beyond the data they mine from privately owned web sites.

Another good example is Archive.org. It's an excellent resource and concept. The moot point, IMO, is that they will sell terabytes of collected data to anybody that wants to pay.
That payment concept does not fit the terms under which most webmasters allow their sites to be spidered.

The term "third party" is very broad and entails many companies not offering search engines. IBM Almaden is another example that only collects data to display in a closed environment to paid customers.

I have no desire to allow these types of bots or software on my sites, UNLESS they are willing to send some compensation my way.
Of course they'd have to change their entire concept of doing business before that would happen ;)

Don