Does anyone know this crawler?

My site got hit by Almaden.IBM.com/cs/Crawler...

         

erikv

11:43 am on Oct 23, 2002 (gmt 0)

10+ Year Member



Can anyone tell me if this is a "real" crawler or is this something experimental from the IBM Research Labs at Almaden?

Here's the URL that I saw in my logs:
[almaden.ibm.com...]

andreasfriedrich

11:49 am on Oct 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



site search for almaden [searchengineworld.com]

Andreas

Grumpus

11:57 am on Oct 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah. It seems to really be from IBM (Almaden). It was really active on my site earlier this summer and at the time, I guessed it was probably making a play for Yahoo in hopes of bouncing Google. At that time, others were reporting lots of activity by them, as well.

Recently, I haven't seen them much, but it's been "peekin' and pokin'", to use some of my favorite old tyme computer commands.

It seems Almaden is working on some sort of newfangled search algos. From my foggy memory of what I managed to glean this summer, it might be some combination of Natural Language (a la Ask Jeeves) and several other things that I can't remember right now. At the time, I remember thinking that it was a neat idea, but that it didn't look like anyone was interested (which is why I assumed they were looking to pitch to Yahoo). If we start seeing more activity in the next few weeks, it likely means that they're freshening up the database to pitch it to someone new (or they've got a major new factor in the algo and want to test it on new stuff).

In the end, though, it's a pretty aggressive bot. I let it scuff the top level of my site just in case it picks up someone big, but keep it out of the deep pages until I know exactly what it is they're doing. It DOES seem to be somewhat smart, like the Googlebot, in that it'll back off if it notes that things are bogging down on your server.

G.

erikv

4:30 pm on Oct 23, 2002 (gmt 0)

10+ Year Member



>>I let it scuff the top level of my site just in case it picks up someone big, but keep it out of the deep pages until I know exactly what it is they're doing.

Well, I can't do anything about it, as I depend on my ISP (Netgate--don't know if they're any good; their service/support is excellent, though) for allowing or disallowing anything at the server level.

What I have noticed is that Almaden is NOT responsible for big loads of traffic, on my site at least. I did notice a sudden increase in my log size that has been going on for about a week now, but when I analyzed the log, Almaden turned out not to be the cause.

Googlebot has paid me daily visits for the last week, but it hasn't resulted in a dramatic increase of traffic :(. I'm now at about 10,000 visitors a month and some 15,000 page views. After 3 months of trying hard to do what Brett has been saying on these forums all along, I find that disappointing. But that could be me, of course--I'm not experienced.

creative craig

4:38 pm on Oct 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<META NAME=robots CONTENT="index,nofollow"> will stop it from going any deeper if you place it in the head section of your index page.

erikv

4:44 pm on Oct 23, 2002 (gmt 0)

10+ Year Member



<META NAME=robots CONTENT="index,nofollow">

I thought that would also stop other bots, and I don't want to stop Googlebot! Now that I have its attention, I want to keep it as long as possible ;).

creative craig

4:47 pm on Oct 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can add a robots.txt file and disallow it from there. Very easy to set up; if you do a site search you will find lots of threads on them.
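
As a rough illustration of the robots.txt approach, here is a minimal sketch that shuts one crawler out of the whole site while leaving every other robot (Googlebot included) unrestricted, then checks the rules with Python's standard robots.txt parser. The token "AlmadenCrawler" is only a placeholder and not the bot's confirmed user-agent name; you would need to take the real string from your own logs. The example.com URL is likewise made up. Note that this only works for crawlers that actually honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "AlmadenCrawler" is a placeholder token,
# not the crawler's confirmed user-agent name.
ROBOTS_TXT = """\
User-agent: AlmadenCrawler
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up URL used purely for the check.
    page = "http://www.example.com/deep/page.html"
    for agent in ("AlmadenCrawler", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

The empty "Disallow:" line under "User-agent: *" means "nothing is off limits", so only the named bot is blocked.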

Craig

erikv

4:51 pm on Oct 23, 2002 (gmt 0)

10+ Year Member



I knew it was simple, and I even have a robots.txt file, but I thought that would also stop every robot in its tracks. Anyway, I'm not unhappy with Almaden's interest. It's just a pity that it won't bring in extra traffic...

john316

4:52 pm on Oct 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just an opinion here, but I wouldn't rule out IBM as a player in the search market.

They may not be a player today, but with the resources they are deploying to crawl (they are very aggressive), I have to think they are serious.

erikv

4:57 pm on Oct 23, 2002 (gmt 0)

10+ Year Member



>>Just an opinion here, but I wouldn't rule out IBM as a player in the search market.

I know they are very active in the search market at the enterprise level, i.e. search functionality for intranets and for their Lotus Notes environment. But I didn't know they were also interested in Internet search. Their natural language stuff is said to be quite good, but that's probably too little too late: Gartner and Giga have recently been saying people really don't want to type in lengthy sentences in natural language anymore. They just want to see accurate result lists based on 3 search terms maximum...

Anyway, gentlemen, you all have been very helpful once more. It's 7 PM over here, and I'm going to log off, so you won't hear from me on this subject until tomorrow morning Western European Time (summertime not included ;) ).

Thanks again and see you all tomorrow!
Erik

creative craig

5:00 pm on Oct 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have a look around [searchengineworld.com...]; there are some excellent tutorials on how to make a good robots.txt file.

Craig

carfac

3:06 am on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi:

My experience with this one is that it has not followed robots.txt, and has nosed around where it shouldn't. Because of that, and its very aggressive nature, I stopped it through other means.

dave