IBM Crawler?

Anyone familiar with this?

         

willybfriendly

8:11 pm on Jun 6, 2003 (gmt 0)




Logs show wfp2.almaden.ibm.com

Tracked it down and found this blurb - "The information we collect from the web is currently being used in IBM's Research Division for several search/indexing projects."

Is anyone familiar with this, or with what projects IBM is working on?

WBF

wilderness

9:00 pm on Jun 6, 2003 (gmt 0)




I let them back in recently as part of my "new leaf," after having their IP ranges denied for quite some time.

The result was fast and furious spidering.
I added them to robots as they have specified below.

They've never specified exactly what their research is.
They do, however, state that they honor robots.txt, in the following lines:

If you only want to forbid only our crawler from going through your
site, then create a robots.txt file that contains the following lines:

User-agent: http ://www.almaden.ibm.com/cs/crawler
Disallow: /

Please note, I've purposely left a blank space in the URL to keep the link non-active. If you use this line in robots, you'll need to remove that space.
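For anyone who wants to sanity-check their robots.txt after pasting that record in, here is a rough Python sketch. It assumes the simple record layout quoted above (User-agent lines followed by Disallow lines, records separated by blank lines) and only looks for a blanket "Disallow: /". The function name blocked_agents is made up for illustration; this is not how IBM's crawler or any particular parser actually reads the file.

```python
# The User-agent string quoted above, with the blank space removed.
IBM_AGENT = "http://www.almaden.ibm.com/cs/crawler"

ROBOTS_TXT = """\
User-agent: http://www.almaden.ibm.com/cs/crawler
Disallow: /
"""


def blocked_agents(robots_txt):
    """Return the User-agent values that carry a blanket 'Disallow: /' rule.

    Hypothetical helper: a simplified reading of robots.txt records,
    not a full parser.
    """
    blocked = set()
    current = []       # User-agent values of the record being read
    in_rules = False   # True once the record's rule lines have started
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            # A blank line ends the current record.
            current, in_rules = [], False
            continue
        # Split on the first colon only, so the URL in the value stays whole.
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rules starts a new record.
                current, in_rules = [], False
            current.append(value)
        elif field == "disallow":
            in_rules = True
            if value == "/":
                blocked.update(current)
    return blocked


if __name__ == "__main__":
    print(IBM_AGENT in blocked_agents(ROBOTS_TXT))
```

If you forget to remove the blank space from the URL, the record is stored under the spaced string and IBM_AGENT will not be in the result, which is an easy way to catch that mistake.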

At one time I "thought" this was the CompuServe bot. CompuServe is now a subsidiary of the infamous AOL; however, some folks still have CompuServe addresses and accounts. Perhaps somebody else can provide more insight.

littleman

9:08 pm on Jun 6, 2003 (gmt 0)



The link in their UA doesn't say much:
[almaden.ibm.com...]

This URL has some info on their newer projects:
[research.ibm.com...]

FWIW, they hit my geek site regularly.