I use AWStats to keep an eye on where my traffic is coming from. A few weeks ago I noticed the infamous "Unknown robot (identified by 'crawl')" line, which is reserved for bots the program doesn't recognize. Over the following days this sucker generated 150 MB of traffic. I'll be the first to admit I am a newbie in this realm, so you can imagine the thoughts going through my head of some evil entity ransacking my website. I knew I needed to find out what the heck this was and then learn how to block it.
Thanks to the information here on WW, I learned I needed to open my access log and do some investigating. First off, please tell me there is a better way to look at this file, because my eyes are still bleeding from the wall of text. After about 30 minutes of searching and comparing I found a recurring string: pronto.com/robots.html. I had been looking for repeated IP addresses, but this thing kept coming in from different ones, which is why it took me so long to figure it out. ;-)
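For anyone else stuck scrolling through a raw access log, here is a rough sketch of one way to let a script do the comparing. It groups requests by user-agent string instead of by IP address, which is what would have surfaced a bot that keeps changing IPs. It assumes the log is in the Apache "combined" format (the one that records referrer and user agent); the function name and the log path passed on the command line are made up for illustration, so adjust to your own setup.

```python
import re
import sys
from collections import Counter

# Assumed layout (Apache "combined" log format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def traffic_by_agent(lines):
    """Return (bytes served per user agent, set of client IPs per user agent)."""
    totals = Counter()
    ips = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not look like combined-format entries
        agent = m.group("agent")
        size = m.group("bytes")
        totals[agent] += 0 if size == "-" else int(size)  # "-" means no body sent
        ips.setdefault(agent, set()).add(m.group("ip"))
    return totals, ips

if __name__ == "__main__":
    # Hypothetical usage: python agents.py /path/to/access.log
    with open(sys.argv[1], errors="replace") as fh:
        totals, ips = traffic_by_agent(fh)
    for agent, size in totals.most_common(10):
        print(f"{size / 1_000_000:8.1f} MB  {len(ips[agent]):4d} IPs  {agent}")
```

The heaviest user agents come out on top along with how many different IPs each one used, so a crawler that hops between addresses still shows up as a single line.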
Anyway, this bot has been taking the product information from my website and adding it to a comparison shopping directory that many of my competitors are also on. I checked the areas the bot had visited and, sure enough, my products are listed. The site seems to be in beta, so I guess they are just seeding it. Either way it's nice to get something free for a change. I'll have to keep an eye on how much referral traffic and sales I get from it.
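Tracking the referral side could probably be scripted too. The snippet below is only a sketch: it counts access-log entries whose referrer host is a given domain or one of its subdomains, again assuming the Apache "combined" log format. The function name is invented for this example, and it counts visits only; tying those visits to actual sales would need order data that the log doesn't contain.

```python
import re
from urllib.parse import urlparse

# Assumed combined-format fragment: "request" status bytes "referer"
REFERER = re.compile(r'"[^"]*" \d{3} (?:\d+|-) "(?P<referer>[^"]*)"')

def count_referrals(lines, domain):
    """Count log lines whose referrer host is `domain` or a subdomain of it."""
    hits = 0
    for line in lines:
        m = REFERER.search(line)
        if m is None:
            continue  # not a combined-format line
        # hostname is None for an empty referrer such as "-"
        host = urlparse(m.group("referer")).hostname or ""
        if host == domain or host.endswith("." + domain):
            hits += 1
    return hits
```

Calling `count_referrals(open("access.log"), "pronto.com")` (with your own log path) would give the number of requests that arrived via a link on that site.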