In the last year, Alta, Ink, Google, and Fast have each crawled the entire web. They certainly aren't putting all of that data online, and they are for sure not obeying robots.txt all the time. They send their spiders out in hunter-gatherer mode just to raid links and scarf up data. It is amazing what a wandering spider can run into sometimes. Mostly, they use the data to build link/web maps (e.g., data mining operations).
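
To give a rough idea of what a link map is, here is a minimal sketch in Python. It is only an illustration, not how any of these engines actually work: the page contents, the example.com-style URLs, and the names `LinkExtractor`, `build_link_map`, and `inbound_counts` are all made up for the example. It reads a few already-fetched pages, records which page links to which, and counts inbound links per URL.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def build_link_map(pages):
    """Map each page URL to the set of URLs it links to."""
    link_map = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        link_map[url] = set(parser.links)
    return link_map


def inbound_counts(link_map):
    """Count how many distinct other pages link to each URL."""
    counts = defaultdict(int)
    for source, targets in link_map.items():
        for target in targets:
            if target != source:  # ignore self-links
                counts[target] += 1
    return dict(counts)


# Hypothetical crawl results: URL -> raw HTML (assumed data, no network access).
PAGES = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
}

LINK_MAP = build_link_map(PAGES)
INBOUND = inbound_counts(LINK_MAP)
```

Here `LINK_MAP` is the map itself (who links to whom) and `INBOUND` is one simple thing you can mine from it. A real spider would also have to fetch the pages, and a polite one would check robots.txt first.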