Per the OP's Host info, if this is yet another PlanetLab (planet-lab.org) thing, too bad its "researchers at top academic institutions and industrial research labs" apparently disdain standard UA ID and bot-running activity. Hrrmph.
[edited by: incrediBILL at 12:11 am (utc) on May 24, 2010] [edit reason] Obscured IPs for HOWARD.EDU [/edit]
Dear Heart... do we expect anything different? The bots, bot-handlers, and those scraping the web think they can sell URLs they consider merchandising-worthy to the clueless, who are seeking the holy grail of MFA and hoping to make a killing in a saturated market where data is rapidly becoming mundane.
My fun thought is that no matter how big hard drives become, or how fast the connections, anyone attempting to "index the web" is butt stupid since there is way too much slop out there which is not worth a plugged nickel. The real giggle is they have to spend tons of funds to play their games. We, as webmasters, spend our dollars in business/connection. And we can kill their business profile in .htaccess or similar, and they can't hurt us that much.
@thethrasher: Did those Hosts run the "ResearchProject" bot against your site(s)? (There are 1090 planetlab nodes at 503 sites world-wide.)
FWIW: There's no "ResearchProject" per se on the planet-lab.org site, either by name or as an active or inactive project. Unless we see it run from another planetlab-specific subdomain somewhere, it may just be someone's pass-around effort.
Regardless of who's running it, I'm curious to know what it tries to do when it isn't 403'd from the get-go for everything but robots.txt, which it neglects to get anyway.
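For anyone who wants to see what a bot like this asks for when it skips robots.txt, here is a rough Python sketch that walks an access log and lists the clients that never requested /robots.txt, along with the paths they did request. It assumes the Apache/Nginx "combined" log format; the function name, the regex, and the sample lines (which use documentation-range IPs, not real hosts) are all made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: lines follow the Apache/Nginx "combined" log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

def clients_skipping_robots(lines):
    """Return {ip: [paths]} for clients that never requested /robots.txt."""
    paths_by_ip = defaultdict(list)
    fetched_robots = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        ip = m.group("ip")
        path = m.group("path").split("?", 1)[0]  # drop any query string
        if path == "/robots.txt":
            fetched_robots.add(ip)
        paths_by_ip[ip].append(path)
    return {ip: p for ip, p in paths_by_ip.items() if ip not in fetched_robots}

# Hypothetical sample lines for illustration only.
sample = [
    '192.0.2.10 - - [24/May/2010:00:11:00 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "ResearchProject"',
    '192.0.2.10 - - [24/May/2010:00:11:02 +0000] "GET /page?x=1 HTTP/1.1" 200 734 "-" "ResearchProject"',
    '198.51.100.7 - - [24/May/2010:00:12:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "PoliteBot/1.0"',
    '198.51.100.7 - - [24/May/2010:00:12:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "PoliteBot/1.0"',
]

for ip, paths in clients_skipping_robots(sample).items():
    print(ip, paths)
```

In practice you would pass it an open log file instead of the sample list. Note that it keys on IP alone, so several visitors behind one shared proxy will be lumped together.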
As far as I know, lots of universities offer proxies so students can connect from anyplace and access various services (e.g. scholarly documents). This means that if a student's browser or system is compromised, an outsider can use the proxy, and it will be the institution's server that shows up. I do see quite a few unrelated requests in my server logs that are similar.