Would this be the case with the others?
I've managed to track Nomad back to Colorado State University, but why would they be accessing our webmail pages? And these are the only 5 bots recorded as accessing those pages since last July (2003), when we started recording this activity.
We don't have a robots.txt file.
Welcome to WebmasterWorld [webmasterworld.com]!
You've provided the answer yourself:
Q. > why would they be accessing our webmail pages?
A. > We don't have a robots.txt file.
If a spider finds a link, it will follow that link, unless the resource (page, image, etc.) that the link leads to is disallowed in robots.txt *and* the spider is a good one that obeys robots.txt.
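To show what "a good spider that obeys robots.txt" actually does before following a link, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents and the /webmail/ path are assumptions for illustration; a bad spider simply skips this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallows every robot from an assumed /webmail/ path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /webmail/
"""

def allowed(user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-obeying spider may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed("Googlebot", "http://example.com/webmail/login"))  # False
print(allowed("Googlebot", "http://example.com/index.html"))     # True
```

Note that robots.txt is purely advisory: it only keeps out spiders that choose to run a check like this one.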
Is there any way to tell if they can access the private information of our webmail subscribers? I'm concerned about security in this case. What sort of information is being gathered?
If one of your users has placed a link to the webmail directory somewhere on the 'net that a spider finds, it will crawl into the webmail directory. One link is all it takes.
As to why some spiders and not others, who can tell? Some spiders are more aggressive because their owners provide them with more bandwidth, processing power, and disk space so that they can dig deeper and retrieve more Web pages.
If your users' accounts are password-protected, and there are no "back-door" entries that bypass the password check, then their "personal" information should be safe from spiders.
I'd strongly suggest you put up a robots.txt that disallows robots from your 'sensitive' areas, though. Being in control of the spiders, instead of at their mercy, is a good thing. Disallow the good spiders from sensitive areas of your site by using robots.txt, and block the bad spiders that don't read or obey robots.txt using other means (e.g. ISAPI filters on MS servers, mod_rewrite on Apache). (You define 'sensitive' - it varies from site to site.)
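Blocking bad spiders by user-agent is normally done in the server itself (mod_rewrite or an ISAPI filter), but the logic is the same everywhere. As a rough sketch, here it is as Python WSGI middleware; the agent names in BLOCKED_AGENTS and the hello_app application are hypothetical placeholders, not real bots or a real site.

```python
from typing import Callable, Iterable

# Hypothetical user-agent substrings to refuse; replace with the
# bad bots identified in your own access logs.
BLOCKED_AGENTS = ("ExampleBadBot", "HypotheticalScraper")

def block_bad_bots(app: Callable) -> Callable:
    """Wrap a WSGI app and answer 403 Forbidden to blocked user-agents."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(bad.lower() in agent for bad in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else reaches the wrapped application unchanged.
        return app(environ, start_response)
    return middleware

def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Placeholder application standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

app = block_bad_bots(hello_app)

def fake_request(agent: str) -> tuple:
    """Call the wrapped app directly with a given User-Agent header."""
    statuses = []
    body = app({"HTTP_USER_AGENT": agent},
               lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(body)
```

For example, fake_request("ExampleBadBot/1.0") returns ("403 Forbidden", b"Forbidden\n"). Keep in mind that user-agent strings are trivially forged, so this only stops bots that identify themselves honestly.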
Out of your list of user-agents, the only one I'd allow without a lot more investigation would be Googlebot.