What can they get from these scans? m.html is the Bot Trap URL.
2002-04-18 09:15:30 <snip> Server <snip> Get /html/membership.html - 200 9684 HTTP/1.0 Mozilla/4.0+(compatible,+MSIE+5.5,+Windows+NT+5.0)+Fetch+API+Request - -
2002-04-18 09:16:02 <snip> Server <snip> Get /html/m.html - 200 9684 HTTP/1.0 Mozilla/4.0+(compatible,+MSIE+5.5,+Windows+NT+5.0)+Fetch+API+Request - -
and
2002-04-19 02:20:01 <snip> - SERVER <snip> GET /html/units/m.html - 200 10916 HTTP/1.0 Mozilla/4.0+(compatible,+MSIE=4.01,Windows+NT,+MS+Search+4.0+Robot)+Microsoft Webtrends XXXX
Some of the requests vary with:
+Microsoft+Scheduled+Cache+Content+Download+Service
(edited by: Brett_Tabke)
or
if you have few visitors from the rest of Asia (which, BTW, includes Australia and NZ)
deny from 210.
deny from 211.
You might also add these ranges:
61.248.0.0 - 61.255.255.255
61.96.0.0 - 61.111.255.255
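Since those two ranges don't fall on a whole-octet boundary like 210. and 211. do, they need CIDR notation, which Apache's deny from also accepts. Here is a rough Python sketch that does the conversion with the standard ipaddress module. The helper name range_to_cidrs is my own, not something from the posts above.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the smallest list of CIDR blocks covering first..last inclusive."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

# The two ranges quoted in the post above.
ranges = [
    ("61.248.0.0", "61.255.255.255"),
    ("61.96.0.0", "61.111.255.255"),
]

for first, last in ranges:
    for cidr in range_to_cidrs(first, last):
        print(f"deny from {cidr}")
# prints:
# deny from 61.248.0.0/13
# deny from 61.96.0.0/12
```

Check the output against your own traffic before pasting it in; a /12 or /13 blocks a lot of addresses.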
hbarker: I don't think you have anything to worry about. I examined my logs. My website has information that may be of interest to Pentagon types on occasion; I had over 3,200 GETs in the last 20 days from *.mil domains. Of these, I had about 20 requests for my robots.txt. But looking at each of these, they are not spider-related at all. I can tell from the requests for GIFs, Java applets, and CGI searches whether it's a spider. These weren't spiders.
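If you want to run the same kind of check on your own logs, something along these lines might do as a starting point. It is only a sketch of the heuristic described above: it assumes you have already pulled (client, path) pairs out of your log, and the list of "browser-only" file types and the client names in the sample are my guesses, not anything from the actual logs.

```python
from collections import defaultdict

# Assumption: things a person's browser fetches but a typical spider skips.
BROWSER_SUFFIXES = (".gif", ".jpg", ".class")

def looks_like_browser_fetch(path):
    """True for images, applet classes, or anything under /cgi-bin/."""
    bare = path.lower().split("?", 1)[0]  # ignore any query string
    return bare.endswith(BROWSER_SUFFIXES) or "/cgi-bin/" in bare

def classify_robots_fetchers(requests):
    """requests: iterable of (client, path) pairs.
    Returns a verdict for every client that asked for /robots.txt."""
    paths_by_client = defaultdict(set)
    for client, path in requests:
        paths_by_client[client].add(path)

    verdicts = {}
    for client, paths in paths_by_client.items():
        if "/robots.txt" not in paths:
            continue
        human = any(looks_like_browser_fetch(p) for p in paths)
        verdicts[client] = "browser-like" if human else "spider-like"
    return verdicts

# Hypothetical sample data, just to show the shape of the input.
sample = [
    ("client-a", "/robots.txt"),
    ("client-a", "/images/logo.gif"),
    ("client-a", "/cgi-bin/search?q=test"),
    ("client-b", "/robots.txt"),
    ("client-b", "/index.html"),
    ("client-c", "/index.html"),
]
print(classify_robots_fetchers(sample))
# prints: {'client-a': 'browser-like', 'client-b': 'spider-like'}
```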
My best guess is that some bored Pentagon types are surfing, and a few of them took a class in information warfare where the teacher mentioned that robots.txt shows which directories are forbidden to spiders. That information is useful because it reveals the layout of the site to some extent and clues you in to which directories may be especially interesting (to info-warfare types).
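To see how little effort that takes, here is a small sketch that lists the Disallow entries from a robots.txt file. The sample file contents are made up for illustration, and the function name is mine.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path found in the robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

# Hypothetical robots.txt, not taken from any real site.
sample_robots = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/   # hypothetical directory
Disallow:
"""

print(disallowed_paths(sample_robots))
# prints: ['/cgi-bin/', '/private/']
```

Anything you list there is advertised to anyone who asks, so robots.txt is no substitute for real access control.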
Harmless fun, I suspect.