Been there, done that, had the panic, too... but then again, robots.txt is for everyone (or should be) and that's the way I roll on it (robots.txt). Get ugly with anything else on my site... well that's a different story! :)
Another benign explanation: It's someone in a programming class, assigned to make a crawler. They're not interested in your content, so they just pick up robots.txt because the crawler has to bring back something to prove it's been there. For appearances' sake they might also pick up the front page. And then you never see them again. Over on an unrelated forum, a couple of people I know were talking about an online programming class that one of them was taking and the other was thinking about taking. The one who was thinking about it said, quote, "I have no interest in making a web-crawler; I want to handle files and data."
That confirms a long-held suspicion that that's where some of them are coming from.
Incidentally, while looking up this post I found a discussion about a legitimate mail server that unfortunately lives right in the middle of a blocked range. The site administrator had to, in his words, "poke a hole" in the middle of the block so people could get their e-mail. I know the feeling.
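For anyone who wants to see what "poking a hole" amounts to in practice, here is a minimal sketch using Python's standard ipaddress module. The function name poke_hole and the addresses (taken from the documentation-only 203.0.113.0/24 range) are made up for illustration; this is not how the administrator in that discussion did it, just one way to work out which sub-ranges stay blocked once a single host is let through.

```python
import ipaddress

def poke_hole(blocked, allowed):
    """Return the networks that remain blocked after carving `allowed` out of `blocked`.

    Both arguments are CIDR strings. The name and signature are hypothetical.
    """
    blocked_net = ipaddress.ip_network(blocked)
    allowed_net = ipaddress.ip_network(allowed)
    # If the allowed range is not inside the block, there is nothing to carve out.
    if not allowed_net.subnet_of(blocked_net):
        return [blocked_net]
    # address_exclude yields the smallest set of networks covering
    # everything in blocked_net except allowed_net.
    return sorted(blocked_net.address_exclude(allowed_net))

if __name__ == "__main__":
    # Hypothetical mail server at 203.0.113.25 inside a blocked /24.
    for net in poke_hole("203.0.113.0/24", "203.0.113.25/32"):
        print(net)
```

The output is a list of smaller ranges that can go back into the block list in place of the original one, so the mail server is the only address left out.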
Depends on why blocked and how legit. I sometimes get clients saying "You've blocked email from X" and it turns out the idiots are running a mail server on a broadband line - often static but it's still a stupid idea when the whole range is in at least one good RBL. I'm the one who then has to "poke a hole" in the banned block.
I'm the one who then has to "poke a hole" in the banned block.
Although this may seem like a real PITA, I do it frequently to allow visitors from Euro ranges who are widget folks referred to me. The only problem with making these accommodations for RIPE and APNIC ranges is that dynamic IPs in those countries change drastically from day to day (at least in most instances).
I have some sites that have a block on all Comcast traffic. If Comcast were South America, CN, KR or UA/RU they would be completely blocked. Terrible source of rubbish. Unfortunately one of my customers gets legit traffic from some of their users. :(
Lots of compromised machines about. This week's mail spam has doubled and there are a lot more bad site hits.
I've blocked 1060 Comcast IPs over the past couple of years, ranging from single hits to a few dozen per IP. My server is a small-time operation so this is very big. If I get time perhaps I'll see if they easily group into IP ranges - and then block those ranges.
On the other hand, BT here in the UK has been blocked on over 1160 IPs - but UK is my normal market. Maybe I should address them as well.
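On seeing whether individually blocked IPs "easily group into IP ranges": a rough sketch of one way to check, again with Python's standard ipaddress module. The function name candidate_ranges, the /24 bucket size and the min_hits threshold are all assumptions picked for illustration, and the sample addresses come from documentation-only ranges. It assumes IPv4 addresses only.

```python
import ipaddress
from collections import Counter

def candidate_ranges(ips, prefix=24, min_hits=3):
    """Bucket distinct blocked IPs by enclosing network and return the busy ones.

    `prefix` and `min_hits` are illustrative defaults, not recommendations.
    """
    # Map each distinct IP to the /prefix network that contains it and count.
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in set(ips)
    )
    busy = [net for net, hits in counts.items() if hits >= min_hits]
    # collapse_addresses merges adjacent or overlapping networks into larger ones.
    return list(ipaddress.collapse_addresses(busy))

if __name__ == "__main__":
    blocked = [
        "198.51.100.1", "198.51.100.2", "198.51.100.3",
        "198.51.101.5", "198.51.101.6", "198.51.101.7",
        "192.0.2.9",  # a lone offender, stays below the threshold
    ]
    for net in candidate_ranges(blocked):
        print(net)
```

Anything it returns is only a candidate. Whether a whole range deserves blocking still depends on how much legit traffic comes from it, which matters most when the range belongs to an ISP in your own market.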