I have been trying to stop e-mail harvesters by using a robots.txt file on my clients' sites, as we have been spammed pretty heavily recently. I was sent two virus mails; luckily I caught them, but one client was not so lucky and it took down his e-mail program. So I search the log files to find harvesters to exclude and add them to the list. For two weeks now I have been seeing one that is identified as just - mail. Weird?
The raw log files show this: 18.104.22.168 - - [06/Mar/2001:12:39:28 -0800] "GET HTTP/1.0" 200 1438 "http://us.f100.mail.yahoo.com/ym/ShowLetter?YY=87438&order=down&sort=date" "Mozilla/4.73 [en] (Win98; U)
A robots.txt is not going to fend off an email harvesting spider. An email harvesting spider will simply not look at it, and will proceed to take your email addresses anyway. Many different ways of making it difficult for email spiders to grab addresses have been written up here: [webmasterworld.com...]
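As an illustration of the kind of thing people do to make addresses harder to grab, here is a minimal Python sketch of one commonly used trick: writing the address into the page as HTML character entities, so the plain "name@domain" text never appears in the source. This is not necessarily one of the methods from the thread linked above; the function names and the example address are made up for illustration, and a harvester that decodes entities will still get through.

```python
def obfuscate_email(text):
    """Encode every character as a decimal HTML entity, e.g. 'a' -> '&#97;'."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def mailto_link(address, label="Email us"):
    """Build a mailto link whose href contains no literal address or 'mailto:'."""
    # Hypothetical helper: the scheme is encoded too, so a naive text scan
    # for "mailto:" or "@" finds nothing in the page source.
    scheme = obfuscate_email("mailto:")
    encoded = obfuscate_email(address)
    return f'<a href="{scheme}{encoded}">{label}</a>'
```

Browsers decode the entities, so the link still works for a visitor: mailto_link("webmaster@example.com") gives an anchor tag you can paste into a page, and the address only shows up once the HTML is rendered.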
This is NOT an email harvester... it's someone with an @yahoo.com email address.
Person-1 sent an email with a link to your site to Person-2, who has an @yahoo.com address... Person-2 clicked on the link from within Yahoo mail, and left us.f100.mail.yahoo.com as the referrer in your logs.
If I were to visit your site from a link within MY Yahoo mailbox, I'd leave a referrer of us.f10.mail.yahoo.com. Yahoo mail accounts are handled by different numbered mail.yahoo.com machines. I used to be on us.f8.mail.yahoo.com... a few months ago, my mail changed to us.f10.mail.yahoo.com... etc.
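If you want to separate these webmail click-throughs from real spiders when going through the logs, something like the following Python sketch could help. It assumes the logs are in the combined log format shown in the raw line above; the host pattern is only a guess based on the us.f8 / us.f10 / us.f100 names mentioned in this thread, and the function names are made up for illustration.

```python
import re
from urllib.parse import urlparse

# Combined log format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
# The closing quote of the user-agent is optional here because the line
# quoted in this thread is missing it.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"?$'
)

# Assumed naming scheme for Yahoo mail machines, e.g. us.f100.mail.yahoo.com
WEBMAIL_HOST = re.compile(r'^[a-z]+\.f\d+\.mail\.yahoo\.com$')


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not parse."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def is_webmail_referrer(referrer):
    """True when the referrer URL points at a numbered Yahoo mail machine."""
    host = urlparse(referrer).hostname or ""
    return bool(WEBMAIL_HOST.match(host))
```

Run parse_line() over each line of the log and pass the "referrer" field to is_webmail_referrer(). A hit with a normal browser user agent and a webmail referrer is almost certainly a person clicking a link in an email, not a harvester.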