Spammers somehow found my client's customer logfile

I put a contact form on a client's website, and had it also write senders' email addresses to a logfile. (The client hosts their own email, which is unreliable, and they wanted to be able to see if there were entries in the logfile for addresses from which they never got a message.)

I didn't think the logfile would be visible to the outside world, because it was inside a directory that already had an index.html file. In other words, the directory contents were:

/directory/index.html
/directory/log.txt
/directory/script.cgi

My understanding is that if a directory has an index.html file, then any request for the directory itself makes the server return index.html, so the requester can't see the contents of the directory.

That's nice in theory, but somehow spammers got to the logfile. I first noticed this when the special address I used for the client, which I also used to test the contact form, started getting spam. I did a Google search for that address and nothing showed up, so I assumed that the client's computer was infected with spyware that was stealing addresses from their address book and sending them to the spammer.

But some months later, one of the client's customers also started getting spam, and the customer Googled that address, and the logfile came up in Google! It also comes up in Yahoo. I think spammers found the logfile before the search engines (since I started getting spam before I could find the logfile in Google), but I can't be sure.

I can't explain how this happened. There are only four ways I know of that a bot (friendly or not) can find a file:

(1) The file is linked to from a page. But I certainly didn't link to the logfile from anywhere, and I don't think the client did, either. I searched for backlinks to the logfile in the search engines and found nothing.

(2) The file is in a directory without a default file like index.html or index.htm. But there is indeed an index.html file, and I don't think it was ever deleted.

(3) The bot guesses at the filename. I think it's a stretch that spambots are going to query every directory they come across for "log.txt", but even if they did, how does that explain how *Google* and *Yahoo* found the logfile? Certainly Google and Yahoo aren't playing guessing games.

(4) Someone submits the URL directly to a search engine. Obviously this didn't happen.

Lessons learned:

1. Unless there is a compelling reason to store email addresses in the webspace, don't. I could have easily written the logfile above the webspace (e.g., to /home/log.txt, instead of to /home/domain.com/directory/log.txt), and in hindsight, I should have.

2. If it's really necessary to store email addresses within the webspace (e.g., so that a client app can access the data via the web), put them in a secure directory. The directory where the data is stored should always prompt visitors for a username/password in order to see the contents.

3. For added safety, it wouldn't hurt to obfuscate the addresses when writing them. For example, instead of "user@domain.com", write "user - domain.com".

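4. Don't assume that a file is unviewable just because it's in a directory with an index.html file. As my experience showed, that doesn't always work. The index file only stops the server from listing the directory's contents; anyone (or any bot) that requests the file's URL directly will still get the file. I don't know how the URL was discovered in my case, but that's beside the point.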
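
To make lessons 1 and 3 concrete, here is a minimal sketch in Python. It is not the original script.cgi, and the function names (obfuscate_address, is_inside, log_sender) are made up for illustration. It assumes the webspace is /home/domain.com, as in my example above, and it refuses to write the log anywhere under that directory. Keep in mind that the "user - domain.com" obfuscation is only a light deterrent, not real protection.

```python
from pathlib import Path


def obfuscate_address(address: str) -> str:
    """Turn 'user@domain.com' into 'user - domain.com' (lesson 3)."""
    return address.strip().replace("@", " - ")


def is_inside(path: Path, directory: Path) -> bool:
    """Return True if path resolves to a location under directory."""
    try:
        path.resolve().relative_to(directory.resolve())
        return True
    except ValueError:
        # relative_to() raises ValueError when path is not under directory.
        return False


def log_sender(address: str, log_path: Path, web_root: Path) -> None:
    """Append an obfuscated address to log_path (lesson 1).

    Raises ValueError instead of writing if log_path is inside web_root,
    i.e. somewhere the web server could hand it out.
    """
    if is_inside(log_path, web_root):
        raise ValueError(f"{log_path} is inside the web root {web_root}")
    with log_path.open("a", encoding="utf-8") as log:
        log.write(obfuscate_address(address) + "\n")
```

With the paths from lesson 1, the contact-form script would call something like log_sender(sender, Path("/home/log.txt"), Path("/home/domain.com")). The same call with /home/domain.com/directory/log.txt as the log path raises an error instead of quietly writing the file where a bot can fetch it.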

How I handled the problem

1. Apologized to the client profusely.

2. Emptied out the contents of the logfile in its old location.

3. Changed the script to start storing the logfile *above* the webspace.

4. Suggested to the client that they notify all users whose email addresses were compromised, explaining the problem, and laying the blame squarely on their web services provider (me).

5. Offered a $1000 guarantee that the new logfile will absolutely not show up in any search engine. Suggested that the client point this out to their customers so they can have some confidence that this was really a one-time screwup and that the client is confident it won't happen again.

6. Canceled the client's most recent invoice.

7. Wrote this post to share my experience, to prevent it from happening to others.

This is especially disheartening because I've spent countless hours fighting and preventing spam for clients and making sure the server and scripts don't get compromised. I haven't used mailto: links in HTML for nearly a decade, always trying new ways to keep addresses out of spambots' hands while making the mailing experience easy for the website visitor -- I even wrote a fairly detailed article years ago on How to Keep Spambots from Stealing Addresses. And now, thanks to my not being careful enough, 357 people are getting more spam. Ugh.

Well, at least I learned my lesson, and I'm writing this in hopes that no one else makes the same mistake.

A cautionary tale -- don't let this happen to you

MichaelBluejay
