Protecting Email Addresses From Harvesters

Forum Moderators: DixonJones

shug

4:27 pm on Dec 14, 2005 (gmt 0)

Hello Guys,

Probably like a lot of people I don’t know much about robots and spiders etc. So hopefully somebody can help me a little here.
I have a website, non-commercial, and I left a blank robot file on the server as suggested on one of the threads on this site. I wanted to encourage visitors - no problem there. I am now wondering about the best way to protect email addresses that are posted on my Guest Book, as I assume that the Guest Book will be spidered with the site and those who leave their email address will get spammed. Don’t want that!
Any ideas gratefully accepted.
Regards

Hugh

IanKelley

2:54 am on Dec 25, 2005 (gmt 0)

I disagree that it's a wasted effort. For us as webmasters, the fact that spam is pervasive is not a good reason to make spammers' jobs easier.

As an example, it's futile for sites that provide online access to whois information to add turing validation to the process because the bots can get access to the same information elsewhere. Nevertheless I'm glad they do it.

g1smd

10:23 pm on Dec 27, 2005 (gmt 0)

Over the last three years I have helped my friends to almost eliminate email spam that they were receiving, using some very simple methods.

Bulk email spam comes from only a few different types of source. There are people that buy email harvesting programs that crawl the web and store anything found in the someone@somewhere format, or anything that starts with mailto:. If you remove your email address from the web in machine readable format, then you will be immediately invisible to the next crawler that visits your site.
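To make "machine readable" concrete, here is a minimal sketch of the kind of pattern match a harvester might run, which you could equally use to audit your own pages before publishing them. The regular expression and the name `find_exposed_addresses` are assumptions for illustration, not what any particular harvesting program uses.

```python
import re
from typing import List

# Deliberately simple pattern: someone@somewhere.tld, optionally preceded by "mailto:".
ADDRESS_RE = re.compile(r"(?:mailto:)?([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def find_exposed_addresses(html: str) -> List[str]:
    """Return the distinct addresses visible in raw HTML, in order of first appearance."""
    seen: List[str] = []
    for match in ADDRESS_RE.finditer(html):
        address = match.group(1).lower()
        if address not in seen:
            seen.append(address)
    return seen


# Hypothetical page fragment used only for the demonstration.
page = '<a href="mailto:jon.doe@example.com">Mail me</a> or write to Jane@Example.org.'
print(find_exposed_addresses(page))
```

If this returns an empty list for every page on your site, a crawler working at this level has nothing to collect.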

Another source of spam is from people that buy the "20 million email addresses on a CD" products that are widely advertised, and I estimate that most of those have a life of ~6 months before people upgrade to a new version of the CD.

Lastly, there are the people that ask a mail-spamming agency to do the dirty for them; often from China. Again, these agencies use multiple sources of information and largely retire old data within a year or so.

However, it is worth noting that the spam that is sent out often contains tracking codes to verify that the email was read, and once you have unwittingly confirmed that the address is live, and mail to it gets read, then it is much harder to get off all of the spammers' lists.

Many spiders and bots visit your site every day, looking for email addresses to add to their databases.

If you URL encode your email address like mailto:%40%46%53%74%47%40%63%84%82%54%47%43%74%95 then it still isn't safe from being harvested (as it is easy to decode).
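To show how little work the decoding takes, here is a short sketch using Python's standard `urllib.parse.unquote`. The address `jon.doe@example.com` and the helper `percent_encode_all` are made up for the demonstration.

```python
from urllib.parse import unquote


def percent_encode_all(text: str) -> str:
    """Percent-encode every character, as some 'email protector' tools do."""
    return "".join(f"%{byte:02X}" for byte in text.encode("ascii"))


# Hypothetical address, encoded the way it might appear in a page's href.
encoded = "mailto:" + percent_encode_all("jon.doe@example.com")
print(encoded)

# One standard-library call undoes the whole thing.
decoded = unquote(encoded)
print(decoded)
```

Any harvester that decodes links before matching them will see the plain address.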

Likewise some people add redundant words to the email address, like jon.nospam.doe@somewhere.nospam.com, and this does not help much at all either. After running the email address collecting program it is a very simple thing to remove the word "nospam" from every email address collected, and to reduce any instance of more than one adjacent dot back to a single dot.
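The two clean-up steps just described fit in a few lines. This is only a sketch of the idea; the function name `strip_padding` is an assumption, and a real harvester may well do it differently.

```python
import re


def strip_padding(address: str, padding: str = "nospam") -> str:
    """Remove a padding word, then collapse runs of dots to a single dot."""
    cleaned = re.sub(re.escape(padding), "", address, flags=re.IGNORECASE)
    cleaned = re.sub(r"\.{2,}", ".", cleaned)
    return cleaned


print(strip_padding("jon.nospam.doe@somewhere.nospam.com"))
```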

There were two main methods that I chose to use.

The first was to write the block of code that shows the email address as a clickable link not in standard HTML but instead using an external JavaScript routine. This produces in a browser a link that looks like a link, and clicks like a link, but when you parse either the HTML code or the JavaScript code there is no email address to be seen. The email link is built from code fragments that are added together by a document.write instruction. This only works if JavaScript is enabled, so I always use the second method, described next, as well.
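As a rough sketch of the fragment idea, the Python helper below generates the contents of such an external script. The function name `build_mail_script`, the variable names in the generated JavaScript and the example address are all assumptions; the script used on the sites described above may have looked quite different. The helper also assumes the domain contains at least one dot.

```python
import json


def build_mail_script(user: str, domain: str, label: str = "Email me") -> str:
    """Return JavaScript source that writes a mailto link from separate fragments."""
    name, _, tld = domain.rpartition(".")
    lines = [
        f"var u = {json.dumps(user)};",
        f"var d = {json.dumps(name)} + '.' + {json.dumps(tld)};",
        # The "@" is produced from its character code so it never appears literally.
        "var a = u + String.fromCharCode(64) + d;",
        "document.write('<a href=\"' + 'mai' + 'lto:' + a + '\">' + "
        f"{json.dumps(label)} + '</a>');",
    ]
    return "\n".join(lines)


# Hypothetical address; save the output as an external .js file.
script = build_mail_script("jon.doe", "example.com")
print(script)
```

Neither the full address, the "@" sign nor the string "mailto:" appears in the generated file, so a simple pattern match finds nothing. A harvester that actually executes JavaScript would still see the finished link.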

The other method was to employ an email form with the email address hard-coded in the server-side code, and NOT appearing anywhere within the HTML code that the browser (or bot) is presented with. This works for all visual browsers and is very safe. These two things when used together mean that every real visitor to the site has at least one way to contact you, while the bots and harvesters see nothing that they can use at all.
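A minimal sketch of the server-side half, assuming a Python handler rather than whatever language the original forms used. The addresses, the constant `RECIPIENT` and the function `build_contact_message` are hypothetical. The point is only that the recipient lives in server code and is never sent to the browser.

```python
from email.message import EmailMessage

# Lives only in server-side code; it is never written into any HTML page.
RECIPIENT = "owner@example.com"  # hypothetical address


def build_contact_message(sender_name: str, reply_to: str, body: str) -> EmailMessage:
    """Build the mail for a submitted contact form."""
    # Reject line breaks so form fields cannot be used to inject extra headers.
    for value in (sender_name, reply_to):
        if "\r" in value or "\n" in value:
            raise ValueError("line breaks are not allowed in header fields")
    message = EmailMessage()
    message["To"] = RECIPIENT
    message["From"] = "webform@example.com"  # hypothetical sending address
    message["Reply-To"] = reply_to
    message["Subject"] = f"Contact form message from {sender_name}"
    message.set_content(body)
    return message


msg = build_contact_message("Hugh", "visitor@example.net", "Hello from the guest book.")
print(msg["To"], "|", msg["Subject"])
```

The finished message could then be handed to `smtplib.SMTP(...).send_message(msg)`. Rejecting line breaks in the header fields is one small precaution against bots that try to abuse the form itself.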

Another method is to present a non-clickable image that contains the email address for the viewer to read and then type into their email program. I didn't use that, but again it is very effective.

Additionally a Google search was made to find every page on every site that already mentioned the email address. We wrote to every one again and again to get all traces of every email address removed and replaced with a simple link to the "secure" email form page.

Using these techniques the results have been good. This is what has been achieved so far, for four people that I checked back with a few months ago:

From 1000 spams per day down to under 400 per week.

From 100 spams per day down to about 20 per week.

From 50 spams per week, down to about 10 per month.

From 100 spams per day, down to about 10 per day.

For all of those people that is the raw number that arrive at their account.

Some of them use spam filtering to reduce the number seen even further.

The person who gets 400 spams per week has a filter (Freeserve) that sends more than 380 spams per week straight to the trash can. So two years ago he had over 7000 spams in the visible inbox per week, and now it is less than 20.

Job Done.

Or so I thought.

A few months ago, the amount of spam started rising again for one person, and a month later another reported the same sort of thing. A Google (and Yahoo, and MSN) search failed to find a simple reason for this. I assumed that each had opened a spam message that contained a hidden "web bug": a specially coded link within the email that calls for an image from the spammer's server, the image having a unique name like tracking.5037A34F1C76.jpg where the unique ID is known to have been used only in the email that was sent to YOUR email address. Once your email program requests the image, the spammers know that you have opened the email and hence send you more spam.

Last week another friend reported a sudden sharp increase in spam received: 80 per day for the last 6 months, then a sudden jump to over 200 per day - and I know he reads all email offline (so no way was a "web bug" involved here). A quick Google search soon found a new site that was promoting a friend's email address, the first one to have had trouble, but didn't find anything for the other two.

A few days later, the second email address was found in a Google search, and yes, it pointed to the same site as the first person's search did.

I had thought that all my friends' email addresses were gone from the web, and they were if you just considered normal websites, forums, and blogs. There is just one place that the email addresses had remained in view: Google is treating some of the pages where the email address has been removed as a "supplemental result" and still shows the page in the results when you search for the email address, or some other redundant terms. Google also shows the email address in the snippet, even though the email address isn't on the real page, nor does it show in the Google cache of the page either. This left them open to be harvested by a small number of people that might find a way of scanning Google results.

There is a Chinese spammer that wants to tell the world about some "investment secrets". She has written to many tens of thousands of people, starting in August 2005, and has published all of the email addresses of the people that she has spammed on some forum or blog. Now that the email addresses are visible back on the web, all of the other bots and harvesters have been busy collecting them from that site. I believe she originally got at least some of the email addresses that she is now spamming by searching Google using some sort of harvesting program.

After collecting these email addresses she has been very busy sending her "investment message" out. So far, there are about 120 pages of email addresses published on her forum, with about 18 000 addresses per page. Yes, that is more than 2 million addresses that she has spammed. However, more importantly, that is 2 million email addresses that are now receiving a ton of junk thanks to the thoughtless actions of one individual in publishing the email addresses back on to the web. They are now being rapidly harvested by all of the other spammers on the planet, and are being added back to all the spammers' email lists that we managed to get them removed from only last year.

Someone else has already left a message on that board asking for the email addresses to be removed from public view and was told by the site owner that the request was being ignored. I don't think they understand about bots and harvesting.

<Edit>I had to remove this section from the post - it got very specific and although the post in general is blowing my mind, I am pretty sure that we can't link to pages that supposedly identify spammers</EDIT>

Umm, we aren't asking you to stop spamming (as your measly one or two messages aren't actually making a big difference in anything at all) but we ARE asking you to remove the email addresses from your SITE so that OTHER spammers no longer have access to them... bozo.

Has anyone got any other suggestions?

Any Chinese speakers wanna post on their forum and ed-u-cate them?

[edited by: Receptional at 2:14 pm (utc) on Jan. 16, 2006]
[edit reason] Specifics removed. Sorry if this makes reading difficult [/edit]

kapow

4:17 pm on Dec 29, 2005 (gmt 0)

Re. Using a form instead of an email address. In the last 5 months we have seen a new type of spambot that seems to be using forms on the sites we manage!

We manage over 100 busy websites for different businesses and have never seen spam to addresses encrypted with the hiveware encoder and similar tools.

Robert Charlton

7:56 pm on Jan 16, 2006 (gmt 0)

I thought I'd bump this thread up since it was put on hold for a few weeks. Great info. Thanks.

tombola

9:30 am on Jan 17, 2006 (gmt 0)

In the last 5 months we have seen a new type of spambot that seems to be using forms on the sites we manage!

To prevent this, you can add a captcha to the form.

Robert Charlton

7:04 am on Jan 18, 2006 (gmt 0)

To prevent this, you can add a captcha to the form.

What's a "captcha?"

tombola

10:49 am on Jan 18, 2006 (gmt 0)

"CAPTCHA" stands for "Completely Automated Public Turing Test to Tell Computers and Humans Apart".

It's a distorted picture of randomly generated text that is meant to be readable by humans but not by programs. To get access to our members area, users must not only type their username/password, but must also retype the displayed captcha.
Using a captcha is a good way to protect your server against automated attacks.
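Drawing the distorted image is the larger job, but the checking side is small. Here is a minimal sketch of it, assuming the server keeps a secret key and puts a signed token in a hidden form field. The names `SECRET_KEY`, `new_challenge` and `check_answer` are made up for the example, and rendering the text as an image is left out.

```python
import hashlib
import hmac
import secrets
import string

SECRET_KEY = secrets.token_bytes(32)  # per-server secret, never sent to the browser
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length: int = 6):
    """Return (text to draw in the image, token to place in a hidden form field)."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = hmac.new(SECRET_KEY, text.encode(), hashlib.sha256).hexdigest()
    return text, token


def check_answer(answer: str, token: str) -> bool:
    """True if the typed answer matches the challenge that produced this token."""
    normalized = answer.strip().upper().encode()
    expected = hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)


text, token = new_challenge()
print(check_answer(text, token))
```

As written, a solved challenge could be replayed with the same token, so a real form would also need to expire or remember used tokens.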

You'll find a good example of a captcha on the add url page of Google:
[google.com...]

More information: [captcha.net...]

tomda

12:23 pm on Jan 18, 2006 (gmt 0)

Because it is on-topic:

Has anyone flagged a topic on WW with a PHP script that changes an email address to an image (using GD) on the fly?

I have tried to write my own but failed because header() must be sent before session_start()...

Thanks

Barb

3:36 am on Jan 25, 2006 (gmt 0)

This thread has given me some ideas with my email spam problems.

I have two personal websites - both on the same topic. The second one was live on the web for just one day and got LOADS of spam emails, mostly Nigerian type letters. Now I get "You have won the UK Lottery" and pharmaceutical ones.

Since my sites have loads of pictures, a good amount are banners (exchanges and such) - so it would be a site that visitors would want to see pictures of various kinds and would have their viewers enabled for such. So, making my email address into images is a do-able option that may help alleviate my spam situation.

Is there any non-script, simple solution for non-image sites (or limited-image sites) for novices like myself (something along the lines of: joe.shmoe(at)email.org)? I am trying to keep my sites relatively simple script-wise. My sites are not flashy, as I want the content to speak for itself.
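For what it's worth, here is a small sketch of the (at) substitution, written as a helper you could run on your own machine before pasting the result into a page. The function names are made up. The second function shows that the substitution is as easy to undo as the "nospam" padding discussed earlier in the thread, so it probably only stops the simplest harvesters.

```python
def to_readable_form(address: str) -> str:
    """Rewrite user@domain as user(at)domain for display on a page."""
    user, _, domain = address.partition("@")
    return f"{user}(at){domain}"


def from_readable_form(text: str) -> str:
    """What a harvester would need to do to recover the address."""
    return text.replace("(at)", "@")


shown = to_readable_form("joe.shmoe@email.org")
print(shown)
print(from_readable_form(shown))
```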

Thanks for everyone contributing to the solution to this epidemic!
