Hiding email addresses from harvesters

Forum Moderators: phranque

Does using a form to hide email address actually work?

HelenDev

11:14 am on Jul 3, 2007 (gmt 0)

I have seen the technique of using a form to hide an email link. The human user can then click the button to get the email address and mailto link.

I was just wondering how effective this is. I guess it relies on the idea that email-harvesting spiders can follow links but can't press form buttons. Is this true?

BeeDeeDubbleU

11:43 am on Jul 3, 2007 (gmt 0)

I have never heard of this. Do you mean that the email address is mailed to the user?

Habtom

11:51 am on Jul 3, 2007 (gmt 0)

The human user can then click the button to get the email address and mailto link.

Yes, you mean using JS. If the email-harvesting spiders haven't outsmarted the idea yet, it works.

Hab

HelenDev

11:56 am on Jul 3, 2007 (gmt 0)

The one I was looking at actually utilised php rather than js. Basically if the button is clicked, the form posts to the same page, which then reloads showing the email address instead of the button.

Perhaps this is a good solution. I just thought it was unlikely that spiders would be stopped in their tracks by a simple button.

dragsterboy

12:00 pm on Jul 3, 2007 (gmt 0)

right, generally speaking, spiders can not press buttons :)
however they are capable of submitting forms... and I don't know whether your way of stopping spiders would be effective

In order to protect your e-mail address from spider bots - use an image - and you could have your e-mail written on this image... but in this case you would not be able to give a mailto link kind of thing.. or you could use an image + mailto link, which will be generated with the help of javascript ....

piatkow

12:16 pm on Jul 3, 2007 (gmt 0)

Monitoring my inboxes, the address which gets the most spam has never been published on the net but is used for mail outs. Yes, the harvesters will grab your address, but you will get far more grief by emailing people with infected PCs.

HarryM

12:22 pm on Jul 3, 2007 (gmt 0)

With PHP you can create a page which when requested loads as a contact form (without any email address). When the form is submitted it posts to the same page. This sends the email, but this time when it loads in the browser it is configured as a thankyou message. The only way a spammer can get the address is if they can read the entire PHP source direct from the server.
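A rough sketch of that flow, written in JavaScript rather than PHP to match the other snippets in this thread. Everything named here (CONTACT_ADDRESS, handleContactPage, the sendMail callback, the field names) is made up for illustration, and real code would also need input validation. The point is only that the address stays in server-side code and never appears in either version of the page.

```javascript
// Hypothetical sketch: the address exists only in server-side code.
const CONTACT_ADDRESS = "owner@example.com"; // placeholder, never written into the HTML

// method:   "GET" or "POST"
// fields:   submitted form fields (empty on the first request)
// sendMail: caller-supplied function (an assumption) that delivers the message
function handleContactPage(method, fields, sendMail) {
  if (method === "POST" && fields && fields.message) {
    // Second load of the same page: send the mail, show the thank-you text.
    sendMail(CONTACT_ADDRESS, fields.replyTo || "", fields.message);
    return "<p>Thank you, your message has been sent.</p>";
  }
  // First load: a plain contact form with no address anywhere in it.
  return [
    '<form method="post" action="">',
    '  <input type="text" name="replyTo">',
    '  <textarea name="message"></textarea>',
    '  <input type="submit" value="Send">',
    "</form>",
  ].join("\n");
}
```

Whatever a spider fetches, it only ever sees the form or the thank-you message.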

Marshall

12:24 pm on Jul 3, 2007 (gmt 0)

I read about a trick using CSS, but I cannot say how effective it is. Basically, you create an image showing your email address, or one that says "Email Us".* Then, when it is clicked, the mailto link opens. Here's the CSS:


<style type="text/css"> 
A.email:link { 
} 
A.email:active { 
BACKGROUND : url('mailto:info@yourdomain.com?subject=whatever'); 
} 
</style>

I tried it and it does work. I guess if you use robots.txt to mark your CSS as off limits, you're safe.

*Probably best to use your email address in case it does not work. At least the person can read it.

Marshall

HarryM

12:30 pm on Jul 3, 2007 (gmt 0)

Marshall,

I doubt that harvesters take any notice of robots.txt.

Marshall

12:39 pm on Jul 3, 2007 (gmt 0)

True, but would they "think" to look there?

Marshall

You could put your CSS in the _private folder, as an option.

[edited by: Marshall at 12:40 pm (utc) on July 3, 2007]

thecoalman

12:41 pm on Jul 3, 2007 (gmt 0)

The human user can then click the button to get the email address and mailto link.

Actually this isn't such a bad idea, with a minor adjustment, if you wanted to provide the real address. Add a captcha to it, such as a simple question.

HarryM

8:17 pm on Jul 3, 2007 (gmt 0)

Human users are part of the problem. It's not just harvesters who are out to get hold of valid email addresses for dubious purposes. The best technique that I know is to hide the email address completely - see my post about a PHP method above. That - and similar server-side coding - is widely used throughout the industry.

If you have PHP on your server you can use a standard script - no experience required.

pixelpusher256

10:22 pm on Jul 3, 2007 (gmt 0)

There is also another trick.

Encode the email address in unicode.
The harvesters can't read uni-code.
But the mail-to link will still work.

Ex:
joe@widgets.com
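The example above shows up in plain form. One common reading of "encode in unicode" (assumed here) is to write every character as an HTML numeric character reference. A small sketch of that encoding, where toNumericEntities is a made-up helper name:

```javascript
// Turn each character into an HTML numeric character reference,
// e.g. "@" (code point 64) becomes "&#64;".
function toNumericEntities(text) {
  return Array.from(text)
    .map((ch) => "&#" + ch.codePointAt(0) + ";")
    .join("");
}

// Usage: put the encoded string into the page source in place of the plain address.
const encodedAddress = toNumericEntities("joe@widgets.com");
const encodedLink = '<a href="mailto:' + encodedAddress + '">' + encodedAddress + "</a>";
```

A browser decodes the references before it displays the text or follows the link, so visitors see the normal address. Anything that parses the HTML properly sees it too.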

Also, if you have a separate contact page, use a robots.txt file to ask spiders not to crawl it.

Ex:
User-agent: *
Disallow: /contact.html

pixelpusher256

[edited by: tedster at 3:56 am (utc) on July 4, 2007]
[edit reason] line breaks added to prevent side-scroll [/edit]

bill

2:57 am on Jul 4, 2007 (gmt 0)

The harvesters can't read uni-code.

Unfortunately that trick hasn't worked for years. Today's harvesters can read Unicode. I ran tests years ago and the Unicode-encoded addresses were scraped and spammed within hours of being published.

You're safest with a JavaScript obfuscation these days if you have to put your address out there.

If you can get your customers to use contact forms that's safest of all.

kaled

9:51 am on Jul 4, 2007 (gmt 0)

Use images to display the email address and use obfuscated javascript to activate them as links. If you link the images with simply href="mailto:" that is easier for users without javascript, but it does identify the images as email links, so it is conceivable (but unlikely) that the email address will be acquired using OCR methods.

Kaled.

jomaxx

6:15 am on Jul 5, 2007 (gmt 0)

Probably the best bet is to use Javascript with a minor tweak (e.g. concatenating two or three variables) -- not a cut-and-pasted snippet but one that you have customized for your own site to some degree. Spiders ARE submitting forms (to some degree, at least), but I'm not aware of any that actually execute Javascript.
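Something along these lines is one way to read that, as a sketch only. The pieces ("sales", "example", "com"), the function names and the "contact-link" id are all placeholders, and the whole point of the advice is that you vary the details for your own site.

```javascript
// Build the link target from separate pieces so that neither the scheme
// nor a complete address appears as one literal string in the source.
function buildMailto(user, host, tld) {
  const at = String.fromCharCode(64); // "@" is produced at run time
  return "mail" + "to:" + user + at + host + "." + tld;
}

// Browser wiring: point an existing link (hypothetical id "contact-link")
// at the assembled address once the page has loaded.
function activateContactLink(doc) {
  const link = doc.getElementById("contact-link");
  if (link) {
    link.href = buildMailto("sales", "example", "com");
  }
  return link;
}
```

In a page you would call activateContactLink(document) from an onload handler. A spider that doesn't run scripts only ever sees the placeholder href.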

DanA

6:34 am on Jul 5, 2007 (gmt 0)

Look for an article entitled "Effective methods to protect email addresses from spammers" (updated in May 2007), which compares the different techniques and shows that the only technique that was still effective in May was splitting an address over two lines...
There may be interesting ideas there.

incrediBILL

4:39 pm on Jul 6, 2007 (gmt 0)

The CSS trick above is cute but only gives you security by obscurity, because most spam harvesters don't read all of the files included with the page. However, if many of you do this, and now it's obviously out in public, they may start reading those CSS files if they think there are email addresses in them.

The only way to keep from getting an email address spammed is to not post your email address in the first place and use a contact form instead. Unfortunately, there are automated scripts out there hunting down those contact form pages and spamming them so don't forget to include a captcha on the contact form page to block those spammers as well.

Spam spam spam spam...

jtara

6:05 pm on Jul 6, 2007 (gmt 0)

The most effective means would be not to hide the email address at all - but to generate a one-time address, or at least an address tied to the specific user or cookie.

This enables you to shut-off addresses that receive spam.
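As a very rough sketch of the bookkeeping involved: the class name, the "contact-" prefix and the visitor ids below are all invented for illustration. Math.random() is not a secure token source, and a real setup would need the mail server to consult this registry.

```javascript
// Hands out one address per visitor and lets you shut addresses off later.
class DisposableAddresses {
  constructor(domain) {
    this.domain = domain;
    this.byVisitor = new Map(); // visitor id -> address currently issued
    this.revoked = new Set();   // addresses that have been shut off
    this.counter = 0;           // keeps tokens distinct within this registry
  }

  // Return the address tied to this visitor, issuing a fresh one if the
  // visitor has none yet or the previous one was shut off.
  addressFor(visitorId) {
    const current = this.byVisitor.get(visitorId);
    if (current && !this.revoked.has(current)) return current;
    this.counter += 1;
    const token = this.counter.toString(36) + Math.random().toString(36).slice(2, 8);
    const address = "contact-" + token + "@" + this.domain;
    this.byVisitor.set(visitorId, address);
    return address;
  }

  // Shut off an address that has started receiving spam.
  revoke(address) {
    this.revoked.add(address);
  }

  // Would mail to this address still be accepted?
  accepts(address) {
    if (this.revoked.has(address)) return false;
    for (const issued of this.byVisitor.values()) {
      if (issued === address) return true;
    }
    return false;
  }
}
```

Because each address maps back to one visitor or cookie, a spammed address also tells you which request it was harvested from.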

There are commercial services that do this, which has the advantage of directing the email away from your domain. When you "shut off" an address, it's THEIR server that gets hit with the overhead of rejecting the mail.

I don't know how practical the commercial services would be for this, as they have various service levels depending on the number of disposable addresses you need. It may not be practical for many thousands of disposable addresses.

ebound

6:11 pm on Jul 6, 2007 (gmt 0)

I recently went strictly contact forms. Instead of captchas I started using a method I found on the web that I had not seen before.

I place a text field on the form that is actually hidden from real users. On submission of the form if anything is entered into the textbox then I know it's spam.

This method has worked great so far.
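A minimal sketch of the server-side check, assuming the submitted fields arrive as a plain object. The decoy field name "website" is made up (the idea being that a bot might find the name tempting), and the field would be hidden from real users with CSS.

```javascript
// Name of the decoy text field that real visitors never see (hypothetical).
const HONEYPOT_FIELD = "website";

// Returns true when the hidden field has been filled in, which a human
// using the visible form should never do.
function looksLikeBot(fields) {
  const value = fields[HONEYPOT_FIELD];
  return typeof value === "string" && value.trim() !== "";
}
```

A submission where looksLikeBot(fields) is true can simply be dropped, with no error shown to the sender.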

pageoneresults

6:19 pm on Jul 6, 2007 (gmt 0)

If I'm forced to use an email address, there has usually been a concession at which point I put email addresses into an unlinked image format.

My normal procedure is to utilize a form for all communications, works like a charm. And, now that I'm working with ASP.NET forms a little more, my programmers tell me that there are built in measures to prevent the bots from spamming the forms. I'm learning more about them now.

Marshall

6:36 pm on Jul 6, 2007 (gmt 0)

I place a text field on the form that is actually hidden from real users. On submission of the form if anything is entered into the textbox then I know it's spam.

Using this idea, could you not combine it with a script so that if anything is in the field, the form won't submit?

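<script type="text/javascript">
// Clear the hidden "validate" field.
function reset() {
document.emailform.validate.value = "";
}

// Field validation - checks that the hidden "validate" field is blank.
// If it has been filled in, block the submission.
function checkFields() {
if (document.emailform.validate.value != "") {
alert("Sorry, form does not validate.");
return false;
}
return true;
}
</script>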

Just an idea.

Marshall

WiseWebDude

6:42 pm on Jul 6, 2007 (gmt 0)

I've seen a lot of webmasters now using:

me [at] example.com

which should stop the spambots AND force those who are lazy and want to waste your time NOT to contact you, either because they aren't smart enough to figure that out, or they are too lazy to copy/paste and fix it right in their e-mail client.

g1smd

6:45 pm on Jul 6, 2007 (gmt 0)

Once there are millions of addresses "hidden" using one method, spammers adjust their scripts to also pick those up. So the [at] trick may not work for very long.
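To show how little adjustment that takes, here is a sketch of the kind of pattern a script could use to undo the bracket trick. The function name and the exact set of spellings handled are assumptions, not taken from any real harvester.

```javascript
// Rewrite "[at]" / "(at)" / "{at}" and the matching "dot" forms back into
// "@" and ".", swallowing the spaces around them.
function undoAtDotTricks(text) {
  return text
    .replace(/\s*[\[\(\{]\s*at\s*[\]\)\}]\s*/gi, "@")
    .replace(/\s*[\[\(\{]\s*dot\s*[\]\)\}]\s*/gi, ".");
}
```

Two regular expressions cover the common variants, which is why a widely copied text substitution offers little lasting protection.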

kapow

7:09 pm on Jul 6, 2007 (gmt 0)

We build and manage business websites. Most of our clients require an email address (so the form-only method doesn't work for us). For about 4 years we have used the encrypted javascript method - and it has never been spammed yet :) There are many different ways to encrypt the address with javascript; one of my favourites is this one: [hivelogic.com...]

For forms we have setup a 3-way robot detector to reject form-injection spam. Basically it means we don't need to use irritating Captchas:
1.) Use the css hidden text field (mentioned above): If it is completed reject the form.
2.) If any of the following appear in inappropriate fields, reject the form: 'http', 'www', '[', '@' (e.g. '@' is allowed in the email field but no other).
3.) A hidden field with a simple code (e.g. xyz). If the field does not contain this value: reject the form.

So far 100% effective :)
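The three checks could be sketched roughly like this. The field names ("nickname", "formcode", "email") are invented for the example, "xyz" is the sample code from the list above, and the submitted form is assumed to arrive as a plain object of strings.

```javascript
const DECOY_FIELD = "nickname";   // hypothetical name of the CSS-hidden field
const TOKEN_FIELD = "formcode";   // hypothetical name of the hidden code field
const EXPECTED_TOKEN = "xyz";     // the example value from the list above
const EMAIL_FIELD = "email";      // the only field where "@" is allowed
const BANNED_MARKERS = ["http", "www", "[", "@"];

// Returns a short reason when the submission should be rejected,
// or null when it passes all three checks.
function rejectReason(fields) {
  // Check 1: the hidden decoy field must come back empty.
  if ((fields[DECOY_FIELD] || "") !== "") return "decoy field filled in";

  // Check 3: the hidden code field must carry the expected value.
  if (fields[TOKEN_FIELD] !== EXPECTED_TOKEN) return "hidden code missing or wrong";

  // Check 2: no link-like text in ordinary fields.
  for (const [name, value] of Object.entries(fields)) {
    if (name === DECOY_FIELD || name === TOKEN_FIELD) continue;
    const lower = String(value).toLowerCase();
    for (const marker of BANNED_MARKERS) {
      if (marker === "@" && name === EMAIL_FIELD) continue;
      if (lower.includes(marker)) return 'banned text "' + marker + '" in ' + name;
    }
  }
  return null;
}
```

Returning a reason rather than a bare true/false makes it easier to log which check is catching the most spam.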

incrediBILL

7:34 pm on Jul 6, 2007 (gmt 0)

I didn't specify squiggly text did I?

Instead of captchas I started using a method I found on the web that I had not seen before.
I place a text field on the form that is actually hidden from real users. On submission of the form if anything is entered into the textbox then I know it's spam.

and...

For forms we have setup a 3-way robot detector to reject form-injection spam. Basically it means we don't need to use irritating Captchas:

Time to do my civic duty with a PSA about CAPTCHAs and dispel a few myths just posted.

The word CAPTCHA stands for "Completely Automated Public Turing Test to Tell Computers and Humans Apart" so anything you do that creates a TEST, and tests can come in many forms, which allows humans to pass but blocks a computer that can't interpret the problem is a form of CAPTCHA.

So you're all using CAPTCHAs, enjoy! :)

hutcheson

7:54 pm on Jul 6, 2007 (gmt 0)

The hivelogic code is interesting--but massive overkill, I think. Either the harvestbots can execute javascript (and it doesn't matter how complex the code is) or they can't (and it doesn't matter how complex the code is.)

And humans can get their browser to execute javascript -- so it doesn't matter how complex the code is.

I'm not directly concerned about human harvesters -- my material is far enough down the popularity scale (I'd feel differently if I were trying to protect a website like, say, the ODP or Wikipedia.) But as it is, my only concern is the spiders. (This isn't what you'd do if you're trying to hide your email address from humans, which is what some of the other proposals address.)

Now, assuming spiders can execute Javascript but can't click buttons, all you need is a button that executes a (trivial) Javascript function that scans your own page for links to a fixed address (say, nospam.htm) and replaces those links by appropriate "mailto" links.

If you keep that Javascript function in a separate file, and you have just enough obfuscation for obvious regular expressions to not work (that is, no occurrences of "email" or ".com" or "@*.com" in strings), then no conceivable bot is going to be analyzing your script, because there's no automatically detectable traces of the presence of e-mail links. (There ARE no e-mail links until the button is clicked, and there is nothing suggesting that the "button" function should be executed.)

As for hiding "visible" links from robots but making them easily readable by humans, I'd suggest putting the different parts of the e-mail address in different cells of a table; and using a subscripted middot instead of a period. No robot is going to parse out table cells with "rowspan" and "colspan" attributes to see which un-email-looking fragments happen to line up visually on the screen.

Here's my external JS file:
---------------------------------
// This is simply an address spelt backwards.
var revmail = 'moc.elpmax' + "e@sser" + 'dda';

// This is simply any portion of a link on an html page.
var replaceme = "nospamhere.html";

// This is simply a reverse-string function.
function revert(a) {
var z = "";
// Start at the last character (length - 1) and walk back to the first.
for (var i = a.length - 1; i >= 0; i--) z = z + a.charAt(i);
return z;
}

// This is the "button" function which finds the links to be replaced
// with actual references to the m.a.i.l_t.o protocol.
function alertml() { // figure this out, spambot!
for (var i = document.links.length - 1; i >= 0; i--) {
if (document.links[i].href.indexOf(replaceme) >= 0) {
document.links[i].href = revert(revmail + ":otliam");
}
}
}
------------------------------------------------------

ebound

9:01 pm on Jul 6, 2007 (gmt 0)

and...

and what?

superclown2

9:41 pm on Jul 6, 2007 (gmt 0)

I'm in the UK and all my business comes from the UK. I used IPtables to block all of APNIC, AFRINIC, most of ARIN (particularly certain major service providers and all USA universities), Germany, Israel and Rumania for the simple and logical reason that these are the areas that caused me most of the problems. The result was a 95% reduction in spam, virus attacks, and break-in attempts. I complain to every ISP that I get spam through and if I don't get a result I block them. My three servers and hundreds of domains now receive perhaps three or four spam messages a month.

We will continue to get spam until service providers are persuaded to stop it at source. If every server owner adopted a no-tolerance policy this would come about very quickly.

HawksM

10:42 pm on Jul 6, 2007 (gmt 0)

I use this solution on my site. [w2.syronex.com...]
Works pretty good so far. The idea is to hide any new email address as best you can to cut down on spam.

BTW - first post, new reader, looking forward to pubcon.

-Mark

[edited by: tedster at 1:06 am (utc) on July 7, 2007]
[edit reason] make link live [/edit]

This 69 message thread spans 3 pages: 69