I've seen a good reduction in spam after encoding my mailtos in this manner. Spam got so bad on the address I'd published on my website since '97 that I had to abandon it about six months ago. So with a "fresh" address and encoding, I'm almost spam-free.
For now.
The fun will be back on when the bulk email harvesters build in routines to decode this... just as the emailme@NOSPAMmydomain.com trick doesn't work (much) anymore.
And then there are people who want to spellcheck their mail. Or who want to be offline when they type.
Mailto with @ is always the way forward. Maybe you could filter and require a certain keyword in the subject?
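For example, something along these lines, where the address and the keyword are both invented for illustration; anything arriving without the keyword in the subject gets filtered out:

<a href="mailto:info@example.com?subject=SITEMAIL">Email us</a>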
One method I use is to encode all the code for a mailto link, and then display the text as 'info at mywebsite dot com'. Of course some users will miss it, but I would honestly rather get less spam and miss out on a couple of emails about something random! ;-)
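Roughly what that can look like, with the href entity-encoded so "mailto:info@mywebsite.com" never appears literally in the source (the address here is a placeholder):

<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#105;&#110;&#102;&#111;&#64;&#109;&#121;&#119;&#101;&#98;&#115;&#105;&#116;&#101;&#46;&#99;&#111;&#109;">info at mywebsite dot com</a>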
Also, I have found that forms are a good way to go, as long as they are easy to fill out. As a user I would rather fill out a form than write a message in a separate app and then send it - takes less time with a form.
Unfortunately obfuscation has been well known for a few years now. Spammers are already decoding it.
If your site isn't on any major pathways that a spam bot is likely to travel then you can obfuscate and call it good.
If you have a popular, well indexed/linked site then the simple truth is that you can't use mailto links anymore; you have to use forms. If you don't, you'll be getting 500 extra spam emails a day within six months, and by then it will be too late to do anything about it... And that's if your site is only moderately well traveled.
But given the choice, I prefer using a form so that prequalification questions get answered. I'd rather risk a typo in the sender's email address than get a freeform email. I also ask for a phone number and other details so that contact can still be made.
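A bare-bones version of that kind of form might look like this (the action URL and field names are invented; they'd need to match whatever script processes the form on your server):

<form action="/cgi-bin/contact.cgi" method="post">
Name: <input type="text" name="name"><br>
Email: <input type="text" name="email"><br>
Phone: <input type="text" name="phone"><br>
Message:<br>
<textarea name="message" rows="5" cols="40"></textarea><br>
<input type="submit" value="Send">
</form>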
All email addresses that I did not protect in this way currently get spam. I first started doing this about 8 months ago and am still waiting to get any spam on any email I have up on the web.
I think the key here is that spiders generally don't request library files, since it would slow them down too much. For the same reason, they don't execute JavaScript either.
The only drawback to the external JS method is that people with JS turned off (or with JS incompatible browsers) can't email you.
Might seem like a small thing until you start to think about it. Who doesn't have full JS support? The palm/handheld/cellphone world! ;-)
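For reference, a minimal version of the external JS method might look like this (the filename and address are placeholders):

In the page:
<script type="text/javascript" src="email.js"></script>

In email.js:
// most spiders never fetch this file, and even those that do
// won't execute it to assemble the address ("\x40" is the @ sign)
var u = "info";
var d = "example.com";
document.write('<a href="mailto:' + u + '\x40' + d + '">' + u + ' at ' + d + '</a>');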
I don't like the someone.nospam@domain.nospam.com type links as they inconvenience the user who has to edit the address (to delete the "nospam" bits) before sending the email out.
I don't trust obfuscation, for two reasons:
- some spammers are already decoding the well-known ones, so they are becoming less effective.
- some sites offer to obfuscate an address for you. That site might be run by a spammer who collects the email addresses from all the people he supplies the new code to!
I usually supply a form as well as the (javascript made) clickable link, as some people prefer to fill in forms.
Despite the above, in the <noscript> part I still usually use some "part English" someone AT somewhere DOT com type address which is NOT clickable so that people without javascript can still do something.
Generally I just do this where it's an email address that's used for direct business purposes, signups etc.
I only do this for commercial clients. My own stuff I just leave as javascript, since some of my personal sites depend on javascript for their functionality anyway and can't be viewed without it. Those are sites I built to learn on - I know it's not a good idea, and I don't do it commercially.
Other variants can be like this:
<noscript>
contact us here, just substitute the 'at' with the @ character
name at company dot com
</noscript>
There's really no way a spider can be made smart enough to read and understand English without massively slowing it down. Also, since relatively few people do this, email spiders will probably never have any problem collecting a good supply of email addresses from the web in general, so it's not really worth their while to improve on this. That would be my guess, anyway.
For the main contact address, I often surround a text image with an anchor tag created by two document.write() calls, one to open the element and one to close it. That gives the non-JS user agent a way to see the address.
The clean inboxes have been wonderful. However, if spambots start using OCR, then the image approach may have to go. So far, so good however.
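A sketch of that technique, assuming a pre-made image of the address (the image path and address are invented here); user agents without JS still see the image, just without the link:

<script type="text/javascript">
document.write('<a href="mailto:info\x40example.com">');
</script>
<img src="email-address.gif" alt="our email address">
<script type="text/javascript">
document.write('<\/a>');
</script>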
<style type="text/css">
.del {
display: none;
}
</style>
<script type="text/javascript">
// assemble the address from harmless-looking pieces ("\x40" is the @ character)
var foo = "name";
var bar = "\x40";
var baz = "domain";
var dotcom = "com";
document.write("<a href=\"mailto:" + foo + bar + baz + "." + dotcom + "\">" + foo + bar + baz + "." + dotcom + "</a>");
</script>
<noscript>name<del class="del">DELETETHIS</del>@domain<del class="del">DELETETHIS</del>.com</noscript>
JavaScript enabled browsers will get a clickable mailto link. Users with JS disabled should see the e-mail address in plain text (provided the browser supports basic CSS). Browsers that support neither JS nor CSS will see the ugly DELETETHIS style address.
if spambots start using OCR
I have to believe this would be way too slow and processor intensive to be practical, at least for another generation or two of PCs. With so many text e-mails out there for easy collection, the incremental benefit of trying to interpret images would be greatly outweighed by the time needed. At the moment, you could analyze gigabytes of images without finding a single address. Humans could probably screen images more quickly.
I've done the image thing, too, with the image matching the font and colors exactly. The casual visitor would never realize that they were seeing an image unless they try to highlight it as text.
These could still present problems for some users. I'm also big on the form option, as some users surf from PCs without a mail client to handle the mailto: - shared PCs, web mail users, etc.
But they all have one thing in common, they can be defeated.
Just FYI, it would take me a couple of days to write and bug-test a script that would harvest the email addresses protected by all of the methods mentioned in this thread (and a variety of other methods as well). It would run on a single processor server and could handle tens of thousands of sites per day, if not a lot more.
The only exception is OCR where there is no indication that an image contains an email address. If you want to parse every image on a site for email addys then you're going to be slowed down to, probably, thousands of sites a day unless you have a huge bandwidth pipe. Processor power isn't an issue though.
Fortunately spammers are frequently stupid and rarely good programmers so a lot of the methods mentioned in this thread will work, and probably keep working for a long time.
Still, there's no way to harvest an email addy from a web form :-)
Just FYI, it would take me a couple of days to write and bug-test a script that would harvest the email addresses protected by all of the methods mentioned in this thread (and a variety of other methods as well). It would run on a single processor server and could handle tens of thousands of sites per day, if not a lot more.
I beg to differ ;)
It would be a lot faster (and easier) to grab the e-mail addresses manually. Yes, it's easy to write a script that can extract an e-mail address -- but only if you know how the address has been hidden/obfuscated/garbled. E-mail harvesters can't possibly know how any given site has chosen to hide the e-mail address. What makes it even harder is deciphering JavaScripts. What if the username and domain name are in random order? What if the JS is served by a server-side script with random variable names each time? What if the username and domain name are split up into even smaller pieces? There's no way you could write a script that takes all these factors (and many more) into consideration -- at least not in a way that would make it worthwhile. You'd end up spending more time creating the script than it would take you to grab those few addresses manually.
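To illustrate the splitting idea, the pieces can sit out of order under meaningless names (all invented here), so only actually evaluating the script reveals the address:

<script type="text/javascript">
var q7 = "ple.com";
var a2 = "\x40exam";
var x4 = "info";
// only the concatenation order below reveals info@example.com
document.write('<a href="mailto:' + x4 + a2 + q7 + '">contact</a>');
</script>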
That notwithstanding, someone has done it already Doc, it's called a Web Browser.
All you have to know about javascript is what the final output is intended to be and that's not so hard.
That notwithstanding, someone has done it already Doc, it's called a Web Browser.
Which contains millions of lines of code. Written by huge teams of people. And some of which still don't execute javascript reliably, such as Safari. From what I've seen, getting a browser to fully support DOM 1 is not 'easy' at all; that's why neither Opera nor Safari has that down yet, and I don't think their programmers are lacking in skill.
Forgive me for being skeptical here about IanKelley's claim. I'm not going to worry about claims like this personally; it's easy to say you could do something if you felt like it, and that's not a very interesting thing to read. What would be more interesting to read is: oh, I've written that, it was easy, and it works, processes x pages per hour, etc.
Given that there is a fair amount of money in the spamming world, and that spammers probably have access to good programmers happy to get some of that money, I kind of suspect this might not be quite as easy to do as is being claimed here.
Or maybe it is, who knows? I'll wait and see, when I get my first spam using these methods I'll change the methods, no big deal.
Yes, of course browsers are large software programs. However, a specific routine to parse only a small section of javascript's functionality doesn't take all that much code. You don't have to worry about supporting DOM or parsing the entire JS language.
It doesn't need to be done to see that it's possible. It's simple common sense. In much the same way that a contractor could reasonably say he could build another skyscraper after having built them in the past.
I only posted to let people know that the possibility was there.
If my word doesn't do anything for you, consider that the IE shell is freely usable by other software. It would be incredibly easy to run a webpage through IE using Windows software and then parse the resulting page (with the javascript now executed and the real email addys displayed by IE's aforementioned million lines of code) for spammable addresses.
One of the reasons it works for now is that the great majority of scrape-able addresses on the web are just hanging there in ordinary HTML, easy pickings. Spam addresses are a volume business, so why go to lengths? If that situation shifts, then spambots will shift to match, just as email spammers are shifting their tactics to penetrate the new anti-spam filters.
After almost two years of js protection in one case, the biggest problem has been someone using their protected email address in an unprotected fashion somewhere else on the web.