Moderators: Receptional & mademetop

Website Analytics - Tracking and Logging Forum

This 39 message thread spans 2 pages.
Protecting Email Addresses From Harvesters

 4:27 pm on Dec 14, 2005 (gmt 0)

Hello Guys,

Probably like a lot of people, I don't know much about robots, spiders, etc., so hopefully somebody can help me a little here.
I have a website, non-commercial, and I left a blank robots file on the server as suggested in one of the threads on this site. I wanted to encourage visitors - no problem there. I am now wondering about the best way to protect email addresses that are posted in my Guest Book, as I assume that the Guest Book will be spidered with the site and those who leave their email address will get spammed. Don't want that!
Any ideas gratefully accepted.




 8:18 pm on Dec 14, 2005 (gmt 0)

Why would you want people to post their e-mail addresses to your guest book?

(Of course, you could always put the guest book in its own folder and prohibit this folder in the robots.txt file, but the spam-bots won't respect that, so this wouldn't accomplish anything useful, in the long term.)

Wouldn't it be better to admonish them not to include that information? Or am I misunderstanding what you're trying to do...?



 10:17 pm on Dec 14, 2005 (gmt 0)

Thanks for your reply.

Those that leave an entry in my Guest Book can leave their name, where they come from, and their email address if they so wish (optional). I respond to every entry in my Guest Book if they leave their address. I just like to thank people who take the time to read and comment on the site.

As I have become more web aware, I realise that those bots crawl my site. I am also aware that those email addresses are then probably spammed. I just don't like the fact that people leave their contact address on my site and are maybe not aware that robots crawl, pick up the addresses and their inboxes get full.

If that's the only way to do it then I will disable that function.

Thanks again.


 8:46 am on Dec 15, 2005 (gmt 0)


There are numerous solutions to this problem.

1. Change your site so that the email addresses are displayed as graphics that can't be picked up by spam-bots. You will need some software to do that.

2. Change the site so that the email addresses are modified in such a way as to make it obvious to humans what the email address should be, but, hard for bots to see......

e.g. john@johnsmith.com is displayed as:
"no-spam-john@johnsmith.com"......you get the idea?

3. Install a form to email your true consumers. The form will hide the true email address from bots, but, will allow humans to communicate with valid human contacts.

There are many other solutions, these are just 3 simple ones :)


 9:04 am on Dec 15, 2005 (gmt 0)

i am assuming you are a relatively new webmaster using an off the shelf guestbook.

as percentages said there are many solutions to your problem - but all require a degree of programming knowledge.

your only real easy solution is to turn off the 'add email' function.

or what you could do if you have a 'form mail' script is get the visitors to send their guest book entry to you using a form and then you enter the details into your guestbook script manually for them - also don't have any links to the 'add guestbook entry' - a bit long winded but could work.


 10:42 am on Dec 15, 2005 (gmt 0)


Thank you both for your replies. They are much appreciated.

I am a new webmaster. I have learnt XHTML and CSS and have only the slightest knowledge of PHP. Not enough to even attempt to write my own Guest Book, so it's off the shelf. I like the idea of a form so that I can enter the details manually myself, but I will have to learn how to do that. But needs must.

I will think about either changing or disabling the email input for the time being.

Thanks again


 1:58 am on Dec 19, 2005 (gmt 0)

I write all e-mail addresses to the web page using JavaScript, including those in my Guest Book.
I also give the user the option not to display the e-mail address at all (however I can still see it in the database).
For some of my own addresses I have an image in the noscript block, so that those not running JavaScript can still see my e-mail addresses (but bots can't).


 9:39 am on Dec 19, 2005 (gmt 0)


Thanks for that info. The problem I have with that is my GuestBook is off the shelf - not written by me. I am not that familiar with Javascript and would not know where to begin.



 5:02 pm on Dec 21, 2005 (gmt 0)

On my e-commerce site I have a contact form, but I also display the email address as a .jpeg image. Other than the image, there is no email address displayed.

On my other site, which is a hobby site, people request their contact details to be put on the site. I always change their email address from:

someone@domain.com to someone(AT)domain.com
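The manual rewrite above could also be automated. The following is only a minimal sketch: the function name `mungeEmails` and the deliberately simple address pattern are assumptions for illustration, not something any poster in this thread used.

```javascript
// Hypothetical helper: rewrites anything that looks like an email address
// into the "someone(AT)domain.com" form shown above.
function mungeEmails(text) {
  // Deliberately simple pattern; it does not cover every valid address.
  const emailPattern = /([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/g;
  return text.replace(emailPattern, "$1(AT)$2");
}

console.log(mungeEmails("Contact someone@domain.com for details."));
// prints "Contact someone(AT)domain.com for details."
```

Running the text of each entry through a function like this before it is displayed would save doing the edit by hand, though it offers no more protection than the manual version.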



 5:26 pm on Dec 21, 2005 (gmt 0)

Thanks Tim. Changing the email address manually is what I think I will do until I get more clued up.

Much appreciated


 6:17 pm on Dec 21, 2005 (gmt 0)

I ran a test with one of my domains: I registered it with a private ("unlisted") registration, and I get maybe one spam email a month.


 6:58 pm on Dec 21, 2005 (gmt 0)

See our main website, the url in profile. Look at the source. The email link seen at the bottom of the page in red, is in the script at the top of the page source.

I also use this to hide internal links on my large sites, so they don't dilute the keywords on the page.

On this particular site I don't use it for the navigation, because it would be negative to do so. There are right times and wrong times to use tools. You have to learn that.


 7:49 pm on Dec 21, 2005 (gmt 0)

I think people that post in Guest books and leave their email address do it all over the web and they are aware of email harvesting or don't care.


 7:49 pm on Dec 21, 2005 (gmt 0)

Don't use images. That immediately excludes everyone with screen readers.

The hiveware encoder doesn't have a fallback for folks with JavaScript disabled.

I have a custom routine I use that encodes the email address down. So, <a href="mailto:email@example.com">email@example.com</a> becomes:

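<script type="text/javascript">
<!--
// The rest of the encoded link was cut off in the post; completed here as an assumption
document.write('<a href="mai' + 'lto:&#101;&#109;&#97;&#105;&#108;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;">');
document.write('&#101;&#109;&#97;&#105;&#108;&#64;&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;</a>');
// -->
</script><noscript>&#101;&#109;&#97;&#105;&#108; at
&#101;&#120;&#97;&#109;&#112;&#108;&#101; dot &#99;&#111;&#109;</noscript>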

To most end-users -- including those with screen readers -- it is a standard clickable email address. To those with JavaScript disabled, it appears as "email at example dot com" which is easily user-interpretable.

The function and the code are freely available under a Creative Commons license. I cannot post it here, however, as the license links back to my personal homepage.

I have heavily trafficked sites that use this encoding on email addresses that don't get any spam.

One side note: it's always a good idea to have the email address be copyable and shown as the actual address rather than something like "click here to email us". Lots of people use webmail and can't click on mailto: links as there is no default mail client configured on their PC. (And yes, there are systray tools that allow you to enable this, but hardly anyone who uses webmail has them installed.)


 9:30 pm on Dec 21, 2005 (gmt 0)

Some interesting replies there.

The JavaScript option: I actually use that on my main page already but I am unsure if I can do the same with the Guest Book. My Guest Book is an off-the-shelf PHP script.

Unless something like that will work in the PHP script I will have to alter the email addresses manually until I come up with something better.

Thanks for all your replies.


 10:03 pm on Dec 21, 2005 (gmt 0)

Is email address "munging" still effective? If I was writing an email harvesting program I'd look out for strings that resembled:

username at domain dot com

and the other popular permutations.
I'd imagine you have to be a little inventive and come up with a new one for your site for it to be completely effective. Still doesn't get rid of the annoyance of not being able to click directly on the proper address.
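To make the concern above concrete, here is a rough sketch of how little code it takes to undo the popular mungings. The function name `findMungedEmails` and the particular patterns it recognises are assumptions chosen for illustration; it is not taken from any real harvester.

```javascript
// Hypothetical demonstration: recover addresses written as
// "username at domain dot com", "someone(AT)domain.com" or "user [at] host [dot] org".
const AT = "(?:\\s*[\\[(]at[\\])]\\s*|\\s+at\\s+)";
const DOT = "(?:\\s*[\\[(]dot[\\])]\\s*|\\s+dot\\s+|\\.)";
const MUNGED = new RegExp(
  `([a-z0-9._+-]+)${AT}([a-z0-9-]+(?:${DOT}[a-z0-9-]+)+)`,
  "gi"
);

function findMungedEmails(text) {
  const found = [];
  for (const match of text.matchAll(MUNGED)) {
    // Normalise every "dot" variant in the domain part back to a real dot.
    const domain = match[2].replace(new RegExp(DOT, "gi"), ".");
    found.push(match[1] + "@" + domain);
  }
  return found;
}

console.log(findMungedEmails("Write to username at domain dot com or someone(AT)domain.com"));
// prints [ 'username@domain.com', 'someone@domain.com' ]
```

A scheme that is unique to one site would not be caught by a generic pattern list like this, which is the point made above about being inventive.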


 12:29 am on Dec 22, 2005 (gmt 0)

I do something similar to Critter without the character translation, i.e., assemble the address from pieces using javascript, and offer a NOSCRIPT alternative that isn't directly readable as an email address.
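A minimal sketch of the "assemble the address from pieces" idea might look like the following. The function name `buildMailtoLink`, the way the pieces are split, and the `contact` element id mentioned in the comment are all assumptions for illustration.

```javascript
// Hypothetical helper: the page source holds only separate pieces, and the
// complete address exists only after this function has run.
function buildMailtoLink(user, domainParts) {
  const at = String.fromCharCode(64); // "@" never appears literally in the source
  const address = user + at + domainParts.join(".");
  return '<a href="' + "mai" + "lto:" + address + '">' + address + "</a>";
}

// In a browser the result could be placed into a placeholder element, e.g.
//   document.getElementById("contact").innerHTML = buildMailtoLink("email", ["example", "com"]);
// with a <noscript> block nearby reading "email at example dot com".
console.log(buildMailtoLink("email", ["example", "com"]));
// prints <a href="mailto:email@example.com">email@example.com</a>
```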

None of these are perfect, but they'll defeat most spambots. It's been noted before, but I'll say it again - if a browser can decode your stuff, so can properly designed spiders.

Here are a couple of totally different approaches:

1) Replace the guest book with a comment form - guest books are soooo 1997, and are magnets for unsophisticated spammers.

2) If you must have a guestbook, modify its code to print two pages - a hidden one not linked from the site with no emails that you can read, and a public one that prints the same comments minus emails. The coding for this is likely to be simpler than other alternatives.


 2:01 am on Dec 22, 2005 (gmt 0)

There are email harvesting robots that can read javascript. It's not the norm but it will be eventually.

It would be simple, even for a PHP beginner, to remove email address display from the guestbook. You would still be able to see the email addresses yourself in the database (or file).

Email addresses should never be displayed publicly, use contact forms instead.
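The guest book in this thread is a PHP script, so the following JavaScript is only an illustration of the logic, not a drop-in change: keep the address in storage, but strip it from whatever is rendered publicly. The entry fields (`name`, `location`, `email`, `comment`) and the function names are assumptions.

```javascript
// Hypothetical entry shape: { name, location, email, comment }.
// Returns a copy without the email field; the stored entry is left untouched.
function toPublicEntry(entry) {
  const { email, ...publicFields } = entry;
  return publicFields;
}

// Build the list used for the public page from the full stored list.
function toPublicEntries(entries) {
  return entries.map(toPublicEntry);
}

const stored = [
  { name: "Ann", location: "Leeds", email: "ann@example.com", comment: "Nice site" },
];
console.log(toPublicEntries(stored));
// prints [ { name: 'Ann', location: 'Leeds', comment: 'Nice site' } ]
```

The site owner can still read the addresses from the stored entries and reply to visitors, while nothing harvestable reaches the public page.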


 3:58 am on Dec 22, 2005 (gmt 0)

>>> There are email harvesting robots that can read javascript. <<<

Yup! We've been down all these paths and the spam bots always found their way around our methods. Only the image option is totally effective as there's nothing there for a bot to see, but this is not very convenient for users.

We're currently using Unicode and have been pretty much spam-free for six months, even though the spam bots continue to regularly hit our site. This displays as a normal email link for users, can be read by screen readers and is compatible with all current browsers -- including those with JavaScript disabled.

If you're interested, just search for "ascii to unicode converter" on Google and select the first listing. Use this tool to convert your HREF value, including mailto: (everything inside the quotes, but leave the "quotes" in place) and replace the ASCII text in your HTML with the Unicode output. If you want your actual email address displayed as the link, do the same for the link text.

If you're using a WYSIWYG editor, you may need to do this in a text editor, as it will probably convert it back to ASCII when you save the file. FrontPage users (or CGI form users, where the recipient email is included as a hidden field) can also convert email addresses used in their forms -- again, with a text editor.

There are bound to be those that follow this post stating that this can also be harvested, but if you think about it logically, it's a huge and very slow task to scan pages in Unicode, which is probably why malicious developers haven't taken this approach yet. I'm not saying it's impossible; just not very practical -- or necessary with so many easier methods.

I hope this helps ;)
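For anyone who would rather not rely on an online converter, the conversion described above can be sketched in a few lines. This assumes the converter's output is decimal numeric character references (the `&#101;` style seen earlier in the thread); the function names are made up for illustration.

```javascript
// Hypothetical helper: turn every character into a decimal numeric
// character reference, e.g. "e" -> "&#101;".
function toNumericEntities(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Encode both the HREF value (including "mailto:") and the link text;
// the surrounding quotes and tags stay as plain ASCII.
function encodedMailtoLink(address) {
  const href = toNumericEntities("mailto:" + address);
  const label = toNumericEntities(address);
  return '<a href="' + href + '">' + label + "</a>";
}

console.log(toNumericEntities("email"));
// prints &#101;&#109;&#97;&#105;&#108;
console.log(encodedMailtoLink("email@example.com"));
```

A browser decodes the references before showing or following the link, so users see a normal clickable address while the raw page source contains no literal "@" or "mailto:".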


 5:20 am on Dec 22, 2005 (gmt 0)

None of these are perfect, but they'll defeat most spambots. It's been noted before, but I'll say it again - if a browser can decode your stuff, so can properly designed spiders.

Well, this is true. BUT... once you start doing things in a sufficiently complicated manner, you have effectively turned it into the halting problem.

Also keep in mind that if you are doing custom 'protection' system, you definitely have diminishing returns for spammers.


 5:23 am on Dec 22, 2005 (gmt 0)

Or, you can always do something like this:

function email($username, $domain, $tld) {
// One random token per call; distinct suffixes guarantee the four JavaScript variable names never collide
$token = substr(md5(microtime()), 0, 12);
$uservar = $token . 'u';
$atvar = $token . 'a';
$domainvar = $token . 'd';
$dotcomvar = $token . 't';
echo "<script type=\"text/javascript\">
var gx$domainvar = \"$domain\";
var gx$uservar = \"$username\";
var gx$dotcomvar = \"$tld\";
var gx$atvar = \"\\x40\";
document.write(\"\\x3c\" + \"a href='ma\" + \"ilto:\" + gx$uservar + gx$atvar + gx$domainvar + \".\" + gx$dotcomvar + \"'\\x3e\" +
gx$uservar + gx$atvar + gx$domainvar + \".\" + gx$dotcomvar + \"\\x3c/a\\x3e\");
</script>
<noscript><div>$username<del class=\"del\">DELETETHIS</del>&#64;$domain<del class=\"del\">DELETETHIS</del>.$tld</div></noscript>";
}

Then, of course, specify display: none in your stylesheet for the .del class

[edited by: engine at 3:10 pm (utc) on Jan. 16, 2006]
[edit reason] fixed scrolling [/edit]

Robert Charlton

 10:08 am on Dec 22, 2005 (gmt 0)

The hiveware encoder doesn't have a fallback for folks with JavaScript disabled.

I've used the Hiveware Enkoder with a gif version of the address in <noscript>. Hadn't thought about screen readers. Possibly could also use an alt tag on the gif with "email name at domain dot com." That would certainly cover all bases. I will say that the Hiveware Enkoder has certainly cut down on spam... in fact more or less eliminated it... wherever I've used it.


 5:35 pm on Dec 22, 2005 (gmt 0)

Thanks for all your help


 6:35 pm on Dec 22, 2005 (gmt 0)

This won't work for your guestbook email addresses, but I have a law firm as a client and the attorneys signed up for an online service called CipherSend a few weeks ago which allowed them to replace their website email addresses with a button that their site visitors can click to send files and messages securely to their email address without a password. I don't recall it being real expensive and it definitely got their email addresses off their site!


 7:55 am on Dec 23, 2005 (gmt 0)

This has been my experience...
Please read through, the bad news comes first but it's not all bad.

First of all, nothing will stop the spambots from harvesting email addresses. Today's harvesters use sophisticated software, some with OCR-readers (Optical Character Recognition) that read image-based addresses, and url and script de-obfuscation that render even expert coding tricks useless to all but the simpler bots. Reason I know this is I got bots probing inside my cgi bin and penetrating the perl script used for sending mail via <forms> for the purpose of extracting my address. As for the scripts, harvester programmers look for the scripts used to foil their bots (such as the ones posted in this thread) for the purpose of reverse-engineering said scripts so the bot can penetrate and extract addresses the site owner thinks are safe.

After all, if you were a spammer, you wouldn't want a cheap harvester, would you?
You might be paying several hundred dollars for some 'top notch software.'

However, and this is where the good news starts, the serious trouble didn't start until I wanted some TRAFFIC.

For at least the first 2 years I got anywhere from 20 to 100-200 visitors/day or thereabouts but it was all reciprocal links and small directories and webring stuff, far away from the top of the Internet. Google hardly existed at the time, but Altavista and Yahoo and the rest of the big guys also did not know my site existed.

See, harvesters use engines to find web sites to crawl, and they find sites via the use of operator-provided keywords. Spammers assume that via the use of keywords, their spam will be targeted.
I know this because a lot of the spam I receive contains the very keywords which not only exist on my site, but show in my stats as how my visitors find me. Thus it comes as no surprise to receive spam for 'replica watches' although the keyword 'replica' actually refers to a few links I have to sites which contain Replica Kit Cars!

As a sidenote, one might think the spammers got close with the watches, but it's ALWAYS off, they are never on target with their garbage, keywords or not.

So, it stands to reason you will have no real trouble with harvesters (or the resulting spam) until such time as your site is listed in the Yahoo! directory OR for some other reason your site starts ranking on the FIRST page of results for some popular, single-word keyword(s) on a high-traffic or popular engine such as Google.

Because before that, I really can't say I had spam problems... Well I thought I did, but I just hadn't experienced real spam yet.

So, do as you wish but you can always turn off emails later, I do not feel you will have problems until you develop some recognition, no offense intended.


 1:16 pm on Dec 23, 2005 (gmt 0)

Thanks, a very detailed reply.

I tend to agree with most of what you say regarding the bots being able to penetrate even the best of sites.

My site is non-commercial, relatively small and of limited appeal, and therefore I only get a few entries a month in the Guest Book. Yesterday I noticed, for the first time, three entries that were posted by advertising sites/porn sites (swiftly removed).

I have manually intervened and protected the email addresses of entries that are posted. This is ok for now unless, of course, I get more popular.

The site, although small, and limited in appeal is quite well ranked in Google and MSN and a few others and that is probably why the bots are beginning to crawl.

Thanks again for your advice.


 11:57 am on Dec 24, 2005 (gmt 0)

My Guest Book is an off-the-shelf PHP script.

which shelf did you get it from?


 3:12 pm on Dec 24, 2005 (gmt 0)

going back to the OP...

if your purpose is to prevent those listing their email address on your guestbook from getting spammed, your efforts will most likely be a drop in the bucket.
The people signing your guestbook are probably already getting Spammed.

I have one site where I have One email address for contact and when I REPLY to that email, I use another username that is NOT posted on the web anywhere.

The REPLY address gets more Spam than the address posted on the website (obscured by Enkoder).

So my address is getting harvested by trojans and other things in User's Windows computers that are just going through their address books in Outlook.


 3:34 pm on Dec 24, 2005 (gmt 0)

Good point.

I was just trying to minimize the damage for people visiting the site. But I agree it's probably futile as their inboxes like mine are probably full with junk already.

Looking over all the replies, you can maybe do things to minimize the damage, but if they are already getting spammed then it's a bit of a wasted exercise.

Thanks for your reply


 5:39 pm on Dec 24, 2005 (gmt 0)

Reason I know this is I got bots probing inside my cgi bin and penetrating the perl script

Surely you set the files in your cgi-bin to be execute only?
That would stop the bots.

There should be no way into your cgi-bin except to execute what is known to be in there...


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved