Forum Moderators: DixonJones & mademetop

Protecting Email Addresses From Harvesters

     
4:27 pm on Dec 14, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Hello Guys,

Probably like a lot of people, I don't know much about robots, spiders, etc., so hopefully somebody can help me a little here.
I have a website, non-commercial, and I left a blank robots.txt file on the server as suggested in one of the threads on this site. I wanted to encourage visitors - no problem there. I am now wondering about the best way to protect email addresses that are posted in my Guest Book, as I assume that the Guest Book will be spidered with the site and those who leave their email address will get spammed. Don't want that!
Any ideas gratefully accepted.
Regards

Hugh

8:18 pm on Dec 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 20, 2002
posts:735
votes: 1


Why would you want people to post their e-mail addresses to your guest book?

(Of course, you could always put the guest book in its own folder and prohibit this folder in the robots.txt file, but the spam-bots won't respect that, so this wouldn't accomplish anything useful, in the long term.)

Wouldn't it be better to admonish them not to include that information? Or am I misunderstanding what you're trying to do...?

Eliz.

10:17 pm on Dec 14, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Thanks for your reply.

Those that leave an entry in my Guest Book can leave their name, where they come from, and their email address if they so wish (optional). I respond to every entry in my Guest Book if they leave their address. I just like to thank people who take the time to read and comment on the site.

As I have become more web aware, I realise that those bots crawl my site. I am also aware that those email addresses are then probably spammed. I just don't like the fact that people leave their contact address on my site, perhaps unaware that robots crawl it, pick up the addresses, and fill their inboxes.

If that's the only way to do it then I will disable that function.

Thanks again.
Hugh

8:46 am on Dec 15, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 1, 2002
posts:1580
votes: 0


shug,

There are numerous solutions to this problem.

1. Change your site so that the email addresses are displayed as graphics that can't be picked up by spam-bots. You will need some software to do that.

2. Change the site so that the email addresses are modified in such a way that the real address is obvious to humans but hard for bots to see.

i.e. john@johnsmith.com is displayed as:
"no-spam-john@johnsmith.com" ... you get the idea?

3. Install a form to email your true consumers. The form will hide the true email address from bots, but, will allow humans to communicate with valid human contacts.

There are many other solutions, these are just 3 simple ones :)

9:04 am on Dec 15, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2002
posts:3171
votes: 8


i am assuming you are a relatively new webmaster using an off the shelf guestbook.

as percentages said there are many solutions to your problem - but all require a degree of programming knowledge.

your only real easy solution is to turn off the 'add email' function.

or what you could do if you have a 'form mail' script is get the visitors to send their guest book entry to you using a form and then you enter the details into your guestbook script manually for them - also don't have any links to the 'add guestbook entry' - a bit long winded but could work.

10:42 am on Dec 15, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Topr/Percentages,

Thank you both for your replies. They are much appreciated.

I am a new webmaster. I have learnt XHTML and CSS and have only the slightest knowledge of PHP, not enough to even attempt to write my own Guest Book, so it's off the shelf. I like the idea of a form so that I can input the entries manually myself, but I will have to learn how to do that. But needs must.

I will think about either changing or disabling the email input for the time being.

Thanks again
Hugh

1:58 am on Dec 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


I write all e-mail addresses to the web page using JavaScript, including those in my Guest Book.
I also give the user the option not to display the e-mail address at all (however I can still see it in the database).
For some of my own addresses I have an image in the noscript block, so that those not running JavaScript can still see my e-mail addresses (but bots can't).
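
For anyone unsure where to begin with this, here is a minimal sketch of the general idea. It is not the code actually used on the poster's site; the function name buildMailLink and the sample address are made up for illustration, and the inputs are assumed to be trusted (no HTML escaping is done). The address is kept in pieces so the complete string never appears in the page source.

```javascript
// Hypothetical helper: builds a mailto link from separate pieces so the
// full address never appears as a single string in the HTML source.
function buildMailLink(user, domain) {
  const at = String.fromCharCode(64); // the "@" character, built at run time
  const address = user + at + domain;
  // "mailto:" is split as well, since harvesters often search for it.
  return '<a href="' + 'mai' + 'lto:' + address + '">' + address + '</a>';
}

// In a page, the returned markup would be written where the link should
// appear, for example by assigning it to an element's innerHTML.
console.log(buildMailLink("someone", "example.com"));
```

A noscript block with an image or a spelled-out address would still be needed for visitors without JavaScript.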
9:39 am on Dec 19, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Hi,

Thanks for that info. The problem I have with that is my GuestBook is off the shelf - not written by me. I am not that familiar with Javascript and would not know where to begin.

thanks
Hugh

5:02 pm on Dec 21, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Apr 9, 2003
posts:336
votes: 0


On my e-commerce site I have a contact form, but I also display the email address as a .jpeg image. Other than the image, there is no email address displayed.

On my other site, which is a hobby site, people request their contact details to be put on the site. I always change their email address from:

someone@domain.com to someone(AT)domain.com
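
If the manual change becomes tedious, the same substitution could be automated. The following is only a rough sketch: the function name mungeEmails is invented here, and the pattern is a deliberately simple approximation of what an email address looks like, not a full validator.

```javascript
// Hypothetical helper: rewrites every "user@domain.tld" found in a piece
// of text as "user(AT)domain.tld". The pattern is a simplification and
// will not match every valid address.
function mungeEmails(text) {
  return text.replace(/([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)/g, "$1(AT)$2");
}

console.log(mungeEmails("Contact someone@domain.com or other@example.org."));
```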

Tim

5:26 pm on Dec 21, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Thanks Tim. Changing the email address manually is what I think I will do until I get more clued up.

Much appreciated
Hugh

6:17 pm on Dec 21, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 28, 2002
posts:3443
votes: 1


I ran a test with one of my domains: I registered it with a private ("unlisted") registration, and I get maybe one spam email a month.

6:58 pm on Dec 21, 2005 (gmt 0)

Junior Member from US 

10+ Year Member

joined:July 17, 2003
posts:200
votes: 1


See our main website, the url in profile. Look at the source. The email link seen at the bottom of the page in red, is in the script at the top of the page source.

I also use this to hide internal links on my large sites, so they don't dilute the keywords on the page.

On this particular site, I don't use it for the navigation, because it would be negative to do so. There are right times and wrong times to use tools. You have to learn that.

7:49 pm on Dec 21, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:July 8, 2005
posts:460
votes: 0


I think people that post in Guest books and leave their email address do it all over the web and they are aware of email harvesting or don't care.
7:49 pm on Dec 21, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Feb 23, 2003
posts:207
votes: 0


Don't use images. That immediately excludes everyone with screen readers.

The hiveware encoder doesn't have a fallback for folks with JavaScript disabled.

I have a custom routine I use that encodes the email address down. So, <a href="mailto:email@example.com">email@example.com</a> becomes:

<script language="Javascript" type="text/javascript">
<!--
document.write('<a href="mai');
document.write('lto');
document.write(':&#101;&#109;&#97;&#105;&#108;');
document.write('@');
document.write('&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;">');
document.write('&#101;&#109;&#97;&#105;&#108;');
document.write('@');
document.write('&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;<\/a>');
// -->
</script><noscript>&#101;&#109;&#97;&#105;&#108; at
&#101;&#120;&#97;&#109;&#112;&#108;&#101; dot &#99;&#111;&#109;</noscript>

To most end-users -- including those with screen readers -- it is a standard clickable email address. To those with JavaScript disabled, it appears as "email at example dot com" which is easily user-interpretable.

The function and the code are freely available under a Creative Commons license. I cannot post it here, however, as the license links back to my personal homepage.

I have heavily trafficked sites that use this encoding on email addresses that don't get any spam.

One side note: it's always a good idea to have the email address be copyable and shown as the actual address rather than something like "click here to email us". Lots of people use webmail and can't click on mailto: links as there is no default mail client configured on their PC. (And yes, there are systray tools that allow you to enable this, but hardly anyone that uses webmail has them installed.)

9:30 pm on Dec 21, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Some interesting replies there.

The javascript option: I actually use that in my main page already but I am unsure if I can do the same with the Guest Book. My Guest Book is off the shelp PHP script.

Unless something like that will work in the PHP script I will have to alter the email addresses manually until I come up with something better.

Thanks for all your replies.
Hugh

10:03 pm on Dec 21, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Sept 13, 2003
posts:214
votes: 0


Is email address "munging" still effective? If I was writing an email harvesting program I'd look out for strings that resembled:

username at domain dot com
username(at)domain(dot)com

and the other popular permutations.
I'd imagine you have to be a little inventive and come up with a new one for your site for it to be completely effective. Still doesn't get rid of the annoyance of not being able to click directly on the proper address.
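
To see how little effort the popular permutations take to undo, a site owner can test a chosen munging against something like the sketch below. It is illustrative only: the function name unmunge is made up, it covers just a few common forms, and it will also mangle ordinary phrases such as "look at this", which a real tool would filter out afterwards.

```javascript
// Illustrative only: turns a few common munged forms back into
// "user@domain.tld". Handles "(at)", "[at]", " at " and the "dot"
// equivalents, case-insensitively.
function unmunge(text) {
  return text
    .replace(/\s*[\(\[]\s*at\s*[\)\]]\s*|\s+at\s+/gi, "@")
    .replace(/\s*[\(\[]\s*dot\s*[\)\]]\s*|\s+dot\s+/gi, ".");
}

console.log(unmunge("username at domain dot com"));
console.log(unmunge("username(at)domain(dot)com"));
```

If a munged address comes back intact from two short substitutions like these, it is probably not offering much protection.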

12:29 am on Dec 22, 2005 (gmt 0)

Administrator

WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 2, 2000
posts:9685
votes: 0


I do something similar to Critter without the character translation, i.e., assemble the address from pieces using javascript, and offer a NOSCRIPT alternative that isn't directly readable as an email address.

None of these are perfect, but they'll defeat most spambots. It's been noted before, but I'll say it again - if a browser can decode your stuff, so can properly designed spiders.

Here are a couple of totally different approaches:

1) Replace the guest book with a comment form - guest books are soooo 1997, and are magnets for unsophisticated spammers.

2) If you must have a guestbook, modify its code to print two pages - a hidden one not linked from the site with no emails that you can read, and a public one that prints the same comments minus emails. The coding for this is likely to be simpler than other alternatives.
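
The core of approach 2 is small. As a rough sketch, assuming each guest book entry is stored as an object with an email field (the entry shape and the function name toPublicEntries are invented here, and a real off-the-shelf guestbook will store things differently), the public page is built from copies that have the email removed:

```javascript
// Hypothetical entry shape: { name, location, email, comment }.
// Returns copies of the entries without the email field; the originals,
// used for the private page, are left untouched.
function toPublicEntries(entries) {
  return entries.map(({ email, ...rest }) => rest);
}

const entries = [
  { name: "Ann", location: "Leeds", email: "ann@example.com", comment: "Nice site" },
];

console.log(toPublicEntries(entries)); // public page data, no emails
console.log(entries);                  // private page data, emails intact
```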

2:01 am on Dec 22, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 4, 2001
posts:1262
votes: 12


There are email harvesting robots that can read javascript. It's not the norm but it will be eventually.

It would be simple, even for a PHP beginner, to remove email address display from the guestbook. You would still be able to see the email addresses yourself in the database (or file).

Email addresses should never be displayed publicly, use contact forms instead.

3:58 am on Dec 22, 2005 (gmt 0)

New User

10+ Year Member

joined:Oct 27, 2005
posts:25
votes: 0


>>> There are email harvesting robots that can read javascript. <<<

Yup! We've been down all these paths and the spam bots always found their way around our methods. Only the image option is totally effective as there's nothing there for a bot to see, but this is not very convenient for users.

We're currently using Unicode and have been pretty much spam-free for six months, even though the spam bots continue to regularly hit our site. This displays as a normal email link for users, can be read by screen readers and is compatible with all current browsers -- including those with JavaScript disabled.

If you're interested, just search for "ascii to unicode converter" on Google and select the first listing. Use this tool to convert your HREF value, including mailto: (everything inside the quotes, but leave the "quotes" in place) and replace the ASCII text in your HTML with the Unicode output. If you want your actual email address displayed as the link, do the same for the link text.
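
For those who would rather not depend on an online tool, the conversion amounts to replacing each character with its decimal numeric character reference. A minimal sketch follows; the function name toNumericEntities is made up, and this is an assumption about what such converters produce rather than a copy of any particular one.

```javascript
// Hypothetical helper: replaces every character with its decimal numeric
// character reference, e.g. "@" becomes "&#64;".
function toNumericEntities(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Encode the href value and the link text separately, then paste the
// results into the HTML between the existing quotes and tags.
console.log(toNumericEntities("mailto:someone@example.com"));
console.log(toNumericEntities("someone@example.com"));
```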

If you're using a WYSIWYG editor, you may need to do this in a text editor, as it will probably convert it back to ASCII when you save the file. FrontPage users (or CGI form users, where the recipient email is included as a hidden field) can also convert email addresses used in their forms -- again, with a text editor.

There are bound to be those that follow this post stating that this can also be harvested, but if you think about it logically, it's a huge and very slow task to scan pages in Unicode, which is probably why malicious developers haven't taken this approach yet. I'm not saying it's impossible; just not very practical -- or necessary with so many easier methods.

I hope this helps ;)

5:20 am on Dec 22, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 7, 2003
posts:138
votes: 0



None of these are perfect, but they'll defeat most spambots. It's been noted before, but I'll say it again - if a browser can decode your stuff, so can properly designed spiders.

Well, this is true. BUT... once you start doing things in a sufficiently complicated manner, you have effectively turned it into the halting problem.

Also keep in mind that if you are doing custom 'protection' system, you definitely have diminishing returns for spammers.

5:23 am on Dec 22, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member drdoc is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 15, 2002
posts:6807
votes: 0


Or, you can always do something like this:

function email($username, $domain, $tld) {
    // Random suffixes give the JavaScript variables different names on each page load.
    $uservar = substr(md5(microtime()), 0, 13);
    $atvar = substr(md5(microtime()), 0, 13);
    $domainvar = substr(md5(microtime()), 0, 13);
    $dotcomvar = substr(md5(microtime()), 0, 13);
    // Distinct prefixes (gu, ga, gd, gt) keep the names unique even when two
    // microtime() calls return the same value.
    echo "<script type=\"text/javascript\">
var gd$domainvar = \"$domain\";
var gu$uservar = \"$username\";
var gt$dotcomvar = \"$tld\";
var ga$atvar = \"\\x40\";
document.write(\"\\x3c\" + \"a href='ma\" + \"ilto:\" + gu$uservar + ga$atvar + gd$domainvar + \".\" + gt$dotcomvar + \"'\\x3e\" +
gu$uservar + ga$atvar + gd$domainvar + \".\" + gt$dotcomvar + \"\\x3c/a\\x3e\");
</script>
<noscript><div>$username<del class=\"del\">DELETETHIS</del>&#64;$domain<del class=\"del\">DELETETHIS</del>.$tld</div></noscript>";
}

Then, of course, specify display: none in your stylesheet for the .del class.

[edited by: engine at 3:10 pm (utc) on Jan. 16, 2006]
[edit reason] fixed scrolling [/edit]

10:08 am on Dec 22, 2005 (gmt 0)

Moderator from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:11318
votes: 169


The hiveware encoder doesn't have a fallback for folks with JavaScript disabled.

I've used the Hiveware Enkoder with a gif version of the address in <noscript>. Hadn't thought about screen readers. Possibly could also use an alt tag on the gif with "email name at domain dot com." That would certainly cover all bases. I will say that the Hiveware Enkoder has certainly cut down on spam... in fact more or less eliminated it... wherever I've used it.

5:35 pm on Dec 22, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Thanks for all your help
Hugh
6:35 pm on Dec 22, 2005 (gmt 0)

New User

10+ Year Member

joined:Feb 10, 2005
posts:5
votes: 0


This won't work for your guestbook email addresses, but I have a law firm as a client and the attorneys signed up for an online service called CipherSend a few weeks ago which allowed them to replace their website email addresses with a button that their site visitors can click to send files and messages securely to their email address without a password. I don't recall it being real expensive and it definitely got their email addresses off their site!
7:55 am on Dec 23, 2005 (gmt 0)

Junior Member

joined:Mar 13, 2005
posts:174
votes: 0


This has been my experience...
Please read through, the bad news comes first but it's not all bad.

First of all, nothing will stop the spambots from harvesting email addresses. Today's harvesters use sophisticated software, some with OCR-readers (Optical Character Recognition) that read image-based addresses, and url and script de-obfuscation that render even expert coding tricks useless to all but the simpler bots. Reason I know this is I got bots probing inside my cgi bin and penetrating the perl script used for sending mail via <forms> for the purpose of extracting my address. As for the scripts, harvester programmers look for the scripts used to foil their bots (such as the ones posted in this thread) for the purpose of reverse-engineering said scripts so the bot can penetrate and extract addresses the site owner thinks are safe.

After all, if you were a spammer, you wouldn't want a cheap harvester, would you?
You might be paying several hundred dollars for some 'top notch software.'

However, and this is where the good news starts, the serious trouble didn't start until I wanted some TRAFFIC.

For at least the first 2 years I got anywhere from 20 to 100-200 visitors/day or thereabouts but it was all reciprocal links and small directories and webring stuff, far away from the top of the Internet. Google hardly existed at the time, but Altavista and Yahoo and the rest of the big guys also did not know my site existed.

See, harvesters use engines to find web sites to crawl, and they find sites via the use of operator-provided keywords. Spammers assume that via the use of keywords, their spam will be targeted.
I know this because a lot of the spam I receive contains the very keywords which not only exist on my site, but show in my stats as how my visitors find me. Thus it comes as no surprise to receive spam for 'replica watches' although the keyword 'replica' actually refers to a few links I have to sites which contain Replica Kit Cars!

As a sidenote, one might think the spammers got close with the watches, but it's ALWAYS off, they are never on target with their garbage, keywords or not.

So, it stands to reason you will have no real trouble with harvesters (or the resulting spam) until such time as your site is listed in the Yahoo! directory OR for some other reason your site starts ranking on the FIRST page of results for some popular, single-word keyword(s) on a high-traffic or popular engine such as Google.

Because before that, I really can't say I had spam problems... Well I thought I did, but I just hadn't experienced real spam yet.

So, do as you wish but you can always turn off emails later, I do not feel you will have problems until you develop some recognition, no offense intended.

1:16 pm on Dec 23, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Thanks, a very detailed reply.

I tend to agree with most of what you say regarding the bots being able to penetrate even the best of sites.

My site is non-commercial, relatively small and of limited appeal, and therefore I only get a few entries a month in the Guest Book. I noticed for the first time yesterday three entries that were posted by advertising sites/porn sites (swiftly removed).

I have manually intervened and protected the email addresses of entries that are posted. This is ok for now unless, of course, I get more popular.

The site, although small, and limited in appeal is quite well ranked in Google and MSN and a few others and that is probably why the bots are beginning to crawl.

Thanks again for your advice.
Hugh

11:57 am on Dec 24, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 21, 2004
posts:39
votes: 0


My Guest Book is off the shelp PHP script.

which shelf did you get it from?

3:12 pm on Dec 24, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Mar 21, 2004
posts:267
votes: 0


going back to the OP...

if your purpose is to prevent those listing their email address on your guestbook from getting spammed, your efforts will most likely be a drop in the bucket.
The people signing your guestbook are probably already getting Spammed.

I have one site where I have One email address for contact and when I REPLY to that email, I use another username that is NOT posted on the web anywhere.

The REPLY address gets more Spam than the address posted on the website (obscured by Enkoder).

So my address is getting harvested by trojans and other things in User's Windows computers that are just going through their address books in Outlook.

3:34 pm on Dec 24, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:May 18, 2005
posts:57
votes: 0


Good point.

I was just trying to minimize the damage for people visiting the site. But I agree it's probably futile as their inboxes like mine are probably full with junk already.

Looking over all the replies, you can maybe do things to minimize the damage, but if they are already getting spammed then it's a bit of a wasted exercise.

Thanks for your reply
Hugh

5:39 pm on Dec 24, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 23, 2003
posts:801
votes: 0


Reason I know this is I got bots probing inside my cgi bin and penetrating the perl script

Surely you set the files in your cgi-bin to be execute only?
That would stop the bots.

There should be no way into your cgi-bin except to execute what is known to be in there...
DerekH
