Restricting email harvest bots?

does this really work?

kiwanji

1:43 am on Mar 9, 2004 (gmt 0)

I found a great idea for limiting the effectiveness of spam bots when searching the archives (http://www.webmasterworld.com/forum48/467.htm?highlight=spam+bots). jimbeetle mentioned doing a Google search for email address encoders, and I found one (so as to not post the URL, it was first on the returns list if you care to look) that turns text into its "equivalent decimal entity." So, on the site I am updating, I want to have all email addresses listed as their entity equivalents rather than as plain text.
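For readers unfamiliar with the term, a "decimal entity" encoder can be sketched in a few lines. This is only an illustration of the general technique, not the specific tool found in that search; the function name `toDecimalEntities` and the sample address are made up for the example.

```javascript
// Convert every character of a string to its decimal HTML entity (&#NNN;).
// A browser renders the result exactly like the original text.
function toDecimalEntities(text) {
  return Array.from(text)
    .map((ch) => "&#" + ch.codePointAt(0) + ";")
    .join("");
}

// Hypothetical address, used only for illustration.
const encoded = toDecimalEntities("a@b.c");
console.log(encoded); // &#97;&#64;&#98;&#46;&#99;
```

The encoded string would be pasted into the HTML in place of the plain address, both in the link text and after `mailto:`.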

My question is, does this truly restrict the spam bots? This site has many email addresses listed, and I need to get them off the web in their current plain-text form. We are being deluged with spam, and I think it is because the addresses are listed so freely.

Please help with this, or offer other suggestions; I am all ears.

Thanks much-

bill

2:12 am on Mar 9, 2004 (gmt 0)

There are a bunch of these out there...Beware of the ones that merely transpose your address into ASCII text. The spambots have cracked that already. Try something JavaScript based like the Hiveware Enkoder. Those seem more reliable now.

Another thing to keep in mind...if you're already getting spam on the posted addresses then encoding them now will probably not do much for you in terms of the volume of spam you get. You've likely already been harvested and your addresses shared all over the place. You now need to either change the addresses or get some good filtering in place.

rogerdp

2:15 am on Mar 9, 2004 (gmt 0)

In addition to any hiding techniques, you should run a program similar to SpamAssassin.

As for hiding techniques: As a user, I find it very annoying to have to *extract* the true email address instead of being able to click it and write the thoughts that are fresh in my mind. Decoding the address means a brain context switch. Alternatively, you might consider something like a "spam" subdomain (e.g. spam.mydomain.com) and more strongly filter (with SpamAssassin or similar) mail coming through that domain before redirecting it. Then, any email address going on a website can use that subdomain.

rogerdp

2:19 am on Mar 9, 2004 (gmt 0)

Using JavaScript for this is inappropriate. Not only is it trivial for harvesters to pass the page through a JavaScript interpreter, it's another annoyance for those of us who browse with JavaScript disabled for speed or other reasons. It's even more annoying than suffixing NOSPAM or the like (which I'm sure harvesters have started to recognize) as it cannot be changed.

bill

2:25 am on Mar 9, 2004 (gmt 0)

Welcome to WebmasterWorld rogerdp

JavaScript is the way to hide your address in plain sight nowadays. Although you say it may be trivial to process JavaScript, I have yet to have one harvested by a bot. Maybe it depends on the technique you use.

The safest thing to do would of course be to remove all your e-mail addresses and use a form instead. However, that's not always an option.

kiwanji

2:37 am on Mar 9, 2004 (gmt 0)

It seems that this has started a healthy debate.

I wanted to mention something that I noticed when looking at Hiveware Enkoder as bill mentioned. It seems that the code that the Enkoder produces is quite large for the purpose. I am always worried about bloating my page's code needlessly. What are the thoughts regarding this? Is there a smaller way to protect email addresses and keep code to a minimum?

bill

2:42 am on Mar 9, 2004 (gmt 0)

Sure there are shorter methods of encoding, but unless you're adding thousands of e-mail addresses to your page that little block of text is hardly going to be considered bloat.

Birdman

2:48 am on Mar 9, 2004 (gmt 0)

Also think about implementing a bad bot trap and a user-agent ban list for .htaccess. I agree with bill, forms are the way to go.

rogerdp

3:06 am on Mar 9, 2004 (gmt 0)

Thanks for the welcome, Bill.

As for harvesters not parsing JavaScript, what makes you think that? There are command line implementations of JavaScript and it would be trivial to pass a page through a program. A good harvester is virtually indistinguishable from a real client.

If I were writing a harvester, I would be looking for the ways people encode their addresses, as those addresses are *more*valuable*, since spam sent to them isn't as likely to get lost in the shuffle and since other people selling lists won't have those addresses. I'm sure anyone writing a harvester has browsed this site and seen all the recommended ways of hiding addresses.

The ultimate solution to spam is not hiding.

Purple Martin

3:37 am on Mar 9, 2004 (gmt 0)

A user-agent ban list for .htaccess is a nice idea; however, it only takes one new/unknown harvester to arrive at your site and you'll be spammed for all eternity.

tedster

3:58 am on Mar 9, 2004 (gmt 0)

I agree that JavaScript "cloaking" for email addresses works, at least for now, and it's easy to implement and maintain.

I put in a program of js cloaking for email addresses on client websites almost two years ago and the spam silence is now amazing. Sure, if lots of people do this, then the harvesters will adapt to it. But so far there are such easy pickings out there, in bare nekkid html, that the harvesters haven't bothered.

Oh yes -- our clients usually offer an 800 number in plain sight and in some cases a text image of the email address as well. If you want to reach them, you definitely can.

This is war, and some of our comforts are sacrificed. I just hope all email doesn't end up compromised eventually. That's what it looks like will happen from the present vantage point, but who knows, we may yet pull a rabbit out of this black hat.

bill

4:01 am on Mar 9, 2004 (gmt 0)

rogerdp, I'm not saying that it can't be done...just that most of the harvesters don't seem to bother with JavaScript yet. I've had JavaScript-encoded e-mail addresses up on several domains for the last 8 months, and I've yet to receive spam on any of them. Prior to that I used ASCII obfuscation on the e-mail addresses and found that the spam-bots got every single one.

I'm not saying that hiding is the answer to spam...this is just an answer to kiwanji's original question. Birdman's suggestion to use a bot trap and .htaccess ban list would be my next step if I start getting spam on my current sites. Maybe I've just been lucky so far...

rogerdp

4:14 am on Mar 9, 2004 (gmt 0)

An .htaccess bot trap seems even less likely to work than JavaScript. Are you saying email harvesters proudly declare what they are? How many show up in your server logs?

bill

4:23 am on Mar 9, 2004 (gmt 0)

If you read up on how bot traps work you'll find that it doesn't matter what the bots call themselves ;)

rogerdp

4:30 am on Mar 9, 2004 (gmt 0)

Where can I do that?

bill

4:35 am on Mar 9, 2004 (gmt 0)

Here's one: Ban malicious visitors with this Perl Script [webmasterworld.com]. Take a look around here...search for "bad bot scripts". There are a ton of threads.

<added>Here's another: bad-bot script: follow-up? [webmasterworld.com]</added>

Birdman

1:40 pm on Mar 9, 2004 (gmt 0)

I just posted a PHP bad bot script [webmasterworld.com] the other day. Feels good every time you nab one.
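The general idea behind a bot trap, as opposed to a user-agent list, can be sketched as follows. This is not the linked Perl or PHP script; it is a minimal illustration that assumes a hypothetical trap URL (`/trap/`) which is disallowed in robots.txt and linked only from an invisible link, so polite crawlers and human visitors never request it. The name `createBotTrap` is invented for the example.

```javascript
// Hypothetical trap URL: disallowed in robots.txt, linked only invisibly.
const TRAP_PATH = "/trap/";

function createBotTrap() {
  const banned = new Set(); // IP addresses that have fetched the trap

  // Returns the HTTP status the server should send for this request.
  return function check(ip, path) {
    if (banned.has(ip)) return 403;
    if (path.startsWith(TRAP_PATH)) {
      banned.add(ip); // note: the user-agent string is never consulted
      return 403;
    }
    return 200;
  };
}

const check = createBotTrap();
console.log(check("192.0.2.10", "/index.html")); // 200
console.log(check("192.0.2.10", "/trap/"));      // 403, and the IP is now banned
console.log(check("192.0.2.10", "/index.html")); // 403
console.log(check("192.0.2.20", "/index.html")); // 200
```

Because the ban is keyed on behaviour (requesting a forbidden URL) rather than on the declared user agent, a harvester posing as a normal browser is caught all the same. A real script would persist the ban list, for example by writing deny rules into .htaccess.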

grahamstewart

1:46 pm on Mar 9, 2004 (gmt 0)

I posted a reasonable Javascript solution here: [webmasterworld.com...] that slips your address into the required places in the HTML.

For people with Javascript disabled I suggest you supply a 'contact us' form. Obviously go for one that only has your email address on the server side!
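A rough sketch of what "only on the server side" can mean: the form submits a short key, and the server looks up the real address, so the address never appears in the HTML. The keys, addresses, and the function name `buildMessage` below are all hypothetical, and actually sending the mail is left out.

```javascript
// Hypothetical mapping from a public form key to a private address.
// This table lives only on the server and is never sent to the browser.
const RECIPIENTS = new Map([
  ["sales", "sales@example.com"],
  ["support", "support@example.com"],
]);

// Collapse line breaks so form input cannot inject extra mail headers.
function oneLine(value) {
  return String(value || "").replace(/[\r\n]+/g, " ").trim();
}

// Build the outgoing message from the submitted form fields.
function buildMessage(fields) {
  const to = RECIPIENTS.get(fields.recipient);
  if (!to) throw new Error("unknown recipient");
  return {
    to,
    replyTo: oneLine(fields.email),
    subject: oneLine(fields.subject),
    body: String(fields.body || ""),
  };
}

const message = buildMessage({
  recipient: "sales",
  email: "visitor@example.org",
  subject: "Price list",
  body: "Please send me your price list.",
});
console.log(message.to); // sales@example.com
```

Forms that accept the recipient address as a hidden field defeat the purpose, since the address is then back in the page source.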

g1smd

10:33 pm on Mar 12, 2004 (gmt 0)

ASCII encoding no longer works. Avoid that completely now.
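To see why entity encoding offers so little protection, consider how little code it takes to undo. The sketch below is not taken from any real harvester; it simply assumes a harvester decodes numeric entities before applying an ordinary address pattern. The names and the sample page are invented.

```javascript
// Turn decimal (&#64;) and hexadecimal (&#x40;) entities back into characters.
function decodeNumericEntities(html) {
  return html
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(Number(dec)));
}

// Deliberately naive address pattern.
const ADDRESS = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical page with a partly entity-encoded address.
const page = "<p>Contact: &#115;&#97;&#108;&#101;&#115;&#64;example&#46;com</p>";
const found = decodeNumericEntities(page).match(ADDRESS) || [];
console.log(found); // [ 'sales@example.com' ]
```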

The mentioned encoder script is big, so if you use it, put it in an external javascript file and call it from the HTML file.

I do not like any of the someone@domain.nospam.com.nospam methods as the link is not clickable and immediately usable without some editing.

The best method is still one based on using code fragments assembled with document.write statements (again best from an external file).

Use that, and then also have a contact form for people to email directly from the website. I find about 10% of people use the form, rather than their email program. I always wonder if that 10% would not have bothered writing at all if the form wasn't there.

However, whatever method you use, you still have to do a Google search every few weeks to make sure that no one else has published your email address in a directory entry, press release, and so on.

Webwork

11:31 pm on Mar 12, 2004 (gmt 0)

How complex is the script? I just searched Google and the examples of "email cloaking scripts" only ran about 7 - 9 lines.

Am I missing something? Could someone provide a link to a source for "effective javascripts" for the purpose of cloaking?

Thanks

uncle_bob

11:35 pm on Mar 12, 2004 (gmt 0)

Don't people use (gif) images of email addresses anymore? Or should I add this to the "so '90s" thread?

TheDoctor

5:10 pm on Mar 14, 2004 (gmt 0)

There've been a number of discussions about the use of javascript to generate email addresses, and a version of my own script can be seen at [webmasterworld.com...] It was derived from a script I found at [webmasterworld.com...] (where there is at least one other method suggested, also using javascript). My contribution to the approach is to use a variable number of parameters to confuse any harvester that considers itself intelligent.

In general, the method seems to work. I only rarely get spam at addresses hidden by this method, while, for addresses that for one reason or another can't be hidden, I get inundated.

I've also found that the method works for email addresses that the harvesters have already got hold of. I think a harvested list must have a limited shelf life, and so hidden addresses tend to get dropped once a new list is gathered.
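The linked script is not reproduced here, but the "variable number of parameters" idea might look something like the following. This is a guess at the general shape, not the actual script; `mailLink` and the addresses are hypothetical.

```javascript
// Join any number of address fragments; the first argument is the link text.
function mailLink(linkText, ...fragments) {
  const address = fragments.join("");
  return '<a href="mailto:' + address + '">' + linkText + "</a>";
}

// Each call splits its address differently, so the page source has no
// fixed "user, domain" argument pattern for a harvester to look for.
const twoParts = mailLink("Sales", "sales@", "example.com");
const fourParts = mailLink("Support", "sup", "port", "@exam", "ple.com");

// In a browser, write the links into the page where the script runs.
if (typeof document !== "undefined") {
  document.write(twoParts + " " + fourParts);
}
```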

g1smd

12:44 am on Mar 15, 2004 (gmt 0)

The script basically uses

document.write (a+b+c+d+e+f+g)

where those letters represent fragments of code and the email address from a broken up string that started off like:

<a href="mailto:someone@somewhere" title="extra information">link text</a>

and which is reassembled by the javascript VM that runs in the browser client.
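Spelled out, that could look like the sketch below. The split points are arbitrary choices for the example; the only requirement is that neither "mailto:" nor the complete address appears intact in the source. `someone@somewhere` is the placeholder from the post above.

```javascript
// Fragments of the anchor tag, split at arbitrary points.
const a = '<a href="mai';
const b = 'lto:some';
const c = 'one';
const d = '@';
const e = 'somewhere" title="extra information">';
const f = 'link text';
const g = '</a>';

const link = a + b + c + d + e + f + g;

// In a browser this writes the link into the page at the script's position.
if (typeof document !== "undefined") {
  document.write(link);
}
```

Kept in an external .js file and pulled in with a script tag, these lines add nothing to the HTML page itself apart from the one tag.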

Timmie

10:41 pm on Mar 22, 2004 (gmt 0)

Maybe a silly question: suppose email addresses are generated in an asp page (out of a database), does this make them less vulnerable to spambots, or doesn't it make a difference at all?

g1smd

11:34 pm on Mar 22, 2004 (gmt 0)

The output of a PHP, ASP, CGI, or any other script is a stream of HTML code that looks identical to the sort of HTML that would come from a static page served to the browser.

So, you still need to protect the address from being spidered.
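A small illustration of the point, under the assumption that a harvester simply pattern-matches whatever the server sends back. The template function stands in for an ASP or PHP page filling in a database row; all names and the address are hypothetical.

```javascript
// Deliberately naive address pattern, standing in for a harvester.
const ADDRESS = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function harvest(html) {
  return html.match(ADDRESS) || [];
}

// A hand-written static page.
const staticPage = '<a href="mailto:info@example.com">Email us</a>';

// Stand-in for a server-side page built from a database row.
function renderFromDatabase(row) {
  return '<a href="mailto:' + row.email + '">Email us</a>';
}
const dynamicPage = renderFromDatabase({ email: "info@example.com" });

console.log(staticPage === dynamicPage); // true
console.log(harvest(dynamicPage));       // [ 'info@example.com' ]
```

The harvester only ever sees the response body, so where the address came from on the server makes no difference.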