

HTML Forum

This 66-message thread spans 3 pages (page 1 of 3).
Best way to stop mailto: robot spidered spam?
chiyo




msg:574555
 8:14 am on Oct 4, 2002 (gmt 0)

We have published our email addresses on our web sites for almost 10 years, using simple mailto: links or forms. In the past year, the amount of spam has become intolerable. We want to reduce it to manageable levels by changing all pages with mailto: links to something spambots can't harvest. What would be great is a simple search-and-replace! Yep, I'm lazy too.

What is the best way?

1. Just unlink all email addresses, so those who are REALLY interested in emailing you have to copy and paste the address into their email program rather than just clicking. At first thought this seems a sensible solution, and it is very easy to implement.

2. Use JS (not sure exactly how to do that). Disadvantage: some setups without JS won't work with it, which will frustrate those users and make them think your page is broken!

3. Direct all hyperlinked email addresses to our mail form, or even a new form (system) where the email address appears in the "send to" field when they get there?

Which of these do you suggest, or are there any other quick and dirty solutions?

 

Dreamquick




msg:574556
 8:57 am on Oct 4, 2002 (gmt 0)

The easiest way to stop email addresses on pages from being harvested is not to have them on the pages in the first place. Using something like a contact form, which either stores the messages or emails them to you, will do the job nicely.
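A minimal sketch of the contact-form approach, assuming a server-side handler written in JavaScript (Node.js): the form posts only a public recipient key, and the server looks up the real address, so no address appears in the page at all. The recipient keys, addresses and field names (to, from, message) are hypothetical, and actually sending or storing the message is left out.

```javascript
// Hypothetical mapping from public recipient keys to private addresses.
// Only the keys ever appear in the HTML form; the addresses stay on the server.
const RECIPIENTS = {
  editor: "editor@example.com",
  author42: "jane@example.com",
};

// Parse a URL-encoded form body and resolve the recipient key.
// Returns null for unknown keys or incomplete submissions.
function resolveContactForm(body) {
  const form = new URLSearchParams(body);
  const key = form.get("to");
  const from = (form.get("from") || "").trim();
  const message = (form.get("message") || "").trim();
  // hasOwnProperty avoids accepting inherited names such as "toString".
  if (!Object.prototype.hasOwnProperty.call(RECIPIENTS, key) || !from || !message) {
    return null;
  }
  return { to: RECIPIENTS[key], from, message };
}
```

For example, resolveContactForm("to=author42&from=reader%40example.org&message=Nice+article") yields jane@example.com as the recipient, while a harvester reading the page only ever sees "author42".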

If you are feeling less technical, you could keep the addresses on the page but swap key parts of them for their character entity codes. For example:

&#109;ailto:root&#64;example&#46;com

Using 109 as the ASCII code for "m", 64 for "@" and 46 for "." essentially makes pattern-matching an email address in the page source next to impossible unless you decode the entire page.

This method is not totally foolproof, but thankfully 99% of the programs designed to scrape addresses off sites are written for speed rather than complexity, so they won't bother decoding and won't see anything they recognize as an email address.

Thankfully most browsers (including Lynx) decode the entities, so what the user sees makes perfect sense and behaves exactly like the original code!
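A small sketch of the entity-encoding trick, assuming you generate pages with JavaScript or have a build step that can run it. The helper name entityEncode is hypothetical, and which characters to swap is your choice; this example swaps only "@" and ".".

```javascript
// Replace every character for which shouldEncode(ch) returns true with its
// decimal HTML character reference, e.g. "@" -> "&#64;". Browsers decode
// these when rendering, so the link still works for visitors.
function entityEncode(text, shouldEncode) {
  let out = "";
  for (const ch of text) {
    out += shouldEncode(ch) ? `&#${ch.codePointAt(0)};` : ch;
  }
  return out;
}

// Swap only a few key characters.
const keyChars = new Set(["@", "."]);
const partial = entityEncode("mailto:root@example.com", (ch) => keyChars.has(ch));
// partial === "mailto:root&#64;example&#46;com"

// Encode every character (the "whole address" approach).
const full = entityEncode("root@example.com", () => true);
```

HTML also accepts hexadecimal references such as &#x40; for "@", which is one of the other encoding variations available.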

- Tony

P.S. I have a feeling that other variations of the character encoding may work better than plain ASCII.

chiyo




msg:574557
 9:25 am on Oct 4, 2002 (gmt 0)

That sounds like a good solution, thanks a lot! Due to the nature of the site, where authors want to be emailed with comments about their latest article, email addresses are essential; we are just trying to reduce this problem, not get rid of it completely.

edit_g




msg:574558
 10:23 am on Oct 4, 2002 (gmt 0)

Set up an email address, separate from your other email addresses, and use that one for the website. You can quickly scan it and tell the spam from the serious stuff.

You can also use something like this: [chepd.mq.edu.au...]

ratman




msg:574559
 10:57 am on Oct 4, 2002 (gmt 0)

I had this problem a while back and eventually discovered a nice little free web tool called Email Obfuscator. It is based on the example given by Dreamquick above but encodes the whole email address automatically.

[alicorna.com ]

Since we added this to our pages we haven't had any new spam mail.

ratman

Dreamquick




msg:574560
 11:25 am on Oct 4, 2002 (gmt 0)

The downside of obfuscating the entire address is that if someone is browsing in pure text, or their software fails to decode it, even a real person will be puzzled and have no clue what's going on.

If you only make a small number of important character swaps, a real person seeing the raw data will be less puzzled. The only real danger is that if you choose *really* common swap-outs, an email scraper could notice that a lot of people swap characters X and Y, and with a minor tweak you are vulnerable again!

*If* I designed these sorts of things, I'd consider running one replace, possibly two. Beyond that, the extra work would start to slow that "X,000 a minute" email extractor down, which would lose me a selling point, and for the sake of a handful of unreceptive people it would not be worth it.

Why unreceptive?

Simple swaps can be done by anyone (swap @ for <blah>), so if I were to grab those addresses, the owners might still be receptive; they could simply have read a tip on how to stop spam and applied it blindly.

But when people start doing complex munges or swaps, the chances of actually getting a result out of them are really low, and the chances of them complaining (and complaining to the right people in the right way) increase dramatically!
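To make the point about common swap-outs concrete, here is a sketch of how small the "minor tweak" on the harvester's side can be. The swap table and the simplified address pattern are assumptions for illustration, not taken from any real harvester.

```javascript
// A deliberately simple address pattern (not a full RFC 5322 parser).
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical table of popular munges and what they stand for.
const COMMON_SWAPS = [
  [/\s*[\[(<{]\s*at\s*[\])>}]\s*/gi, "@"],  // "jane [at] example.com"
  [/\s*[\[(<{]\s*dot\s*[\])>}]\s*/gi, "."], // "example (dot) com"
  [/\s+at\s+/gi, "@"],                      // "jane at example.com"
  [/\s+dot\s+/gi, "."],                     // "example dot com"
];

// Return the addresses found in text, optionally undoing the common swaps first.
function harvest(text, undoSwaps) {
  let s = text;
  if (undoSwaps) {
    for (const [pattern, replacement] of COMMON_SWAPS) {
      s = s.replace(pattern, replacement);
    }
  }
  return s.match(EMAIL) || [];
}
```

Without the swap table, "jane [at] example [dot] com" yields nothing; with four extra replacements it yields jane@example.com, which is exactly why popular munges stop working once they become popular.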

- Tony

ratman




msg:574561
 12:03 pm on Oct 4, 2002 (gmt 0)

I tried just changing five important characters in the email address (including the @ symbol), but although the spam slowed down a bit it still kept coming.

I tried looking for other ways to hide the email address but all of the others I found involved the use of Javascript.

I understand your point, Dreamquick, and have been looking for an alternative, but obfuscating has worked for me and I haven't had any complaints (yet!), so it is an option if nothing else works.

You can also set up your email software or server software to filter out and return any messages containing certain words, but this can obviously backfire. See the following (entertaining) post for some suggestions.

[webmasterworld.com ]

ratman

Dreamquick




msg:574562
 1:25 pm on Oct 4, 2002 (gmt 0)

ratman,

I've had partially obfuscated contact details on my site since day #1, and I get maybe one piece of spam a month on those accounts (touch wood).

The main difference is that I also run obfuscation in tandem with a server-side ruleset which returns a gibberish address to anything which is obviously an email harvesting bot so even if they do manage to decode correctly then they get something which is worthless to them. :)
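A rough sketch of the server-side idea, assuming the page is generated by a JavaScript (Node.js) server that sees the request's User-Agent header. The user-agent fragments are hypothetical examples; a real ruleset would probably also look at IP addresses, request rates or trap URLs. The decoy uses the reserved .invalid top-level domain, so it can never be delivered.

```javascript
// Hypothetical user-agent fragments treated as "obviously a harvester".
const HARVESTER_SIGNS = ["emailcollector", "emailsiphon", "extractorpro"];

function looksLikeHarvester(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  return HARVESTER_SIGNS.some((sign) => ua.includes(sign));
}

// Random lowercase letters for the decoy mailbox name.
function randomLetters(length) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let s = "";
  for (let i = 0; i < length; i++) {
    s += letters[Math.floor(Math.random() * letters.length)];
  }
  return s;
}

// Address to print on the page for this request: the real one for ordinary
// visitors, a worthless decoy for suspected harvesters.
function addressFor(userAgent, realAddress) {
  if (!looksLikeHarvester(userAgent)) return realAddress;
  return `${randomLetters(8)}@example.invalid`;
}
```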

That said, stopping the wrong people getting the address in the first place is only half the game - you need to make their lives that little harder if they manage to deliver spam. This should discourage people from trying to use that list the next time given how badly the last attempt went.

(Fortunately, the only address they seem to find routinely at the moment is the one for technical and security list subscriptions, and sending spam to those sorts of addresses *guarantees* massive numbers of complaints etc.)

- Tony

toadhall




msg:574563
 3:04 pm on Oct 4, 2002 (gmt 0)

Try a combination of all these suggestions and continue to filter. You could also add something like

<a href="mailto:postmaster@[127.0.0.1]"></a>

...for a little hand-wringing nya ha ha. I'm sure some mail harvesters are clued in, but the thought of even a few getting stuck is satisfying.

If harvested and used this will effectively spam the spammer. :)

keyplyr




msg:574564
 6:50 pm on Oct 4, 2002 (gmt 0)

I have had success using a .gif image of the email address (for visual branding) linked to a JavaScript pop-up window that contains the simple mailto form. The mailto form does contain the actual address, but since that HTML page is only linked via JS, bots can't reach it.

GaryK




msg:574565
 7:11 pm on Oct 4, 2002 (gmt 0)

You might want to check the Contact Me link on the page in my profile. I too use images for the actual address and a JS routine to handle the mailto part without exposing the email address. On another site I run, which has a few thousand members, they really appreciate being able to leave their email addresses public without fear of being spammed.

gph




msg:574566
 1:36 am on Oct 5, 2002 (gmt 0)

I just went through this and combined a couple of ideas from different sources. For JavaScript-disabled browsers I used an image (just an image, no mailto link); for JavaScript-enabled ones I broke the mailto apart, putting a portion on the page and the rest in an external JS file.

Pseudo code:

page:
<script type="text/javascript">foo('ma','inquires','domain')</script>

external file:
function foo (a,b,c) {
document.write('<a href="'+ a + 'il.........);
}
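A runnable sketch of the split-address idea above, under stated assumptions: the function name mailLink, the ".com" suffix and the link text are hypothetical, not the poster's actual code. The assembly is kept in a pure function so it can be checked outside a browser; in the external file you would pass its result to document.write.

```javascript
// Reassemble a mailto: link from fragments, so that neither "mailto" nor a
// complete address appears in the page source. Called as
// mailLink('ma', 'inquires', 'domain'), mirroring foo('ma','inquires','domain').
function mailLink(a, b, c) {
  const addr = b + "@" + c + ".com"; // assumed TLD, not in the original
  // "<\/a>" avoids a literal "</" sequence if this code is ever inlined
  // inside a <script> element.
  return '<a href="' + a + "ilto:" + addr + '">' + b + " at " + c + " dot com<\/a>";
}

// In the external .js file (browser only):
// document.write(mailLink('ma', 'inquires', 'domain'));
```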

pkchukiss




msg:574567
 5:35 am on Oct 6, 2002 (gmt 0)

I have a suggestion. We could foil those evil people's plans by creating a fake file FULL of fake email addresses (they must look the part), at least 1 megabyte in size, and posting it on the web server. Then link to this file from the other web pages with a link that says "Email addresses for spam bots (1 MB)" or something to that effect. That way, humans won't click on the link, but those pesky bots will. After they have harvested all the email addresses and merged them into their list, at least 50% of the list would be useless addresses, and a list full of useless addresses is worthless to those pathetic spammers. Better still, crash that stupid bot by making the file very big, or add taunts like "haha@IHaveSpoiltYourList.SoThere" to rub salt into the spammers' wounds. So much for revenge!
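If you want to try the spider-trap idea, a sketch like the following could generate the page. One assumption worth stating: every fake address uses the reserved .invalid top-level domain, so the list cannot accidentally contain a real person's mailbox or a real domain. The function names are hypothetical.

```javascript
const LETTERS = "abcdefghijklmnopqrstuvwxyz";

// Random lowercase word of the given length.
function randomWord(length) {
  let s = "";
  for (let i = 0; i < length; i++) {
    s += LETTERS[Math.floor(Math.random() * LETTERS.length)];
  }
  return s;
}

// HTML fragment with `count` fake mailto: links, one per line.
// All addresses end in ".invalid", which can never be delivered.
function poisonPage(count) {
  const links = [];
  for (let i = 0; i < count; i++) {
    const addr = `${randomWord(8)}@${randomWord(6)}.invalid`;
    links.push(`<a href="mailto:${addr}">${addr}</a>`);
  }
  return links.join("\n");
}
```

With these lengths each link is about 69 bytes including the newline, so roughly 15,000 links make a 1 MB file. Programs of this kind also mark their pages with the robots exclusion meta tag so that legitimate search engine spiders leave them alone.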

chompy




msg:574568
 7:18 pm on Nov 8, 2002 (gmt 0)

[webmasterworld.com...]

Found this link in the generic JavaScript message. It is a JavaScript that assembles an email address.

Macguru




msg:574569
 7:31 pm on Nov 8, 2002 (gmt 0)

Here is a good one: it's free, and you can download a standalone Java version to encode email addresses in JS right on your own machine.

[hivelogic.com...]

amoore




msg:574570
 7:37 pm on Nov 8, 2002 (gmt 0)

You could make an image that looks like your email address and use it on your pages. I've seen it done and I think it's pretty slick.

nonprof webguy




msg:574571
 7:44 pm on Nov 8, 2002 (gmt 0)

When we put everybody's e-mail address on our site a year ago, there was a lot of fear about spam. For some reason, though, the spam hasn't come. Maybe one every week or two. I suspect that because our domain is .org, spammers choose not to harvest our emails. I know from my logs they are definitely looking at us, and I ban them when I can. Does anybody know if .gov domains are similarly avoided by spambots?

Macguru




msg:574572
 7:48 pm on Nov 8, 2002 (gmt 0)

Many of my .org sites have been spammed.

Mardi_Gras




msg:574573
 7:52 pm on Nov 8, 2002 (gmt 0)

Macguru - that's a pretty slick little tool. Thanks.

Macguru




msg:574574
 8:01 pm on Nov 8, 2002 (gmt 0)

It is indeed. But browsers without JavaScript will be bamboozled too. The best method is a contact form, but it's not for all budgets.

dingman




msg:574575
 8:34 pm on Nov 8, 2002 (gmt 0)

Quoting Macguru: "The best method is a contact form, but it's not for all budgets."

I can write a simple contact form in 10 minutes. I don't much use them because I find that being presented with a contact form instead of an e-mail address feels less personal. That and I haven't found a browser yet that will let you plug a decent editor into the textarea.

(Hey, programming project - write a plug-in that'll let me embed Emacs in Galeon textareas. As if I didn't have enough projects in mind already.)

bill




msg:574576
 8:32 am on Nov 13, 2002 (gmt 0)

I was reading on the wPoison site that most e-mail harvesters will ignore any page that comes under your /cgi-bin/ folder due to the success of these fake address generating programs. Maybe a good way to hide your addresses would be to put any pages containing addresses there?

Another thing they mentioned was that, to prevent their program from trapping legitimate search engine spiders and the like, they add the robots exclusion meta tag to pages generated by the program. From the sounds of it, a lot of the more sophisticated email harvesting programs now follow the robots exclusion meta tags to avoid running into one of these honeypots. Maybe it's also a good idea to add these tags to your pages with email addresses on them?

Hoople




msg:574577
 8:56 am on Nov 13, 2002 (gmt 0)

Anti-Spam Script Maker 3.1 (Javascript solution)

ANTI- SPAM PING JPEG MAKER 1.0

Both freeware at [assmaker.mybravenet.com...]

Romeo




msg:574578
 8:13 pm on Nov 13, 2002 (gmt 0)

Personally, I don't like feedback and contact forms.
I hate filling out a form and sending it off blind, without being able to keep a local copy with the date/time, the address and my full text.
On my pages, I offer a valid email address and use the following JavaScript to provide a clickable mailto: link:

&lt;<script type="text/JavaScript"> var n='user'; var d='domain.tld';
document.write('<a href=\"mailto:' + n + '@' + d + '\">');</script>
user at domain.tld
<script type="text/JavaScript"> document.write('<\/a>');</script>&gt;

Users with JavaScript enabled will see a clickable <user at domain.tld>, while other users at least see <user at domain.tld>, which is not clickable but is recognizable, and which I hope spam bots cannot understand.

Regards,
R.

pageoneresults




msg:574579
 9:12 pm on Nov 13, 2002 (gmt 0)

About two months ago I started using ASCII character codes for all email addresses. I have a little tool on my site that automatically encodes the addresses, adds cc, bcc, subject and body fields, and encodes those too.
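A sketch of what such a tool might do, assuming JavaScript: build the mailto: URL with its optional cc, bcc, subject and body fields (the mailto: scheme is specified in RFC 6068), then entity-encode the whole thing. The function names are hypothetical, not the poster's tool.

```javascript
// Build a mailto: URL with optional header fields, percent-encoding each value.
// Empty or missing fields are left out.
function buildMailto(to, fields = {}) {
  const params = Object.entries(fields)
    .filter(([, value]) => value)
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`);
  return `mailto:${to}` + (params.length ? `?${params.join("&")}` : "");
}

// Entity-encode every character so the whole href is opaque in the source.
// This also covers the "&" between fields, which would otherwise need to be
// written as &amp; inside an HTML attribute.
function encodeAllEntities(text) {
  return Array.from(text, (ch) => `&#${ch.codePointAt(0)};`).join("");
}

const url = buildMailto("root@example.com", { cc: "web@example.com", subject: "Hello there" });
// url === "mailto:root@example.com?cc=web%40example.com&subject=Hello%20there"
const href = encodeAllEntities(url);
// Use as: <a href="...href...">Contact us</a>
```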

From what I've read so far in this topic, it sounds like the encoding is not foolproof and that email harvesters are getting much smarter.

I've read all the replies about JavaScript, contact forms, etc. Is there anyone here who has a surefire way to prevent harvesting? Is the ASCII encoding not enough?

bill




msg:574580
 8:11 am on Nov 14, 2002 (gmt 0)

Quoting pageoneresults: "Is the ascii format not enough?"

I do the same on all my sites. On most sites this has worked quite well...HOWEVER, I did this on a Chinese site that went up about a year ago and have seen no end of Chinese spam to the main contact address. I'm not sure if these guys are hand adding me to Chinese spam lists, but I'm pretty sure I've been harvested by a bot. I changed the address a few months ago and almost immediately got spam (within a few days). I can't prove it, but it seems that the ASCII trick isn't enough for the determined bots any longer.
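The suspicion is plausible: undoing the ASCII trick takes a harvester one regular expression. A sketch, assuming the harvester simply decodes numeric character references (decimal and hexadecimal) before pattern-matching; the address pattern is deliberately simplified.

```javascript
// Decode decimal (&#64;) and hexadecimal (&#x40;) character references in a
// single pass, so already-decoded text is never decoded twice.
function decodeNumericEntities(html) {
  return html.replace(/&#(?:x([0-9a-f]+)|([0-9]+));/gi, (match, hex, dec) => {
    const code = hex !== undefined ? parseInt(hex, 16) : Number(dec);
    // Leave out-of-range references untouched instead of throwing.
    return code <= 0x10ffff ? String.fromCodePoint(code) : match;
  });
}

// A deliberately simple address pattern (not a full RFC 5322 parser).
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function findAddresses(html) {
  return decodeNumericEntities(html).match(EMAIL) || [];
}
```

On the raw source '<a href="&#109;ailto:root&#64;example&#46;com">' the pattern finds nothing, but after decoding it finds root@example.com.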

toadhall




msg:574581
 7:03 pm on Nov 14, 2002 (gmt 0)

Disguising the link is what I concentrate on. I encode the address and use an unrelated expression for the link text (or an image if that's not possible). The link goes to an intermediate page that decodes the address, uses a JavaScript window.location to send it on, and then a window.close to (duh) close the window. All files have unrelated filenames. The encoding is pure two-digit hexadecimal, so there's no convenient delimiter (%) to split it on. Now all the harvester bot has to do is follow every link on the site and hack the contents of the "to:" field. :(
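A sketch of the two-digit hex scheme described above, assuming plain ASCII addresses so that every character fits in one byte. The function names are hypothetical; on the intermediate page, the decoded result would be handed to window.location as described.

```javascript
// Encode each character as exactly two hex digits, with no "%" delimiters.
// Assumes every character code fits in one byte (true for ordinary ASCII
// addresses); anything else raises a RangeError.
function toHexPairs(text) {
  return Array.from(text, (ch) => {
    const code = ch.charCodeAt(0);
    if (code > 0xff) throw new RangeError(`not a single-byte character: ${ch}`);
    return code.toString(16).padStart(2, "0");
  }).join("");
}

// Reverse it by reading the string two digits at a time.
function fromHexPairs(hex) {
  let out = "";
  for (let i = 0; i + 1 < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}

// Browser-only step on the intermediate page (hypothetical):
// window.location.href = "mailto:" + fromHexPairs(encoded);
```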
Oh, well. Keeps me busy anyway.

stevenha




msg:574582
 8:55 pm on Nov 14, 2002 (gmt 0)

I've never used the mailto: method because I think it makes it a little too easy for people to send quick negative comments about my site. I believe that if people have to work just a little harder, launching their own email app and typing in the email address manually, it gives them more time to think about what they want to say. Even so, I get a few email comments each day. Any more would be a burden to reply to.

I used to hide my email address from spam harvesters by splitting it across table cells: "firstname" in one <td>...</td> followed by "@domain.com" in the next <td>...</td>. That worked well for a couple of years, but during the last year spam has been getting worse anyway. I wonder if anyone here can comment on whether harvesters are known to be able to figure that out.

Anyway, I've changed recently to do this:
' firstname "@" domain.com ', which requires even more thought, unfortunately, and probably fools too many humans too. I'm starting to like the ".gif" idea more and more.

dingman




msg:574583
 9:49 pm on Nov 14, 2002 (gmt 0)

Every time I see someone suggest putting the e-mail address in a graphics file, I think of a blind user I used to work with. The graphic wouldn't do him or his screen-reader any good. And even if he thought to call his secretary in to look over the page for such graphics, it would be a pain in the neck. 'webmaster at this domain dot com' would be *much* more useful. If you're going to do the image thing, at least put something like that in the alt tag. No matter how much creativity you require from people to get an address out of the alt tag, it's better for the disabled than a graphic.
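Following that suggestion, a small sketch that generates the image markup with a spoken-style alt attribute. The file name mail.gif and the function names are hypothetical.

```javascript
// "webmaster@example.com" -> "webmaster at example dot com"
function spokenForm(address) {
  return address.replace(/@/g, " at ").replace(/\./g, " dot ");
}

// Minimal escaping for text placed inside a double-quoted HTML attribute.
function escapeAttr(text) {
  return text.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Image of the address whose alt text a screen reader can read aloud.
function addressImage(src, address) {
  return `<img src="${escapeAttr(src)}" alt="${escapeAttr(spokenForm(address))}">`;
}

// addressImage("mail.gif", "webmaster@example.com")
// -> <img src="mail.gif" alt="webmaster at example dot com">
```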

[OT: The first time I talked to my current boss about making web pages more friendly to the disabled, she assumed I was talking about stuck-in-a-wheelchair type disabilities, and couldn't figure out why all these things were more important to disabled users. She's on-board now that she knows what I'm talking about, though.]

jimh009




msg:574584
 7:45 am on Nov 18, 2002 (gmt 0)

Macguru - Thanks for the link to that javascript generator for email. Worked great and was simple to use. My site is new and hasn't received any email yet from spammers - hopefully now that I've replaced all the "mailto" with the new javascript, I never will.

Thanks again.

Jim


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved