Forum Library, Charter, Moderators: incrediBILL

HTML Forum

This 66 message thread spans 3 pages (this is page 2).
Best way to stop mailto: robot spidered spam?
chiyo




msg:574555
 8:14 am on Oct 4, 2002 (gmt 0)

We have published our email addresses on our web sites for almost 10 years using simple mailto: links or forms. In the past year, the amount of spam has become intolerable. We wish to reduce that to manageable levels by changing all pages with mailto: to something spambots cannot spider. What would be great is a simple "search and replace"! Yep, I'm lazy too..

What is the best way?

1. Just unlink all email addresses, so those who are REALLY interested in emailing you have to copy and paste the address into their email program rather than just clicking. This seems a sensible solution at first thought, and very easy to implement.

2. Use JS (not sure exactly how to do that). Disadvantage: it won't work on some non-JS-enabled setups, which will frustrate those users and make them think your page is broken!

3. Direct all hyperlinked email addresses to our mail form, or even a new form (system) where the email address appears in the "send to" field when they get there?

Which of these do you suggest, or are there any other quick and dirty solutions?

 

kapow




msg:574585
 6:32 pm on Nov 18, 2002 (gmt 0)

Macguru - I've just used that JavaScript generator for my site. Thank you :)

- I get about 30 spam emails per day. I HATE SPAM. If they can no longer harvest my email address from the site, how long do you suppose it will take for my address to leave those lists?

- Anyone figured out a way to kill the Whois email harvesters? Sure I could use a different address but I still have to check mail to that address.

pageoneresults




msg:574586
 6:36 pm on Nov 18, 2002 (gmt 0)

> If they can no longer harvest my email address from the site, how long do you suppose it will take for my address to leave those lists?

As long as that email address is valid, you'll receive the spam. If you are using an alias that captures all unknown email to @yourdomain.com, then you will receive it until you remove the alias. I've always enjoyed using that feature of our email program. Problem is, the spam continues because the emails are not rejected.

kapow, I still get spam addressed to an email address that I've had since 1995. I changed the name@ in 1998 (old forwarding to new). To this day, I receive email at that old address and it's all spam.

Ed_Gibbon




msg:574587
 6:54 pm on Nov 18, 2002 (gmt 0)

I came up with this solution. It requires having an e-mail address especially for the purpose of receiving only e-mail related to the website.

I have the MAILTO @ me@mywebsite in my HTML code. But on the same page I have some text that gives visitors a special "code word" to place in the subject line of the message they want to send to me. Then I set up a filter in my mail system to delete all messages that do not contain the code word. So far, no junk e-mail has gotten through (and I don't care how much goes to the trash), and I still seem to get as much mail as I ever did from visitors to my site.
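
The code-word filter described above can be sketched roughly as follows. This is only an illustration, not the poster's actual mail rule: the code word `bluewidget`, the message shape (`{ subject: ... }`) and the function names are all assumptions made for the example.

```javascript
// Hypothetical code word shown on the contact page (an assumption for this sketch).
const CODE_WORD = "bluewidget";

// Returns true when the subject line contains the code word (case-insensitive).
function hasCodeWord(subject, codeWord = CODE_WORD) {
  return String(subject || "").toLowerCase().includes(codeWord.toLowerCase());
}

// Splits messages into those to keep and those to send to the trash.
function filterInbox(messages, codeWord = CODE_WORD) {
  const keep = [];
  const trash = [];
  for (const message of messages) {
    if (hasCodeWord(message.subject, codeWord)) {
      keep.push(message);
    } else {
      trash.push(message);
    }
  }
  return { keep, trash };
}
```

Messages with no subject at all end up in the trash as well, which matches the "delete everything without the code word" rule.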

WebSocks




msg:574588
 3:20 am on Nov 19, 2002 (gmt 0)

Macguru- Thank you I installed it and it is working fine.
I tried to encode the email address and it worked -- for two days. Now we are back to receiving lots of spam.

kapow




msg:574589
 10:28 am on Nov 19, 2002 (gmt 0)

> On the page give visitors a special "code word" to place in the subject line ... delete all messages that do not contain the code word.
- Great idea!

Someone suggested using a list of rubbish email addresses to 'spoil' the spam list. I don't think this will work because those spam bots detect and delete 'bounce backs' (non-working email addresses). They obviously handle millions of addresses so a few thousand rubbish addresses will be quickly dropped from their spam list.

pmkpmk




msg:574590
 1:09 pm on Nov 19, 2002 (gmt 0)

Funny enough nobody mentioned another thread here on webmasterworld which - in essence - deals with the same problem.

IF you have access to the server itself (at least your ISP has to allow the use of .htaccess on your virtual server) you can try to ban email harvesters altogether from reaching your site.

Have a look at this thread: [webmasterworld.com...]

The early postings are a bit trial and error, but towards the last third it gets very in depth.

The basic idea is to catch those bots (email harvesters and site downloaders) by the "User Agent" and then automatically redirect them either to an error message or to a "honeypot" which poisons their database with thousands of fake addresses.

A warning however: this is not a drop-in-and-forget method! Virtually every day a new email harvester comes out, and already some of them have ways of disguising their user-agent-line. So keeping your .htaccess up to date is a constant task.

On the good side - plugging in this .htaccess-banning gets rid of maybe 80% of all spambots at once - the amount of spam to come goes down significantly.
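
The matching logic behind such a User-Agent ban can be sketched as below. This is not the .htaccess from the linked thread; it only shows the idea of checking the User-Agent against a deny list. The three fragments are illustrative examples of harvester names, and `isBlockedAgent` and `statusFor` are hypothetical helper names.

```javascript
// Illustrative fragments of harvester User-Agent strings (assumed for this
// sketch); a real list would be longer and needs regular updating.
const BLOCKED_AGENT_FRAGMENTS = ["emailsiphon", "emailwolf", "extractorpro"];

// Returns true when the User-Agent contains any listed fragment.
function isBlockedAgent(userAgent, fragments = BLOCKED_AGENT_FRAGMENTS) {
  const ua = String(userAgent || "").toLowerCase();
  return fragments.some((fragment) => ua.includes(fragment));
}

// Decide what to serve: 403 for a listed harvester, 200 for everyone else.
function statusFor(userAgent) {
  return isBlockedAgent(userAgent) ? 403 : 200;
}
```

A bot that sends a browser-like User-Agent passes this check untouched, which is exactly the limitation mentioned in the warning above.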

If you're lucky enough to run your own mailserver under your own control, you can add a second line of defense: the use of realtime blacklists (sometimes also called realtime blocklists, or RBLs) in your mailserver allows you to block potential spam when the spammer tries to deliver it to you. On EACH incoming email, the mailserver checks at least one of these RBLs. If the sender's IP address tests positive on this list, email delivery is instantly cancelled even BEFORE the mail data is transferred to your server. There's a multitude of RBLs out there. Our server checks EACH incoming message against 5 different RBLs. Some of our users - including myself - post-check their messages again against other RBLs. I - for example - have all messages coming from Russia/China/Korea/Malaysia etc. tagged with the prefix "**SPAM**". This second (and third) line of defense makes life a lot easier!
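
For readers unfamiliar with how an RBL check works: DNS-based blacklists are queried by reversing the octets of the sender's IPv4 address and appending the blacklist's zone. The sketch below only builds that query name; the zone `rbl.example.org` in the usage comment is a placeholder, not a real blacklist, and `rblQueryName` is a hypothetical helper name.

```javascript
// Builds the DNS name an RBL lookup would query for an IPv4 address:
// the octets are reversed and the blacklist zone is appended.
function rblQueryName(ipv4, zone) {
  const octets = String(ipv4).split(".");
  const valid = octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) {
    throw new Error("Not an IPv4 address: " + ipv4);
  }
  return octets.reverse().join(".") + "." + zone;
}

// Example with a placeholder zone:
// rblQueryName("192.0.2.10", "rbl.example.org")
```

The mailserver then resolves that name. Typically an answer means the address is listed and the delivery can be refused, while "no such name" means it is not listed.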

pmkpmk




msg:574591
 1:14 pm on Nov 19, 2002 (gmt 0)

As for "poisoning" spammer databases with bogus email addresses: yes, sophisticated spammer tools actually DO check the addresses and delete bounce backs.

Someone else suggested that, instead of bogus or generated email addresses, you use actual spammers' addresses! This person - I guess it was the author of "SugarPlum" - suggested collecting all the addresses from EACH spam you get and using those on a spambot page on your site.

bluelook




msg:574592
 2:47 pm on Nov 19, 2002 (gmt 0)

Try this one:

[tools7.com...]

It encodes emails and links and it's free.

spock




msg:574593
 5:43 pm on Nov 19, 2002 (gmt 0)

If you're lucky enough to run your own mailserver under your own control...

I do run my own mailserver, and have even implemented a system where I and my customers can specify different sets of blacklists per domain and/or mail account. Believe it or not, but some people actually need to receive mail from Russia. ;)

Regarding mailto: link obfuscation - I'm using it on my pages, fully aware of the fact that the spambots would be catching on to this trick sooner or later. No JavaScript (that's too fragile), just character encoding. Is this still a worthwhile tactic, or can most of the spambots handle the encoding now?

My escalation plan (which I haven't tried yet, so I'm not even sure it would work) is to hide the mailto: behind a POST request. That should be pretty hard for spambots to penetrate. Anyone tried that? Do you think it's a good idea?
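
The character-encoding approach mentioned above (no JavaScript on the page, just HTML character references) could be generated with a small helper like the one below. This is a sketch under assumptions: the function names are made up for the example, and whether current spambots decode such references is exactly the open question in this post.

```javascript
// Encodes every character as a decimal HTML character reference,
// e.g. "a" becomes "&#97;".
function toCharacterReferences(text) {
  return Array.from(text, (ch) => "&#" + ch.codePointAt(0) + ";").join("");
}

// Builds an anchor whose href and link text are both encoded, so neither
// "mailto:" nor "@" appears literally in the page source.
function encodedMailtoLink(address) {
  const href = toCharacterReferences("mailto:" + address);
  return '<a href="' + href + '">' + toCharacterReferences(address) + "</a>";
}
```

Browsers decode the references when rendering, so the link behaves like a normal mailto: link. A harvester only needs to do the same decoding to read the address.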

Macguru




msg:574594
 5:59 pm on Nov 19, 2002 (gmt 0)

and it worked -- for two days, now we are back to receiving lots of spam.

Once an email address has been harvested, it usually ends up in a huge database, and by then it's too late. Encoding the address only prevents most spam bots from harvesting new ones. It won't stop spammers who bought your address, along with millions of others, on a $19.95 CD...

bluelook




msg:574595
 6:18 pm on Nov 19, 2002 (gmt 0)

Another good tactic is to create a small image with your email on it :)
Please tell me of a spambot that can do OCR :)

Sincerely,

Nuno Oliveira

Macguru




msg:574596
 6:21 pm on Nov 19, 2002 (gmt 0)

I always thought some bad bots crawled MAILTO: links. How is an image with a mailto link supposed to help?

bluelook




msg:574597
 6:44 pm on Nov 19, 2002 (gmt 0)

You didn't understand me...
The image isn't a link :)
The image has your email written on it :)
Ok, the lazy visitors can't click on the image and automatically open their Email Composer...
But the other ones can open their email program, and copy down your email address.

Bye,

Nuno Oliveira

Macguru




msg:574598
 6:48 pm on Nov 19, 2002 (gmt 0)

I think 99 % of visitors will simply think the 'link' doesn't work and shop elsewhere.

My 0.02 $.ca

bluelook




msg:574599
 9:06 pm on Nov 19, 2002 (gmt 0)

But it isn't even a link. It's like having plain text telling the email address...
Geez I think my visitors are smart enough to write down an email and put it on the compose box of their email program, or webmail.
People that can't do this don't know how to buy anything either, lol, isn't that right?
They don't know how to enter a credit card number (maybe they'll try to click on the credit card :) ), and would never know how to sign up for a Paypal account.
And don't forget all the people that only use webmail, and so clicking on the mailto: link only gives them "you don't have any email program".
Those visitors/clients have to write down or copy your email address and paste it on their webmail's compose field.
Are you losing all those visitors? I don't think so.

Ciao,

Nuno Oliveira

Macguru




msg:574600
 9:27 pm on Nov 19, 2002 (gmt 0)

>>Geez I think my visitors are smart enough to write down an email and put it on the compose box of their email program, or webmail.

OK, maybe 99 % was exaggerated, but 90 % seems realistic. I put most of my effort into getting visitors. Once they are in (easy come, easy go), why would I force them into mystery meat gymnastics to write us?

We just want to filter spam out here. Customer enquiries shall not suffer from it.

pmkpmk




msg:574601
 8:30 am on Nov 20, 2002 (gmt 0)

Hi Mr. Spock,

Sure - some users do need to receive mail from Russia and China. That's the reason why our second line of defense only blacklists mail from known spammer domains and open relays. However, my personal third line of defense is to block mail from Russia and China. Maybe our guys in international sales need to receive mail from Russia and China. But I'm pretty sure nobody there has any reason to write ME. And if someone does, the mails caught by the third line of defense are only tagged - not rejected.

The system as we have implemented it (first line of defense: block known spambots from harvesting our addresses; second line: reject email from known spammer domains; optional third line: sieve the mail even more harshly) has brought down overall spam tremendously, and has cut my personal spam from 30-40 daily to 2-3 daily.

Not bad, eh?

spock




msg:574602
 9:11 am on Nov 20, 2002 (gmt 0)

pmkpmk, it sounds like we have about the same ideas regarding appropriate spam protection. The blacklists I use for my personal account are relatively aggressive, so I probably block significant parts of Russia and China indirectly. Since implementing this system I've actually been receiving less than one spam message per day (I got less than your 30-40 before that, though).

bluelook




msg:574603
 10:24 am on Nov 20, 2002 (gmt 0)

Users that can't copy an email are more likely to use webmail, aren't they? How can they configure Outlook Express, Eudora or Pegasus?
I never really thought about this subject. Maybe because I don't have the habit of clicking on mailto: links. But MacGuru, that's your opinion and I respect it :) Because my only targets are webmasters and advertisers, I don't worry much about this subject. I think they are smart enough to do that, or it would be difficult to work with them (and have to teach everything). Maybe because of this step, my customer support is almost 0. Everybody buys the product and doesn't have any problem doing that. Maybe I lose a client or 2, but I don't have any support work either.

Cyas,

Nuno Oliveira

George




msg:574604
 8:39 pm on Nov 20, 2002 (gmt 0)

If you are spammed out and do not have access to your server, try mailwasher, a superb tool.

After replacing mailto: with a simple JavaScript... no complaints yet :) and then using mailwasher, I reduced my inbox by about 40 emails per day...

George

JamesR




msg:574605
 12:14 am on Nov 21, 2002 (gmt 0)

I haven't gotten any spam through a form and formmail script on one domain.

rjohara




msg:574606
 3:40 am on Nov 21, 2002 (gmt 0)

On a somewhat related topic: I have a number of pages that mention people's personal names, and sometimes have pictures. ("Here's a picture of Mary Jones at the office party.") People like to have these available, and they certainly aren't meant to be secret in any way, but I have wondered for some time if encoding the names would give a measure of privacy from random web searches for a person's name. If I encode "Mary Jones" as a string of character references most browsers will still read it with no problem, but will that hide it from Google et al.? (Not that there are many et al's left on the net.)

It would be a simple experiment; I've just never tried it. Anyone else? As far as I know there is no regular way of hiding a fragment of a page from an honest search engine, but that would be a great feature: <noindex>Mary Jones</noindex>. Perhaps Google would care to implement this?

pageoneresults




msg:574607
 5:09 am on Nov 21, 2002 (gmt 0)

rjohara, actually there is a company that invented this tag for their own crawler-based SE. It's formatted exactly as you show it above...

<noindex>No Index Content Here</noindex>

I knew I remembered seeing something about that somewhere.

You know, if I could eliminate all the incoming spam, I may be left with nothing to read! ;)

nealw




msg:574608
 8:17 pm on Nov 21, 2002 (gmt 0)

I use this little JavaScript. Very little spam:

<SCRIPT TYPE="text/javascript">
<!--
document.write("<A HREF=" + "ma" + "il" + "to" + ":YOURNAME" + "@" + "YOURDOMAIN" + ".com" + ">" + "YOURNAME@YOURDOMAIN.com" + "</A>")
//-->
</SCRIPT>

g1smd




msg:574609
 9:11 pm on Nov 21, 2002 (gmt 0)

>> document.write("<A HREF=" + "ma" + "il" + "to" + ":YOURNAME" + "@" + "YOURDOMAIN" + ".com" + ">" + "YOURNAME@YOURDOMAIN.com" + "</A>") <<

Is that highlighted bit (the unsplit "YOURNAME@YOURDOMAIN.com" used as the link text) really still invisible to robots?

ScottM




msg:574610
 9:42 pm on Nov 21, 2002 (gmt 0)

How about this:

document.write("<A HREF=" + "ma" + "il" + "to" + ":YOURNAME" + "@" + "YOURDOMAIN" + ".com" + ">" + "Email Me:" + "</A>")
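
The same idea can be wrapped in a small helper that keeps the address out of the link text altogether. This is a sketch, not anyone's production script: `buildMailtoLink` is a hypothetical name, and the placeholders `YOURNAME` and `YOURDOMAIN.com` are the ones used in this thread.

```javascript
// Assembles the link from separate pieces so that "user@domain" never
// appears as one literal string in the page source.
function buildMailtoLink(user, domain, linkText) {
  const address = user + "@" + domain;
  return '<a href="' + "ma" + "il" + "to:" + address + '">' + linkText + "</a>";
}

// In a page this would be written out with, for example:
// document.write(buildMailtoLink("YOURNAME", "YOURDOMAIN.com", "Email Me"));
```

The address is still present in the generated page, so this only helps against bots that read the raw source without running scripts.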

Hagstrom




msg:574611
 10:53 am on Nov 22, 2002 (gmt 0)

<noindex>Mary Jones</noindex>. Perhaps Google would care to implement this?

The <noindex> tag is used by the Atomz search.

We had a similar thread about this.
One solution was the soft hyphen: Ma&shy;ry Jo&shy;nes
Unfortunately the soft hyphen shows on Netscape.

Another solution was to use very small punctuation: Ma<font size="1">.</font>ry Jo<font size="1">.</font>nes
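
Both variants above follow the same pattern: put something into the middle of each word so the name no longer appears as one literal string. A minimal sketch of that pattern, with the hypothetical helper name `breakUpName` and the assumption that the break goes after the second letter as in the examples above:

```javascript
// Inserts a separator after the second letter of each word, so the name
// does not appear as a contiguous string in the page source.
function breakUpName(name, separator = "&shy;") {
  return name
    .split(" ")
    .map((word) =>
      word.length > 2 ? word.slice(0, 2) + separator + word.slice(2) : word
    )
    .join(" ");
}

// breakUpName("Mary Jones") uses the soft hyphen;
// breakUpName("Mary Jones", '<font size="1">.</font>') uses the tiny full stop.
```

Whether a given search engine ignores or strips the separator when indexing is not something this sketch can answer.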

Hagstrom




msg:574612
 11:31 am on Nov 22, 2002 (gmt 0)

After a good lunch I just found the perfect solution - using the <noindex> tag: Ma<noindex>ry Jo<noindex>nes

You may of course replace <noindex> with any invalid HTML tag :)

jimbeetle




msg:574613
 9:43 pm on Nov 23, 2002 (gmt 0)

I've used the encoder at Willmaster. Real simple and handles cc, bcc, subject and body if you want:

[willmaster.com...]

I do get a good number of 404 errors across four different sites. Can't associate the 404s with browsers but do know that on machines I've used the encoded address works on IE 4 thru 6 and NS 4 thru 6. But the number of 404s is many more than can be associated with visitors using something like Opera. Might be spiders, might be 'transient' browser errors or some 'undocumented feature.' Think spiders most likely.

And, duh! On one site that I use Frontpage for I copied the encoded address, slapped it into the footer, saved it, published the 1,500 pages and thought all was okay. Guess what? Just viewed source and there's mailto:blahblahblah sitting there plain as day.

Re-encoded and repasted it into the footer, and this time I noticed that FP does its "helpful" bit by converting all of it right back to plaintext. If you use FP be sure to use "Insert...Advanced...HTML."

Jim

Bernie




msg:574614
 12:36 pm on Nov 24, 2002 (gmt 0)

i know it is not in the spirit of marketing, but since spam has increased terrifically i tend to show my email-address only as a .gif-image (with a brief explanation why). javascript solutions mentioned here are interesting, but who knows what a spam-bot may interpret in the future. :-)

people can use the web-forms, and if they can handle neither the web-form nor typing in an email-address they can call me!

bill




msg:574615
 4:20 am on Nov 26, 2002 (gmt 0)

kapow msg #:35
Someone suggested using a list of rubbish email addresses to 'spoil' the spam list. I don't think this will work because those spam bots detect and delete 'bounce backs' (non-working email addresses).

pmkpmk msg #:37
As for "poisoning" spammer databases with bogus email addresses: yes, sophisticated spammer tools actually DO check the addresses and delete bounce backs.

Sounds like a huge endorsement for MailWasher, which will bounce messages back to the sophisticated spammers. I'd be interested to know whether mail-poisoning honeypots have any effect at all on corrupting these guys' data. It certainly sounds like a good idea.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved