Spam Control with hashcash

The Drupal community finally got most of the modules I wanted updated to Drupal 6, so I made the big upgrade. Everything went well, except that none of the spam-control modules I use had been updated, and I soon found myself swimming in hundreds of spambot comment submissions every day.

I did some of the obvious:

exclude comment submissions that have no referrer, or a referrer other than my site.
exclude submissions without user agents.
check IPs against a list of proxy servers.
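
As a rough illustration of the three checks above, here is a minimal Python sketch. It is not what any Drupal module actually runs (those are PHP); the function name, the `headers` dictionary, and the `proxy_ips` set are all hypothetical names chosen for the example.

```python
from urllib.parse import urlparse


def passes_basic_checks(headers, remote_ip, site_host, proxy_ips):
    """Return True if a comment submission survives the three basic filters.

    All parameter names are assumptions made for this sketch:
    headers   -- dict of request headers
    remote_ip -- client IP address as a string
    site_host -- hostname of your own site, e.g. "example.com"
    proxy_ips -- set of known proxy IP addresses
    """
    # 1. The referrer must be present and must point at our own site.
    #    An empty referrer parses to a hostname of None, which fails here.
    referrer = headers.get("Referer", "")
    if urlparse(referrer).hostname != site_host:
        return False

    # 2. A user agent must be supplied.
    if not headers.get("User-Agent", "").strip():
        return False

    # 3. The client IP must not be on the proxy list.
    if remote_ip in proxy_ips:
        return False

    return True
```

All three values are trivial for a bot to fake or route around, which fits with the results described next.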

No effect. I needed something more. There are a few more active approaches to spam control, all of which have some drawbacks.

So What Choices Do I Have?

* Akismet and Mollom run your comment submissions through a third party, evaluate them, and put them in a separate queue for you to approve or delete as you wish.

PROS:

Generally speaking, Akismet is quite accurate, and I've heard that Mollom's performance is similar.
Spam comments are deleted automatically after a user-set expiration date.

CONS:

dependent on third-party server
sorts submissions after the fact, so you still get the comments and the occasional false positive is almost certain to get deleted unless you manually scan through all your spam.
if you let spam get auto-deleted, the visitor who submitted the comment never knows his or her comments are being rejected or why.

* The Drupal Spam module [drupal.org] does an admirable job of identifying spam submissions and putting them in a separate queue. I would say it works almost as well as Akismet, but it runs on your own server and you control it fully. So apart from not depending on a third-party server, it has most of the same pros and cons as Akismet.

* CAPTCHA and reCAPTCHA. We've recently had some extended discussions about these (automated CAPTCHA attacks [webmasterworld.com]; baked jake's CAPTCHA rant [webmasterworld.com]; CAPTCHA-cracking in India [webmasterworld.com]). Personally, I'm not a fan. I have good eyes. Not as young as they once were, but nevertheless quite good. Professionally, I make my living as a historian and paleographer and am considered a top expert in reading illegible handwriting from the sixteenth century. And yet I would say that at least 25% of the time, I fail to solve a text-image CAPTCHA. I can't imagine how hard these are for people with bad vision. And though I love the idea of reCAPTCHA (digitizing books with distributed labor), my tests resulted in lengthy delays waiting for the reCAPTCHA server to respond. I've heard it's better now, but I'm too damn lazy to monitor my own sites as I should, let alone third-party servers that my sites depend on.

* Hashcash. Hashcash depends on the "proof of work" concept to verify that a human is submitting a form or sending an email. There are many potential proofs of work, but generally this approach gets its name because it uses [tech alert] hashed values of known data. A hash is basically the result of an algorithm that takes an object (text string, file, whatever) and manipulates it to generate a uniform-length number (usually written in hexadecimal). This number is not necessarily unique, but it is very difficult to find a "collision" (two different inputs with the same hash), so hashes are often used to verify that a file has not been corrupted or modified (see [hashcash.org...] for more information).
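
To make the "uniform-length number" point concrete, here is a small Python sketch. SHA-256 is used purely as an example; the post does not say which algorithm the hashcash modules use, and `hex_hash` is a hypothetical helper name.

```python
import hashlib


def hex_hash(text):
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short = hex_hash("a")
long_ = hex_hash("a much longer comment body " * 100)

# Both digests are 64 hex characters, whatever the input size.
print(len(short), len(long_))

# The same input always gives the same hash; a different input gives a different one.
print(short == hex_hash("a"))  # True
print(short == hex_hash("b"))  # False
```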

So hashcash sends some data to both the server and a hidden form field. It then performs a series of hashes server side and stores the result in a database. It then uses javascript and the "onsubmit" event to intercept the "submit" click, run the data through the same hash algorithm that was used server side, then submit that javascript-modified value. If it matches one of the values in the database, the submission gets a pass.
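
The round trip described above can be sketched in Python as follows. This is only a simulation under stated assumptions, not the module's code: the real thing uses PHP on the server and JavaScript in the browser, and the number of rounds, the choice of SHA-256, the in-memory `store` set standing in for the database, and the rule that each value is accepted only once are all assumptions made for the example.

```python
import hashlib
import secrets

ROUNDS = 3  # hypothetical number of chained hashes


def chained_hash(seed, rounds=ROUNDS):
    """Hash the seed repeatedly; server and client must use the same recipe."""
    value = seed
    for _ in range(rounds):
        value = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return value


def issue_challenge(store):
    """Server side: make a seed for the hidden field and store the expected answer."""
    seed = secrets.token_hex(16)
    store.add(chained_hash(seed))
    return seed


def verify_submission(store, submitted):
    """Server side: accept a stored value once, then discard it (assumed behaviour)."""
    if submitted in store:
        store.remove(submitted)
        return True
    return False


store = set()                  # stands in for the database table
seed = issue_challenge(store)  # value placed in the hidden form field

# A bot that posts the hidden field unchanged fails the check.
print(verify_submission(store, seed))    # False

# A browser running the onsubmit script computes the same chain and passes.
answer = chained_hash(seed)
print(verify_submission(store, answer))  # True
```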

This is a complex operation for a spambot. It needs to correctly identify the data that needs to get hashed. Then it has to interpret the javascript file correctly, figure out which function is being used by the onsubmit event, and run that function, or alternatively guess which of the many possible hash algos or combinations of algos are being used. So it's quite hard. Not, by any means, impossible, but no longer low-hanging fruit.

For users, on the other hand, the process is completely transparent if they have javascript enabled. Everything is done for them.

For users with javascript turned off, the submission will fail. There are a number of possible options:

just send an error message and ask them to turn on their JS.
degrade to a CAPTCHA that the user must solve
put those comments into an approval queue and run them through Akismet.
[your idea here]
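
One way to picture the options listed above is as a single switch in the submission handler. The Python sketch below is hypothetical throughout: the `Fallback` names, the `handle_submission` function, and the returned action strings are invented for illustration and do not come from either hashcash plugin.

```python
from enum import Enum


class Fallback(Enum):
    ERROR_MESSAGE = 1   # tell the visitor to turn on JS
    CAPTCHA = 2         # degrade to a CAPTCHA
    APPROVAL_QUEUE = 3  # hold the comment for moderation


def handle_submission(hash_ok, fallback):
    """Decide what to do with a comment, given whether the hash check passed."""
    if hash_ok:
        return "publish"
    if fallback is Fallback.ERROR_MESSAGE:
        return "reject: please enable JavaScript and resubmit"
    if fallback is Fallback.CAPTCHA:
        return "show captcha"
    return "hold for moderation"
```

Whichever branch is chosen, the visitor with javascript enabled never sees it.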

I've used hashcash on Wordpress sites and it pretty much killed all spam. Now I have it working on Drupal with fantastic results. I just use option #1 and send a user message and say that if you want to post a comment on my site, you need to turn on javascript. How many valid users have been turned back? I have no idea, but I can live with refusing 10% of blog comments (and I think it's less than that) if it saves me 20 minutes of scanning my Akismet queue every day. At least with hashcash, visitors know instantly that their comment has been rejected and why.

Hashcash is available for Wordpress [wordpress.org] for all recent versions.
The drupal hashcash module [drupal.org] has not been upgraded to drupal 6, but I generated and uploaded to drupal.org a drupal 6 version [drupal.org] (page down to the September 28 version - hashcash-6.x-1.4alpha.zip, not the Sept. 22 version labelled hashcash.zip which is completely fubarred). It's certainly "alpha" but it seems to be working for me. For three days now, I log in to find *no* automated comment submissions in my approval queue.

What do you think?

How do you do it on your site?

Is blocking users with JS off too high a price to pay?

Are you willing to put up with the usability issues with CAPTCHA?

Do you have a better method altogether?

A better alternative to Captcha and Akismet?

ergophobe
