ergophobe - 4:34 pm on Sep 29, 2008 (gmt 0)
The Drupal community finally got most of the modules I wanted updated to Drupal 6, so I made the big upgrade. Everything was great, but none of the spam-control modules I use had been updated, and I soon found myself swimming in hundreds of spambot comment submissions every day.
I did some of the obvious things.
No effect. I needed something more. There are a few more active approaches to spam control, all of which have some drawbacks.
So What Choices Do I Have?
* Akismet and Mollom send your comment submissions to a third-party service, which evaluates them and puts suspected spam in a separate queue for you to approve or delete as you wish.
PROS:
CONS:
* The Drupal Spam module [drupal.org] does an admirable job of identifying spam submissions and putting them in a separate queue. I would say it works almost as well as Akismet, but it runs on your own server and you control it fully. So apart from not depending on a third-party server, it has most of the same pros and cons as Akismet.
* CAPTCHA and reCAPTCHA. We've recently had some extended discussions about these (automated CAPTCHA attacks [webmasterworld.com]; baked jake's CAPTCHA rant [webmasterworld.com]; CAPTCHA-cracking in India [webmasterworld.com]). Personally, I'm not a fan. I have good eyes. Not as young as they once were, but nevertheless quite good. Professionally, I make my living as a historian and paleographer and am considered a top expert in reading illegible sixteenth-century handwriting. And yet I would say that at least 25% of the time, I fail to solve a text-image CAPTCHA. I can't imagine how hard these are for people with bad vision. And though I love the idea of reCAPTCHA (digitizing books with distributed labor), my tests resulted in lengthy delays waiting for the reCAPTCHA server to respond. I've heard it's better now, but I'm too damn lazy to monitor my own sites as I should, let alone the third-party servers my sites depend on.
* Hashcash. Hashcash relies on the "proof of work" concept to verify that a human is submitting a form or sending an email. There are many possible proofs of work, but this approach generally gets its name because it uses [tech alert] hashed values of known data. A hash is the output of an algorithm that takes an object (text string, file, whatever) and transforms it into a fixed-length number (usually written in hexadecimal). This number is not necessarily unique, but it is very difficult to find a "collision" (two different inputs that produce the same hash), so hashes are often used to verify that a file has not been corrupted or modified (see [hashcash.org...] for more information).
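To make the hash idea concrete, here is a minimal Python sketch using the standard hashlib module. The first part shows that SHA-1 turns any input into a fixed-length hex string and that a one-character change gives a different digest. The second part is a simplified proof of work in the spirit of hashcash.org: keep incrementing a counter until the SHA-1 of `resource:counter` starts with a given number of zero bits. The names `mint_stamp` and `check_stamp`, and the stamp format itself, are hypothetical; real hashcash stamps carry extra fields (version, date, random salt), and the WordPress and Drupal plugins discussed here may not use this scheme at all.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def has_leading_zero_bits(text: str, bits: int) -> bool:
    """True if SHA-1(text), read as a 160-bit integer, starts with `bits` zero bits."""
    value = int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")
    return value >> (160 - bits) == 0


def mint_stamp(resource: str, bits: int) -> str:
    """Hypothetical simplified stamp: try counters until the hash qualifies.
    Expected cost is about 2**bits hash computations."""
    counter = 0
    while not has_leading_zero_bits(f"{resource}:{counter}", bits):
        counter += 1
    return f"{resource}:{counter}"


def check_stamp(stamp: str, resource: str, bits: int) -> bool:
    """Verification costs a single hash, no matter how long minting took."""
    return stamp.startswith(resource + ":") and has_leading_zero_bits(stamp, bits)


if __name__ == "__main__":
    # Short and long inputs produce digests of the same length.
    print(len(sha1_hex(b"hi")), len(sha1_hex(b"x" * 1_000_000)))  # 40 40
    # A one-character change produces an unrelated digest.
    print(sha1_hex(b"comment text") == sha1_hex(b"comment text!"))  # False
    stamp = mint_stamp("example.com/comment", 12)
    print(stamp, check_stamp(stamp, "example.com/comment", 12))
```

The asymmetry is the point of a proof of work: minting a 12-bit stamp takes a few thousand hashes on average, while checking it takes one.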
So hashcash puts some data in a hidden form field and keeps the same data on the server. The server runs that data through a series of hashes and stores the result in a database. On the page, javascript hooks the form's "onsubmit" event to intercept the "submit" click, runs the hidden data through the same hash algorithm that was used server side, and then submits that javascript-modified value. If the submitted value matches one of the values in the database, the submission gets a pass.
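The round trip described above can be sketched in Python, with the browser's onsubmit javascript simulated by an ordinary function. Everything specific here is an assumption for illustration: the "series of hashes" is taken to be three rounds of SHA-256 over a hypothetical site salt plus a random token, `HashcashStore` stands in for the database table, and stored values are deleted once used. The actual WordPress and Drupal modules may differ in every one of these details.

```python
import hashlib
import secrets

# Hypothetical parameters. The same salt and round count would have to be
# embedded in the page's javascript, so this is obscurity, not a real secret.
SITE_SALT = "example-site-salt"
ROUNDS = 3


def transform(token: str) -> str:
    """The 'series of hashes': ROUNDS rounds of SHA-256 over salt + token."""
    value = SITE_SALT + token
    for _ in range(ROUNDS):
        value = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return value


class HashcashStore:
    """In-memory stand-in for the database of expected hash values."""

    def __init__(self):
        self._expected = set()

    def render_form(self) -> str:
        """Server side: make a token for the hidden field and store its hash."""
        token = secrets.token_hex(16)
        self._expected.add(transform(token))
        return token

    def verify(self, submitted: str) -> bool:
        """Server side: pass if the submitted value matches a stored hash.
        Each stored value is accepted once, so a captured value cannot be replayed."""
        if submitted in self._expected:
            self._expected.remove(submitted)
            return True
        return False


def browser_onsubmit(hidden_value: str) -> str:
    """Simulates the onsubmit handler: hash the hidden field before posting."""
    return transform(hidden_value)


if __name__ == "__main__":
    store = HashcashStore()
    token = store.render_form()
    print(store.verify(token))                    # False: bot posts the raw field
    print(store.verify(browser_onsubmit(token)))  # True: javascript ran
    print(store.verify(browser_onsubmit(token)))  # False: value already used
```

A bot that simply scrapes and reposts the hidden field fails the check; to pass, it has to reproduce `transform`, which means running or reverse-engineering the javascript.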
This is a complex operation for a spambot. It needs to correctly identify the data that has to be hashed. Then it has to interpret the javascript file correctly, figure out which function the onsubmit event calls, and run that function, or alternatively guess which of the many possible hash algorithms, or combinations of them, are being used. So it's quite hard. Not, by any means, impossible, but no longer low-hanging fruit.
For users, on the other hand, the process is completely transparent if they have javascript enabled. Everything is done for them.
For users with javascript turned off, the submission will fail. There are a number of possible ways to handle that.
I've used hashcash on WordPress sites and it pretty much killed all spam. Now I have it working on Drupal with fantastic results. I just use option #1: I show a message saying that if you want to post a comment on my site, you need to turn on javascript. How many valid users have been turned away? I have no idea, but I can live with refusing 10% of blog comments (and I think it's less than that) if it saves me 20 minutes of scanning my Akismet queue every day. At least with hashcash, visitors know instantly that their comment has been rejected and why.
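Rejecting with an immediate explanation, as described above, might look like the following minimal Python sketch. `handle_comment`, `Result`, and the message wording are hypothetical, and the validity check is passed in as a function so the sketch does not depend on any particular hashcash implementation. Note that the server cannot really tell a visitor with javascript off from a bot that skipped the script; both simply fail the check and get the same message.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical wording for the rejection message.
JS_REQUIRED = (
    "Your comment was not posted. Commenting on this site requires "
    "javascript; please turn it on and submit the form again."
)


@dataclass
class Result:
    accepted: bool
    message: str


def handle_comment(hidden_value: Optional[str],
                   is_valid: Callable[[str], bool]) -> Result:
    """Accept a comment only if the hashed hidden field checks out.

    With javascript off, the onsubmit handler never runs, so the field
    arrives missing or unhashed and fails the check, just like a bot's.
    """
    if hidden_value and is_valid(hidden_value):
        return Result(True, "Thanks, your comment was received.")
    # Reject right away and say why, instead of silently queueing the comment.
    return Result(False, JS_REQUIRED)


if __name__ == "__main__":
    print(handle_comment(None, lambda v: True).accepted)            # False
    print(handle_comment("good", lambda v: v == "good").accepted)   # True
```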
Hashcash is available for WordPress [wordpress.org] for all recent versions.
The Drupal hashcash module [drupal.org] has not been upgraded to Drupal 6, but I created a Drupal 6 version [drupal.org] and uploaded it to drupal.org (page down to the September 28 version, hashcash-6.x-1.4alpha.zip, not the Sept. 22 version labelled hashcash.zip, which is completely fubarred). It's certainly "alpha", but it seems to be working for me. For three days now, I've logged in to find *no* automated comment submissions in my approval queue.
What do you think?
How do you do it on your site?
Is blocking users with JS off too high a price to pay?
Are you willing to put up with the usability issues of CAPTCHA?
Do you have a better method altogether?