
Content Management Forum

    
Spam Control with hashcash
A better alternative to Captcha and Akismet?
ergophobe




msg:3754617
 4:34 pm on Sep 29, 2008 (gmt 0)

The Drupal community finally got most of the modules I wanted updated to Drupal 6, so I made the big upgrade. Everything was great, but none of the spam-control modules I use had been updated, and I soon found myself swimming in hundreds of spambot comment submissions every day.

I did some of the obvious:

  • exclude comment submissions that have no referrer, or a referrer other than my site
  • exclude submissions without user agents.
  • check IPs against a list of proxy servers.
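For anyone who wants to try the same three checks, here is a minimal Python sketch. It assumes the request headers arrive as a plain dict (note the HTTP header is spelled `Referer`), that the site's hostname is known, and that the proxy list is a list of IP ranges. The function name and the addresses (taken from the documentation-only ranges) are made up for illustration.

```python
import ipaddress
from urllib.parse import urlparse

def passes_basic_checks(headers, client_ip, site_host, proxy_networks):
    """Return True if a comment POST survives the three cheap checks."""
    # 1. The referrer must be present and point back to our own site.
    referrer = headers.get("Referer", "")
    if not referrer or urlparse(referrer).hostname != site_host:
        return False
    # 2. A User-Agent header must be present and non-blank.
    if not headers.get("User-Agent", "").strip():
        return False
    # 3. Reject client IPs that fall inside any known proxy range.
    ip = ipaddress.ip_address(client_ip)
    if any(ip in net for net in proxy_networks):
        return False
    return True

PROXIES = [ipaddress.ip_network("203.0.113.0/24")]  # hypothetical proxy list
ok_headers = {"Referer": "http://example.com/post/1", "User-Agent": "Mozilla/5.0"}
print(passes_basic_checks(ok_headers, "198.51.100.7", "example.com", PROXIES))  # True
print(passes_basic_checks(ok_headers, "203.0.113.9", "example.com", PROXIES))   # False
```

As noted above, bots get past all three of these easily, so they are only a first, cheap layer.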

No effect. I needed something more. There are a few more active approaches to spam control, all of which have some drawbacks.

So What Choices Do I Have?

* Akismet and Mollom run your comment submissions through a third party, evaluate them, and put them in a separate queue for you to approve or delete as you wish.

PROS:

  • Generally speaking, Akismet is quite accurate and I've heard that Mollom performance is similar.
  • automatically deletes spam comments once a user-set expiration period has passed.

CONS:

  • dependent on third-party server
  • sorts submissions after the fact, so you still receive the comments, and the occasional false positive is almost certain to get deleted unless you manually scan through all your spam.
  • if you let spam get auto-deleted, the visitor who submitted the comment never knows his or her comments are being rejected or why.

* The Drupal Spam module [drupal.org] does an admirable job of identifying spam submissions and putting them in a separate queue. I would say it works almost as well as Akismet, but it runs on your own server and you control it fully. So apart from not depending on a third-party server, it has most of the same pros and cons as Akismet.

* CAPTCHA and reCAPTCHA. We've recently had some extended discussions about these (automated CAPTCHA attacks [webmasterworld.com]; baked jake's CAPTCHA rant [webmasterworld.com]; CAPTCHA-cracking in India [webmasterworld.com]). Personally, I'm not a fan. I have good eyes. Not as young as they once were, but nevertheless quite good. Professionally, I make my living as a historian and paleographer and am considered a top expert in reading illegible handwriting from the sixteenth century. And yet I would say that at least 25% of the time, I fail to solve a text-image CAPTCHA. I can't imagine how hard these are for people with bad vision. And though I love the idea of reCAPTCHA (digitizing books with distributed labor), my tests resulted in lengthy delays waiting for the reCAPTCHA server to respond. I've heard it's better now, but I'm too damn lazy to monitor my own sites as I should, let alone third-party servers that my sites depend on.

* Hashcash. Hashcash depends on the "proof of work" concept to verify that a human is submitting a form or sending an email. There are many potential proofs of work, but generally this approach gets its name because it uses [tech alert] hashed values of known data. A hash is basically the result of an algorithm that takes an object (text string, file, whatever) and manipulates it to generate a fixed-length number (usually written in hexadecimal). This number is not necessarily unique, but it is very difficult to find a "collision" (two different inputs that produce the same hash), so hashes are often used to verify that a file has not been corrupted or modified (see [hashcash.org...] for more information).
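To make the hash idea concrete, here is a short Python sketch using the standard `hashlib` module. SHA-256 is just an example choice, not necessarily what any hashcash implementation uses. The point is that the digest always has the same length and changes completely when a single character of the input changes.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

short = sha256_hex(b"hi")
big = sha256_hex(b"a much longer comment body " * 1000)
print(len(short), len(big))  # 64 64: the output length doesn't depend on the input size

original = b"Great post, thanks!"
tampered = b"Great post, thanks?"
print(sha256_hex(original) == sha256_hex(tampered))  # False: one changed byte, different digest
```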

So hashcash generates some data and writes it into a hidden form field. On the server side, it runs the same data through a series of hashes and stores the result in a database. It then uses JavaScript and the "onsubmit" event to intercept the "submit" click, run the hidden-field data through the same hash algorithm that was used server side, and submit that JavaScript-modified value. If it matches one of the values in the database, the submission gets a pass.
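The round trip described above might look roughly like this on the server. This is a sketch under assumptions, not the plugin's actual code: the token format, the use of SHA-256, the number of rounds (`ROUNDS`) and the in-memory set standing in for the database table are all hypothetical. `chained_hash` is called directly here in place of the JavaScript that would run in the browser's onsubmit handler. Removing a value once it has been used is an extra design choice, so a bot can't replay one good answer.

```python
import hashlib
import secrets

ROUNDS = 5  # hypothetical: how many times the token is re-hashed

def chained_hash(token: str, rounds: int = ROUNDS) -> str:
    """Hash the token repeatedly; the browser-side JavaScript must do the same."""
    value = token
    for _ in range(rounds):
        value = hashlib.sha256(value.encode()).hexdigest()
    return value

expected_values = set()  # stands in for the database table

def render_form() -> str:
    """Server side: create a token for the hidden field and remember its hash."""
    token = secrets.token_hex(16)
    expected_values.add(chained_hash(token))
    return token  # would be written into <input type="hidden" ...>

def verify_submission(submitted: str) -> bool:
    """Accept only a value we issued, and only once."""
    if submitted in expected_values:
        expected_values.discard(submitted)
        return True
    return False

token = render_form()
js_value = chained_hash(token)       # what the onsubmit handler would send
print(verify_submission(js_value))   # True
print(verify_submission(js_value))   # False: already used
print(verify_submission(token))      # False: a bot posting the raw token fails
```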

This is a complex operation for a spambot. It needs to correctly identify the data that needs to get hashed. Then it has to interpret the JavaScript file correctly, figure out which function the onsubmit event calls, and run that function, or alternatively guess which of the many possible hash algos, or combinations of algos, is being used. So it's quite hard. Not impossible by any means, but no longer low-hanging fruit.

For users, on the other hand, the process is completely transparent if they have javascript enabled. Everything is done for them.

For users with JavaScript turned off, the submission will fail. There are a number of possible options:

  1. just send an error message and ask them to turn on their JS.
  2. degrade to a CAPTCHA that the user must solve
  3. put those comments into an approval queue and run them through Akismet.
  4. [your idea here]
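One way to wire up those options, sketched in Python with hypothetical names (`decide`, `Outcome`). It only captures the policy: a passed hashcash check publishes the comment, and a failed one goes to whichever fallback the site owner picked. The server can't tell a bot from a human with JS turned off, which is exactly why the fallback choice matters.

```python
from enum import Enum

class Outcome(Enum):
    PUBLISH = "publish the comment"
    ERROR = "reject with a 'please turn on JavaScript' message"  # option 1
    CAPTCHA = "show a CAPTCHA the user must solve"               # option 2
    QUEUE = "hold for approval and run through Akismet"          # option 3

FALLBACKS = {1: Outcome.ERROR, 2: Outcome.CAPTCHA, 3: Outcome.QUEUE}

def decide(hashcash_ok: bool, fallback_option: int) -> Outcome:
    """Hashcash success is a free pass; failure goes to the chosen fallback."""
    if hashcash_ok:
        return Outcome.PUBLISH
    return FALLBACKS[fallback_option]

print(decide(True, 1).value)   # publish the comment
print(decide(False, 3).value)  # hold for approval and run through Akismet
```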

I've used hashcash on WordPress sites and it pretty much killed all spam. Now I have it working on Drupal with fantastic results. I just use option #1: I send a user message saying that if you want to post a comment on my site, you need to turn on JavaScript. How many valid users have been turned back? I have no idea, but I can live with refusing 10% of blog comments (and I think it's less than that) if it saves me 20 minutes of scanning my Akismet queue every day. At least with hashcash, visitors know instantly that their comment has been rejected and why.

Hashcash is available for WordPress [wordpress.org] for all recent versions.
The Drupal hashcash module [drupal.org] has not been upgraded to Drupal 6, but I generated a Drupal 6 version and uploaded it to drupal.org [drupal.org] (page down to the September 28 version, hashcash-6.x-1.4alpha.zip, not the Sept. 22 version labelled hashcash.zip, which is completely fubarred). It's certainly "alpha", but it seems to be working for me. For three days now, I log in to find *no* automated comment submissions in my approval queue.

What do you think?

How do you do it on your site?

Is blocking users with JS off too high a price to pay?

Are you willing to put up with the usability issues with CAPTCHA?

Do you have a better method altogether?

 





msg:3756606
 1:59 pm on Oct 1, 2008 (gmt 0)

I am using Akismet and haven't had a problem with deleted legitimate comments so far. Hashcash sounds interesting. I am concerned about JS being turned off and if someone will turn it on to make a comment. They would have to be pretty motivated to make a comment. What about cell phones? I have a couple of sites where visitors use cellphones to make comments. Will hashcash work for them?

As for CAPTCHA, I will never use it: it's too much work for the user, it puts them off, and there are the readability problems you gave.

ergophobe




msg:3756701
 4:06 pm on Oct 1, 2008 (gmt 0)

I am concerned about JS being turned off and if someone will turn it on to make a comment.

That's the big drawback to hashcash. Since I was drowning in spam, I was willing to pay that price.

The cell phone question is an interesting one. How many people have non-JS-enabled browsers on their cell phones? Do you have any idea?

The best stats I could find say that about half of mobile visitors have JS-enabled browsers (that's from October 2007). So I guess it depends on how many you get. I don't get many, but the number is growing, and it's growing because of the iPhone and others with powerful browsers.

You could make hashcash degrade nicely by demanding some other proof of work that requires user interaction, but if Javascript is available, that field gets hidden and automatically filled in. Or you could treat hashcash success as a free pass, while failure just flags a post for review.

I suppose it depends on your user profile and how hard you're being hit by bots. If you don't have much spam and you have a lot of users without JS, then Akismet or similar is probably less work in the end.

One other option that I've seen people use is a hidden form field. If the user has a CSS-enabled browser, the field doesn't show. If the browser doesn't apply the CSS, the field is visible but says something like "Do not fill in this field unless you are a spammer". If the form is submitted with a value in that field, it gets treated as a bot submission. I have no idea whether it works or not. Do spambots automatically fill in every field on a form? I'm not sure they do.
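A minimal sketch of that hidden-field ("honeypot") check, assuming the form's fields arrive as a dict. The field name `hp_check`, the CSS class and the markup are hypothetical. Whether spambots actually fill in such a field is the open question above; the code only shows the mechanics.

```python
# Hypothetical markup: the paragraph is hidden by the site's stylesheet,
# e.g. .hp { display: none; }
HONEYPOT_HTML = (
    '<p class="hp">Do not fill in this field unless you are a spammer: '
    '<input type="text" name="hp_check" value=""></p>'
)
HONEYPOT_FIELD = "hp_check"  # hypothetical field name

def is_bot_submission(form: dict) -> bool:
    """A human never sees the field, so any non-blank value marks a bot."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())

print(is_bot_submission({"comment": "Nice write-up", "hp_check": ""}))           # False
print(is_bot_submission({"comment": "cheap pills", "hp_check": "http://spam"}))  # True
```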

I agree Akismet is pretty good, and I still have it as a second layer of defense on one site, but the spam queue has been largely empty since I installed hashcash. My issue is that there are occasional false positives that get flagged for review. If you don't have a lot of spam, you can just review these and approve them. For me, though, there are two issues:
- comments that get flagged as spam or flagged for review sit in the queue and if you aren't checking your queue frequently, users might wonder where their comment is.

- if you have tons of spam, it's just too time-consuming to go through your spam logs every time, so I tend to just "delete all". The other day I was on a slow dialup and purged my spam queue (Drupal Spam module, not Akismet, but it also rarely has false positives). Given the slow connection, after I pushed the "delete all" button, while I was waiting for the system to respond, I noticed a legitimate comment on the screen, and it was too late. Who knows how many of those I've deleted? Probably not many. In this case, I had time to make a mental note of the subject and sender, and that person had also sent an email through my contact form, so all was good. But I just find it onerous to check my spam logs.

JeremyL




msg:3757168
 4:08 am on Oct 2, 2008 (gmt 0)

I need to try that out. I still need Mollom or Akismet to deal with the manual blog spammers who post links to BS. I have tried Mollom but have one complaint: when it blocks something, there is no way to go back and tell it that it's wrong. The comment is just gone. There needs to be a queue for blocked spam that I can scan to undo any false positives.

jecasc




msg:3757226
 7:32 am on Oct 2, 2008 (gmt 0)

I simply randomize the input field names and store them in sessions.

For example:

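session_start(); // needed before $_SESSION can be written
$_SESSION['comment'] = sha1(uniqid(rand()));
echo '<input type="text" name="' . $_SESSION['comment'] . '">';

and then retrieve them like this, after calling session_start() again in the form handler: $comment = $_POST[$_SESSION['comment']];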

Since few others use this technique, it works for me. Of course it only works against bots, not human spammers. All in all, I prefer individual ways to combat spambots, since nobody will develop a bot solution just for my website. Off-the-shelf solutions, on the other hand, are always a target for spammers since they are widely used.
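The same idea in Python, extended to cover every field on the form. The `session` dict stands in for whatever session storage the site uses, and the function and field names are hypothetical. `secrets.token_hex(20)` produces a 40-character hex name, similar to the `sha1()` output above.

```python
import secrets

LOGICAL_FIELDS = ["name", "email", "comment"]  # hypothetical form fields

def issue_field_names(session: dict) -> dict:
    """Pick a random input name for each logical field and remember the mapping."""
    mapping = {field: secrets.token_hex(20) for field in LOGICAL_FIELDS}
    session["field_names"] = mapping
    return mapping  # used when rendering <input name="..."> tags

def read_fields(session: dict, post: dict):
    """Translate the random names back; return None if anything is missing."""
    mapping = session.get("field_names")
    if not mapping:
        return None
    values = {}
    for field, random_name in mapping.items():
        if random_name not in post:
            return None  # e.g. a bot posting to a hard-coded "comment" field
        values[field] = post[random_name]
    return values

session = {}
names = issue_field_names(session)
post = {names["name"]: "Ann", names["email"]: "ann@example.com", names["comment"]: "Thanks!"}
print(read_fields(session, post))                 # {'name': 'Ann', 'email': 'ann@example.com', 'comment': 'Thanks!'}
print(read_fields(session, {"comment": "spam"}))  # None
```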

incrediBILL




msg:3757236
 8:02 am on Oct 2, 2008 (gmt 0)

Javascript is the true captcha.

All the other junk (Akismet, etc.) is overly complicated and prone to false positives, just horrific.

You tell users that hit those pages without javascript that they need it enabled to submit anything. Bots don't run javascript and if your code creates a javascript signature for anything manually typed into the fields, it's pretty much a 100% solution when the server side script compares the javascript signature.

This type of solution has to be randomized and obfuscated so spam scripts can't detect it and create the signature themselves, and it isn't hard to make it virtually untraceable.
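A sketch of the server half of such a signature check, in Python. The assumptions are all hypothetical: a random per-form salt stored in the session and embedded in the page, and a salted SHA-256 of the comment text as the "signature". The browser would compute the same value in JavaScript; that part, and the randomizing and obfuscating mentioned above, aren't shown. In practice both sides also have to agree on text encoding and newline handling, or honest comments will fail.

```python
import hashlib
import hmac
import secrets

def new_form_salt(session: dict) -> str:
    """Random per-form salt; the page's (obfuscated) script would embed it."""
    salt = secrets.token_hex(8)
    session["sig_salt"] = salt
    return salt

def signature(salt: str, text: str) -> str:
    """What the browser-side JavaScript would compute from the typed comment."""
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()

def verify(session: dict, text: str, submitted_sig: str) -> bool:
    """Server side: recompute the signature over the text that actually arrived."""
    salt = session.pop("sig_salt", None)  # each salt is good for one submission
    if salt is None:
        return False
    expected = signature(salt, text)
    # Constant-time comparison of the two hex strings (as bytes).
    return hmac.compare_digest(expected.encode(), submitted_sig.encode())

session = {}
salt = new_form_salt(session)
sig = signature(salt, "Nice post!")          # done in the browser in real life
print(verify(session, "Nice post!", sig))    # True
```

Because the signature covers what was typed, a bot can't capture one valid value and reuse it with different comment text.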

[edited by: incrediBILL at 8:03 am (utc) on Oct. 2, 2008]

slef




msg:3757269
 8:43 am on Oct 2, 2008 (gmt 0)

How do you do it on your site? - a mixture of blacklists, preview-required, adaptive filtering (delaying people who are trying to post too fast), external testing (akismet-like, but I'll look at the one mentioned above) and a pre-moderation cooperative (a number of webmasters who do bulk moderation across all our sites). I also make our anti-spam policy public on most sites, which deters spammers - if you don't mention it, you look clueless (=attractive, or at least worth a test spamming).
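Here is one possible reading of "delaying people who are trying to post too fast", as a per-IP throttle where every too-early attempt doubles the required wait. The class name, the 30-second base gap and the one-hour cap are hypothetical, and state is kept in memory only, so it would reset on restart.

```python
import time
from typing import Dict, Optional, Tuple

class PostThrottle:
    """Per-IP minimum gap between posts; each too-early attempt doubles the gap."""

    def __init__(self, base_delay: float = 30.0, max_delay: float = 3600.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        # ip -> (time of last accepted post, currently required gap in seconds)
        self.state: Dict[str, Tuple[float, float]] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if ip in self.state:
            last, delay = self.state[ip]
            if now - last < delay:
                # Posting too fast: refuse and make the next wait longer.
                self.state[ip] = (last, min(delay * 2, self.max_delay))
                return False
        self.state[ip] = (now, self.base_delay)
        return True

throttle = PostThrottle(base_delay=30)
print(throttle.allow("192.0.2.1", now=0))    # True: first post
print(throttle.allow("192.0.2.1", now=10))   # False: 10 s < 30 s, gap grows to 60 s
print(throttle.allow("192.0.2.1", now=45))   # False: 45 s < 60 s, gap grows to 120 s
print(throttle.allow("192.0.2.1", now=200))  # True: 200 s >= 120 s
```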

Is blocking users with JS off too high a price to pay? - Yes! Javascript is a waste of client power (= higher CO2) and IIRC there are javascript-based exploits for all released browsers, some of which aren't public or fixed yet. Javascript also powers many crimes against usability, like moving page elements around without user control.

Are you willing to put up with the usablity issues with CAPTCHA? - No. CAPTCHA doesn't test for spam itself, so it's useless against human or semi-human spamming. Even its inventors say that the usability problems are unsolved. Why put resources into annoying your users *and* failing to test for spam?

Do you have a better method altogether? - see above. The field name randomising is pretty useful too. [added note: it usually works in PHP even without cookie support, because PHP will add a session ID to the URL query string if you use PHP-based link functions]

Hope that helps.

m0thman




msg:3757308
 10:53 am on Oct 2, 2008 (gmt 0)

Very interesting post which I shall chew over at my leisure and perhaps learn something. For now, I'll describe my methods of keeping the spam down.

I don't use CAPTCHAs (I personally find them irritating), and I like to have control over "my stuff", which is why I don't use a third party to vet my comments. I also don't rely on people using JavaScript; most people surely have it turned on, but I like to keep things simple. Now, I don't get tons of forum traffic, perhaps no more than 50-60 a day, sometimes up to 100, which I guess you could call pretty small fry, so why bother? The methods I use seem to work for me.

First of all, I check each comment against a list of blocked IP addresses that I maintain; these include IP ranges (occasionally one has to block AOL until the children get bored and move on somewhere else). I keep the list small and lift each ban after a while. That's all automated and for me is a single-click affair.
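A sketch of such a list in Python with the standard `ipaddress` module, where each ban (single address or range) carries an expiry so it lifts on its own. The class and method names, the 24-hour duration and the addresses are hypothetical. `now` is passed explicitly to keep the example deterministic; a live site would pass `time.time()`.

```python
import ipaddress

class IpBlocklist:
    """Blocked IPs and ranges, each with an expiry time so bans lift automatically."""

    def __init__(self):
        self.entries = []  # list of (ip_network, expires_at) pairs

    def ban(self, cidr: str, hours: float, now: float) -> None:
        # strict=False accepts "192.0.2.17/24" or a single address as well.
        network = ipaddress.ip_network(cidr, strict=False)
        self.entries.append((network, now + hours * 3600))

    def is_blocked(self, ip: str, now: float) -> bool:
        # Forget expired bans first, which keeps the list small.
        self.entries = [(net, exp) for net, exp in self.entries if exp > now]
        address = ipaddress.ip_address(ip)
        return any(address in net for net, _ in self.entries)

blocklist = IpBlocklist()
blocklist.ban("192.0.2.0/24", hours=24, now=0)            # a whole range
print(blocklist.is_blocked("192.0.2.55", now=3600))       # True
print(blocklist.is_blocked("198.51.100.1", now=3600))     # False
print(blocklist.is_blocked("192.0.2.55", now=25 * 3600))  # False: ban expired
```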

Okay, if they're allowed to post, the comment next gets run through a filter. I have a strict policy against swearing, SMS-style writing, all capitals, etc. I also don't hyperlink web addresses (perhaps that's the most successful measure of all!). The filter also makes sure the post is within a minimum and maximum length: two words isn't really worthwhile in my opinion, and wafflers need to learn to get to the point. I also force the use of the capital "I". I know this all sounds a bit tedious, but it separates those who have something worthwhile to say from those who are lazy and just there to abuse the forum.
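Rules like these can be approximated with a small filter. In this sketch the banned-word list, the SMS shorthand list and the word limits are placeholders (the real lists are site-specific), and "force the capital I" is read as rejecting a standalone lowercase "i". Rendering with `html.escape` is one way to make sure web addresses appear as plain text rather than links.

```python
import html
import re

BANNED_WORDS = {"badword"}                   # hypothetical placeholder list
SMS_TOKENS = {"u", "ur", "r", "pls", "thx"}  # hypothetical SMS-style shorthand
MIN_WORDS, MAX_WORDS = 3, 400                # hypothetical limits

def check_comment(text: str):
    """Return the first rule the comment breaks, or None if it passes."""
    words = re.findall(r"[A-Za-z']+", text)
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        return "too short or too long"
    lowered = {w.lower() for w in words}
    if lowered & BANNED_WORDS:
        return "no swearing"
    if lowered & SMS_TOKENS:
        return "no SMS-style writing"
    letters = [c for c in text if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "no all-capitals posts"
    if "i" in words:  # case-sensitive: a standalone lowercase "i"
        return "please use a capital 'I'"
    return None

def render(text: str) -> str:
    """Escape everything, so URLs and markup show as plain text, never as links."""
    return html.escape(text)

print(check_comment("I really enjoyed this article about spam."))  # None
print(check_comment("i really enjoyed this article"))              # please use a capital 'I'
```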

If it passes the IP check and the filter then it gets posted, but it doesn't end there. I moderate using my own purpose built tool that quickly shows me all the comments and allows me to kill posts with a single click, pull out everything from a single IP and then delete and ban.

Perhaps that last bit sounds like too much work, but I spend ten to fifteen minutes a day moderating, some of that actually reading the stuff anyway. I guess if things get busier I may have to look into other methods (CAPTCHAs etc.), but for now it works and isn't too hard to look after...

amznVibe




msg:3757371
 12:53 pm on Oct 2, 2008 (gmt 0)

Akismet has a good record, with few or no false positives, but it also doesn't catch everything.

The most effective "captcha" I have ever seen is simple math questions.
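A minimal sketch of a math question, assuming a dict-like session that survives between showing the form and receiving the post. The function names are hypothetical. The answer is removed after one attempt, so a bot can't keep guessing against the same question.

```python
import random

def new_math_question(session: dict, rng=random) -> str:
    """Pick two small numbers, remember their sum, and return the question text."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    session["math_answer"] = a + b
    return f"What is {a} plus {b}?"

def check_math_answer(session: dict, reply: str) -> bool:
    """True only if the reply is the stored sum; the question is used up either way."""
    expected = session.pop("math_answer", None)
    try:
        return expected is not None and int(reply.strip()) == expected
    except ValueError:
        return False  # non-numeric reply

session = {}
print(new_math_question(session))  # e.g. "What is 3 plus 5?"
```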

Rosalind




msg:3757413
 2:02 pm on Oct 2, 2008 (gmt 0)

I use trivia questions. The advantage is it's readable and effective, but again you are asking people to do something extra, which may put off some.

Avoid maths questions like the plague, however: they're becoming too common. It's important that we all vary our methods.

slef




msg:3758099
 1:23 pm on Oct 3, 2008 (gmt 0)

Third-party checkers do have some false positives, but most implementations (like the WordPress Typepad plugin) put them in a bucket you can check and retrieve the non-spam ones from, which is less work than deleting all the spam from your published comments.

I don't mind trivia questions as long as they're really trivial, presented up front and don't depend on cookies or javascript.

What really irritates me are sites that let you spend all your time writing a comment, then without warning put up an eye test I can't pass, or require cookies/javascript/other restricted resources and blank my comment if I enable them.

ergophobe




msg:3758770
 8:06 pm on Oct 4, 2008 (gmt 0)

>> then without warning

Yeah, that's a problem and incredibly aggravating. In the past I've annotated the comment page with a "you must enable javascript to post" notice, but I think I forgot on this last one. Good heads up.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved