I did some of the obvious:
No effect. I needed something more. There are a few more active approaches to spam control, all of which have some drawbacks.
So What Choices Do I Have?
* Akismet and Mollom run your comment submissions through a third party, evaluate them, and put them in a separate queue for you to approve or delete as you wish.
* The Drupal Spam module [drupal.org] does an admirable job of identifying spam submissions and putting them in a separate queue. I would say it works almost as well as Akismet, but it runs on your own server and you control it fully. So apart from not depending on a third-party server, it has most of the same pros and cons as Akismet.
* CAPTCHA and reCAPTCHA. We've recently had some extended discussions about these (automated CAPTCHA attacks [webmasterworld.com]; baked jake's CAPTCHA rant [webmasterworld.com]; CAPTCHA-cracking in India [webmasterworld.com]). Personally, I'm not a fan. I have good eyes. Not as young as they once were, but nevertheless quite good. Professionally, I make my living as a historian and paleographer and am considered a top expert in reading illegible handwriting from the sixteenth century. And yet I would say that at least 25% of the time, I fail to solve a text-image CAPTCHA. I can't imagine how hard these are for people with bad vision. And though I love the idea of reCAPTCHA (digitizing books with distributed labor), my tests resulted in lengthy delays waiting for the reCAPTCHA server to respond. I've heard it's better now, but I'm too damn lazy to monitor my own sites as I should, let alone third-party servers that my sites depend on.
* Hashcash. Hashcash depends on the "proof of work" concept: the client has to do a small amount of computation before a form submission or email is accepted, which is cheap for one human visitor but expensive for a bot posting in bulk. There are many potential proofs of work, but generally this approach gets its name because it uses [tech alert] hashed values of known data. A hash is basically the result of an algorithm that takes an object (text string, file, whatever) and manipulates it to generate a fixed-length number (usually written in hexadecimal). This number is not necessarily unique, but it is very difficult to find a "collision" deliberately, so hashes are often used to verify that a file has not been corrupted or modified (see [hashcash.org...] for more information).
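To make the proof-of-work idea a bit more concrete, here is a minimal sketch in PHP. It is not the code of the WordPress plugin or the Drupal module; the function names, the "challenge:nonce" format and the difficulty of 12 bits are all assumptions made for illustration. The server hands out a challenge, the client (normally JavaScript in the browser, simulated here in PHP) searches for a nonce whose SHA-1 hash starts with enough zero bits, and the server checks the result with a single hash.

```php
<?php
// Hypothetical helper names; not the API of any Hashcash plugin or module.

/** Count the leading zero bits of a hex-encoded hash. */
function leading_zero_bits(string $hex): int
{
    $bits = 0;
    foreach (str_split($hex) as $digit) {
        $value = hexdec($digit);
        if ($value === 0) {
            $bits += 4; // a zero hex digit contributes four zero bits
            continue;
        }
        // Count the zero bits at the top of the first non-zero digit, then stop.
        for ($mask = 8; $mask > 0 && ($value & $mask) === 0; $mask >>= 1) {
            $bits++;
        }
        break;
    }
    return $bits;
}

/** Server side: one hash is enough to check the client's work. */
function verify_stamp(string $challenge, string $nonce, int $difficulty): bool
{
    return leading_zero_bits(sha1($challenge . ':' . $nonce)) >= $difficulty;
}

/** Client side (assumed to run as JavaScript in practice): brute-force a nonce. */
function mint_stamp(string $challenge, int $difficulty): string
{
    for ($nonce = 0; ; $nonce++) {
        if (verify_stamp($challenge, (string) $nonce, $difficulty)) {
            return (string) $nonce;
        }
    }
}

$challenge = bin2hex(random_bytes(8));  // would normally be kept in the session
$nonce = mint_stamp($challenge, 12);    // roughly 4096 hashes on average
var_dump(verify_stamp($challenge, $nonce, 12)); // bool(true)
```

The asymmetry is the point: minting costs thousands of hashes, verifying costs one. A bot that does not run the client-side script never produces a valid nonce, which is also why visitors with JS off get blocked.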
A Hashcash plugin is available for all recent versions of WordPress [wordpress.org].
The Drupal hashcash module [drupal.org] has not been upgraded to Drupal 6, but I generated and uploaded to drupal.org a Drupal 6 version [drupal.org] (page down to the September 28 version, hashcash-6.x-1.4alpha.zip, not the Sept. 22 version labelled hashcash.zip, which is completely fubarred). It's certainly "alpha" but it seems to be working for me. For three days now, I have logged in to find *no* automated comment submissions in my approval queue.
What do you think?
How do you do it on your site?
Is blocking users with JS off too high a price to pay?
Are you willing to put up with the usability issues with CAPTCHA?
Do you have a better method altogether?
As for CAPTCHA, I will never use it. It's too much work for the user and puts them off, on top of the readability problems you mentioned.
I am concerned about JS being turned off, and whether anyone will turn it on just to make a comment.
That's the big drawback to hashcash. Since I was drowning in spam, I was willing to pay that price.
The cell phone question is an interesting one. How many people have non-JS-enabled browsers on their cell phones? Do you have any idea?
The best stats I could find say that about half of mobile visitors have JS-enabled browsers (that's from Oct. 2007). So I guess it depends on how many you get. I don't get many, but the number is growing. It's growing, however, because of the iPhone and others with powerful browsers.
I suppose it depends on your user profile and how hard you're being hit by bots. If you don't have much spam and you have a lot of users without JS, then Akismet or similar is probably less work in the end.
One other option that I've seen people use is a hidden form field. If the user has a CSS-enabled browser, the field doesn't show. If CSS is unavailable and the field does show, its label says something like "Do not fill in this field unless you are a spammer". If the form is submitted with a value in that field, it gets treated as a bot submission. I have no idea if it works or not. Do spambots automatically fill in every field on a form? Not sure they do.
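For anyone who wants to try the hidden-field idea, a rough sketch in PHP might look like the following. The field name "website_url" and both function names are made up for the example, and whether bots actually fill the field in is exactly the open question above, so treat this as something to experiment with rather than a proven defense.

```php
<?php
// Hypothetical field name; pick something a bot would be tempted to fill in.

/** Render the decoy field, hidden from CSS-capable browsers. */
function render_honeypot(string $field = 'website_url'): string
{
    return '<div style="display:none">'
        . '<label>Do not fill in this field unless you are a spammer '
        . '<input type="text" name="' . htmlspecialchars($field) . '" value="" autocomplete="off">'
        . '</label></div>';
}

/** Treat the submission as automated if the decoy field has any content. */
function is_bot_submission(array $post, string $field = 'website_url'): bool
{
    $value = $post[$field] ?? '';
    // Anything other than an empty (or whitespace-only) string counts as filled in.
    return !is_string($value) || trim($value) !== '';
}

var_dump(is_bot_submission(['comment' => 'Nice post', 'website_url' => '']));                 // bool(false)
var_dump(is_bot_submission(['comment' => 'Buy pills', 'website_url' => 'http://spam.example'])); // bool(true)
```

In a real form handler you would pass `$_POST` as the first argument and silently drop, or queue for review, anything flagged.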
I agree Akismet is pretty good and I still have it as a second layer of defense on one site, but the spam queue is largely empty after installing hashcash. My issue is that there are occasional false positives that get flagged for review. If you don't have a lot of spam, you can just review these and approve them. For me there are two issues, though:
- comments that get flagged as spam or flagged for review sit in the queue and if you aren't checking your queue frequently, users might wonder where their comment is.
- if you have tons of spam it's just too time-consuming to go through your spam logs every time and so I tend to just "delete all". The other day I was on a slow dialup and purged my spam queue (drupal spam module, not Akismet, but it also rarely has false positives). Given the slow connection, after I pushed the "delete all" button and as I was waiting for the system to respond I was looking at the screen and noticed a legitimate comment and it was too late. Who knows how many of those I've deleted? Probably not many. In this case, I had time to make a mental note of the subject and sender and that person had also sent an email through my contact form, so all was good. But I just find it onerous to check my spam logs.
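// Give the comment field a random, per-session name (session_start() must have been called first)
$_SESSION['comment'] = sha1(uniqid((string) mt_rand(), true));
echo '<input type="text" name="' . $_SESSION['comment'] . '">';
and then retrieve the value like this: $comment = $_POST[$_SESSION['comment']] ?? '';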
Since this technique is not used very often by others, it works for me. Of course it only works against bots and not human spammers. All in all, I prefer individual ways to combat spam bots, since nobody will develop a bot solution just for my website. Off-the-shelf solutions, on the other hand, are always a target for spammers since they are widely used.
All the other junk (Akismet, etc.) is overly complicated and prone to false positives, just horrific.
This type of solution has to be randomized and obfuscated so the spam scripts can't detect it and build a signature for it. It isn't hard to make it virtually untraceable.
[edited by: incrediBILL at 8:03 am (utc) on Oct. 2, 2008]
Are you willing to put up with the usability issues with CAPTCHA? - No. CAPTCHA doesn't test for spam itself, so it's useless against human or semi-human spamming. Even its inventors say that the usability problems are unsolved. Why put resources into annoying your users *and* failing to test for spam?
Do you have a better method altogether? - see above. The field name randomising is pretty useful too. [added note: it usually works in PHP even without cookie support, because PHP will add a session ID to the URL query string if you use PHP-based link functions]
Hope that helps.
First of all I check the commenter against a list of blocked IP addresses that I maintain. The list includes IP ranges (occasionally one has to block AOL until the children get bored and move on to somewhere else). I keep the list small and lift each ban after a while. That's all automated and for me is a single-click affair.
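The post doesn't show how the blocklist is stored, so the following is only a guess at one way to do it in PHP: a hypothetical array mapping an IPv4 address or CIDR range to the Unix timestamp at which the ban lapses. The function names are invented, IPv6 is ignored, and the bit arithmetic assumes 64-bit PHP.

```php
<?php
// Hypothetical blocklist format: IPv4 address or CIDR range => expiry timestamp.

/** True if the IPv4 address falls inside the range (a bare address means /32). */
function ip_in_range(string $ip, string $cidr): bool
{
    [$network, $bits] = array_pad(explode('/', $cidr, 2), 2, '32');
    $ipLong = ip2long($ip);
    $netLong = ip2long($network);
    if ($ipLong === false || $netLong === false) {
        return false; // not valid IPv4; IPv6 is out of scope for this sketch
    }
    $bits = max(0, min(32, (int) $bits));
    // Build a mask with $bits leading ones, e.g. /24 => 0xFFFFFF00.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

/** True if any ban that has not yet expired covers the address. */
function is_blocked(string $ip, array $blocklist, int $now): bool
{
    foreach ($blocklist as $cidr => $expires) {
        if ($expires > $now && ip_in_range($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

$blocklist = ['192.0.2.0/24' => 2000, '203.0.113.7' => 1000];
var_dump(is_blocked('192.0.2.55', $blocklist, 1500));  // bool(true): inside a live range ban
var_dump(is_blocked('203.0.113.7', $blocklist, 1500)); // bool(false): that ban has expired
```

Storing an expiry with each entry is what makes "lift the ban after a while" automatic; in practice you would pass `time()` as the last argument.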
Okay, if they're allowed to post, the comment next gets run through a filter. I have a strict policy: no swearing, no SMS text-style writing, no all capitals, etc. I also don't hyperlink web addresses (perhaps that's the most successful thing of all!). The filter also makes sure the post is within a minimum and maximum length. Two words isn't really worthwhile in my opinion, and wafflers need to learn to get to the point. I also force the use of the capital "I" in the sentence. I know this all sounds a bit tedious, but it separates those who have something worthwhile to say from those who are lazy and just there to abuse the forum.
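A filter along these lines could be sketched in PHP as below. The word limits, the function name and the exact checks are assumptions, since the post gives no actual values, and the swearing and SMS-style checks are left out because they need a word list. The lower-case "i" check is deliberately crude and would also trip on things like "i.e.".

```php
<?php
// Hypothetical limits and rules; the actual values are not given in the post.

/** Return the reasons a comment is rejected (an empty list means it passes). */
function filter_comment(string $text, int $minWords = 5, int $maxWords = 300): array
{
    $problems = [];
    $words = str_word_count($text);
    if ($words < $minWords) {
        $problems[] = 'too short';
    }
    if ($words > $maxWords) {
        $problems[] = 'too long';
    }
    // All capitals: the text contains letters and upper-casing changes nothing.
    if (preg_match('/[a-z]/i', $text) && strtoupper($text) === $text) {
        $problems[] = 'all capitals';
    }
    // A standalone lower-case "i" used as the pronoun.
    if (preg_match('/\bi\b/', $text)) {
        $problems[] = 'lower-case "i"';
    }
    return $problems;
}

echo implode(', ', filter_comment('i agree')), "\n";                            // too short, lower-case "i"
echo implode(', ', filter_comment('THIS IS ALL SHOUTING AND VERY LOUD')), "\n"; // all capitals
var_dump(filter_comment('I think this is a really useful post.') === []);       // bool(true)
```

Not hyperlinking web addresses needs no code at all: escape the comment with `htmlspecialchars()` on output and never turn URLs into links.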
If it passes the IP check and the filter then it gets posted, but it doesn't end there. I moderate using my own purpose built tool that quickly shows me all the comments and allows me to kill posts with a single click, pull out everything from a single IP and then delete and ban.
Perhaps that last bit sounds like too much work, but I spend ten to fifteen minutes a day moderating, some of that actually reading the stuff anyway. I guess if things get busier I may have to look into other methods (CAPTCHAs etc.) but for now it works and isn't too hard to look after...