UAs allowed, a slightly longer list, but still very limited. No match: 403
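For anyone who wants to see the idea in code, here is a minimal sketch of a "short UA whitelist, otherwise 403" check. The patterns and the function name `status_for_user_agent` are made up for illustration; they are not anyone's actual list, and a real whitelist would need far more care.

```python
import re

# Hypothetical whitelist: a deliberately short set of browser-like UA patterns.
# These are illustrative assumptions, not a recommended production list.
ALLOWED_UA_PATTERNS = [
    re.compile(r"^Mozilla/5\.0 .*Firefox/\d+"),
    re.compile(r"^Mozilla/5\.0 .*Chrome/\d+"),
    re.compile(r"^Mozilla/5\.0 .*Safari/\d+"),
]


def status_for_user_agent(user_agent):
    """Return 200 if the UA matches the whitelist, otherwise 403."""
    if user_agent and any(p.search(user_agent) for p in ALLOWED_UA_PATTERNS):
        return 200
    # No match (including a missing or empty UA): 403
    return 403
```

A UA string is trivially spoofed, so a check like this only filters out the careless bots.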
Does the whitelisting have a way to keep up with the whole variety of mobile browsers in use? How can you discern unlabeled bots? Far more of them can only be detected by their activity. Lots of these use badly configured UAs, but they are catching on.
I'd like to have enough faith in whitelisting to use it, but I find so much junk in my access logs.
The other point is: block all server farms. They are the source of many bots, good and bad. You can only block botnet access by paying serious attention to the means they use.
I'm still interested in talking more about this.
Whitelisting is only part of the non-blacklisting story for me.
My approach is:
1. separate the bots from the browsers using several standalone software filters
2. allow the browsers through
3. stop all bots except those on a short whitelist
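The three steps above could be sketched roughly like this. Everything here is an assumption for illustration: the `Request` class, the two toy filters, and the whitelist entry "examplebot" are hypothetical, and real filters would look at much more than the UA and one header.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    ip: str
    user_agent: str
    headers: dict = field(default_factory=dict)


# Step 1: standalone filters, each returning True if the request looks like a bot.
def declares_itself_bot(req):
    ua = req.user_agent.lower()
    return any(token in ua for token in ("bot", "crawler", "spider"))


def lacks_browser_headers(req):
    # Assumption: real browsers normally send Accept-Language.
    return "Accept-Language" not in req.headers


FILTERS = [declares_itself_bot, lacks_browser_headers]

# Hypothetical short whitelist of bots that are let through anyway.
WHITELISTED_BOTS = ("examplebot",)


def decide(req):
    is_bot = any(f(req) for f in FILTERS)
    if not is_bot:
        return "allow"  # Step 2: browsers go through
    ua = req.user_agent.lower()
    if any(name in ua for name in WHITELISTED_BOTS):
        return "allow"  # Step 3: whitelisted bot
    return "block"      # Step 3: every other bot
```

Matching a whitelisted bot by name alone is easy to spoof, so in practice the whitelist step would also want some check on where the request comes from.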
I think you can do a certain amount of separating and whitelisting using just .htaccess pattern matching, but I don't think it's enough in general to be satisfactory unless your audience is unusually easily identifiable.
.htaccess may be more expressive than I'm aware of, but I don't really use it at all, so I'm no expert.
Is there a thread somewhere here about bot-blocking software?
I rolled my own, but I realise that many people are not in a position to do that.
leaps and bounds above the rest of us.
1. separate the bots from the browsers using several standalone software filters
Aargh, don't talk to me about Regular Expressions. I just discovered that one of my sites was down for 12 hours because when I edited out a no-longer-needed |\.pdf from a pattern, I inadvertently deleted the following close-parenthesis as well.
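A rough way to catch that kind of slip before uploading is to try compiling the edited pattern first. This is only a sketch: the helper name `pattern_problem` and the sample patterns are invented for illustration, and Python's `re` syntax is not identical to the regex engine Apache uses, so treat it as a sanity check for things like unbalanced parentheses rather than a full validation.

```python
import re


def pattern_problem(pattern):
    """Return an error message if the pattern does not compile, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Hypothetical patterns: the second one has lost its closing parenthesis.
good = r"\.(gif|jpe?g|png)$"
broken = r"\.(gif|jpe?g|png$"

for p in (good, broken):
    problem = pattern_problem(p)
    print(p, "->", "ok" if problem is None else "ERROR: " + problem)
```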
1. separate the bots from the browsers using several standalone software filters
What standalone software?
I definitely agree with iBill - it seems mad to have everyone building their own lists.
It's not possible to rely solely on browscap, even if server farms are excluded.
7. Member of server farm, or other distinguished ASN
I don't think this is a separate category. People don't block server farms on those grounds alone; they block them because it's a simple means of excluding robots.
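To illustrate how simple the server-farm check can be, here is a minimal sketch using Python's standard `ipaddress` module. The ranges below are reserved documentation ranges used as placeholders, not real hosting-provider blocks, and `is_server_farm` is a made-up name; a real list would have to be compiled and maintained separately.

```python
import ipaddress

# Placeholder CIDR ranges (reserved documentation ranges), standing in for
# whatever server-farm ranges you have actually collected.
SERVER_FARM_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_server_farm(ip):
    """Return True if the address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SERVER_FARM_RANGES)
```

For example, `is_server_farm("192.0.2.17")` is true with the placeholder list above, while an address outside both ranges is not.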
allow the browsers through