
Forum Moderators: Ocean10000 & incrediBILL


Simple user agent checks

Fast checks for user agents

     

btherl

11:41 pm on Jul 17, 2012 (gmt 0)

5+ Year Member



Hi,

I am looking for good, simple heuristics to check if a user agent is "believable". For example, I want a user agent like "RAV1.23" to be rejected, but a normal one like "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" to be accepted. I'm thinking about these rules:

1. If the user agent doesn't contain "/", reject it.
2. If the user agent doesn't start with a letter, reject it (this catches some bizarre ones)
3. If the user agent doesn't contain at least one space, reject it (this seems like a bad idea, e.g. "NokiaE66/UCWEB8.5.0.163/28/800" looks legit but has no space)
4. If user agent is 10 characters or less, reject it (This allows "Mozilla/4.0" but nothing shorter)
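The four rules above can be sketched as a single function. This is only a minimal illustration in Python, not a tested filter: the function name is_believable and the require_space flag are made up for this example, and rule 3 is off by default because the post itself flags it as a likely bad idea.

```python
def is_believable(user_agent: str, require_space: bool = False) -> bool:
    """Return True if the user agent passes the proposed rules.

    The name and the require_space flag are hypothetical, chosen for
    this sketch only.
    """
    # Rule 4: 10 characters or fewer is rejected ("Mozilla/4.0" has 11).
    # This also rejects the empty string, so indexing below is safe.
    if len(user_agent) <= 10:
        return False
    # Rule 1: must contain a slash.
    if "/" not in user_agent:
        return False
    # Rule 2: must start with an ASCII letter.
    first = user_agent[0]
    if not (first.isascii() and first.isalpha()):
        return False
    # Rule 3 (optional): must contain at least one space.
    if require_space and " " not in user_agent:
        return False
    return True
```

With the defaults, "RAV1.23" is rejected, while "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" and "NokiaE66/UCWEB8.5.0.163/28/800" are accepted. Turning on require_space rejects the Nokia string, which is the false positive the post worries about.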

Will these rules reject any legitimate user agents? What I'm basically looking for is rules that will detect unknown, suspicious user agents while not rejecting any legitimate ones.

Thanks.

tangor

9:49 am on Jul 18, 2012 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Your rules might be different than others... There are few hard and fast rules (other than banning known server farms by IP... and even then some might disagree).

You do for your site what is best for your site.

That said, any deny is a potential deny to a real live human being (potential! is! key! word!). Allow all, or allow only what you want (called white listing) where you only allow a specific set and reject the rest.

Whatever makes it easier to sleep at night is what you will do. Chasing bad actors and compiling a huge list is a lot of work... much easier to say who can come in the door than to say let everybody in Except this one, that one, that other one, that one over there, whoops, he's a friend of that one, and by golly there's another one on the same block as that first one... Black listing will make you nuts.

Looking at your 1, 2, 3, 4 above, that's close to white listing. Investigate that concept. See if it fits with your intended audience...

motorhaven

6:09 pm on Jul 18, 2012 (gmt 0)

10+ Year Member



Another factor in white-listing's favor is that, properly implemented, it tends to have much less overhead.

I whitelist and then have a very small blacklist for those which manage to pass through the whitelist ruleset.
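A whitelist-first check followed by a very small blacklist might look roughly like the sketch below. The patterns are placeholders invented for illustration, not anyone's actual ruleset; a real whitelist would be tuned to the site's own traffic.

```python
import re

# Hypothetical allow patterns: only agents matching one of these get in.
ALLOW_PATTERNS = [
    re.compile(r"^Mozilla/\d\.\d \("),
    re.compile(r"^Opera/\d"),
]

# Hypothetical small deny list for agents that slip through the whitelist.
DENY_PATTERNS = [
    re.compile(r"HTTrack", re.IGNORECASE),
]


def is_allowed(user_agent: str) -> bool:
    """Whitelist first, then a short blacklist for the leftovers."""
    if not any(p.search(user_agent) for p in ALLOW_PATTERNS):
        return False  # not on the whitelist: rejected outright
    # On the whitelist, so only the small blacklist remains to check.
    return not any(p.search(user_agent) for p in DENY_PATTERNS)
```

Most unknown agents fail the first test and never reach the blacklist, which is where the lower overhead comes from.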

lucy24

11:05 pm on Jul 18, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



3. If the user agent doesn't contain at least one space, reject it

I've seen more of the opposite: Robotic UAs that contain multi-spaces like "   " * in the middle. Can't remember a real human in that form.


* In the Forums, as in HTML, you keep the extra space from being eaten by judicious   use   of   nonbreaking   spaces ;)
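The multi-space observation above is easy to test for. A minimal sketch in Python, where the function name has_multiple_spaces is just a label for this example:

```python
import re

# Two or more consecutive space characters anywhere in the string.
_MULTI_SPACE = re.compile(r" {2,}")


def has_multiple_spaces(user_agent: str) -> bool:
    """Return True if the user agent contains a run of 2+ spaces."""
    return _MULTI_SPACE.search(user_agent) is not None
```

Whether a hit should mean an outright reject or just a flag for closer inspection is a judgment call for each site.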

btherl

12:57 am on Jul 19, 2012 (gmt 0)

5+ Year Member



Thanks for the feedback everyone :) lucy24 are you talking about multiple spaces? Yes that does sound suspicious..

The reason I'm avoiding whitelisting is mobile user agents - there seem to be a lot of them. We may end up doing it though.. there are a lot of PHP libraries available that identify mobile UAs and we could use those.

I definitely want to avoid explicit blacklisting, because they multiply every day.

incrediBILL

6:14 pm on Jul 20, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The reason I'm avoiding whitelisting is mobile user agents


That's no excuse not to whitelist as there are way more bad bots to blacklist.

There are some pretty simple ways of detecting mobile agents using a combination of things in the user agent and headers that identifies most of them, and some simple PHP scripts that already do it.
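As a rough idea of what combining the user agent with other headers could look like, here is a sketch in Python rather than PHP. The token list and the header names checked (X-Wap-Profile, X-OperaMini-Phone-UA, a WAP type in Accept) are assumptions picked for this example, not taken from any particular script, and they are far from complete.

```python
import re

# Assumed, incomplete list of substrings common in mobile user agents.
MOBILE_UA_TOKENS = re.compile(
    r"Mobile|Android|iPhone|Nokia|BlackBerry|Opera Mini|UCWEB",
    re.IGNORECASE,
)

# Assumed request headers that desktop browsers normally do not send.
MOBILE_ONLY_HEADERS = ("x-wap-profile", "x-operamini-phone-ua")


def looks_mobile(headers: dict) -> bool:
    """Guess whether a request comes from a mobile browser."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if MOBILE_UA_TOKENS.search(lowered.get("user-agent", "")):
        return True
    if any(name in lowered for name in MOBILE_ONLY_HEADERS):
        return True
    # Some handsets advertise a WAP content type in Accept.
    return "application/vnd.wap.xhtml+xml" in lowered.get("accept", "")
```

An agent that passes this test could then be let through the whitelist without each handset model being listed individually.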

Worst case, just use browscap.ini to look them up.
 
