


Simple user agent checks

Fast checks for user agents

     
11:41 pm on Jul 17, 2012 (gmt 0)

New User

10+ Year Member

joined:Nov 24, 2005
posts: 20
votes: 0


Hi,

I am looking for good, simple heuristics to check whether a user agent is "believable". For example, I want a user agent like "RAV1.23" to be rejected, but a normal one like "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" to be accepted. I'm thinking about these rules (rough sketch after the list):

1. If the user agent doesn't contain "/", reject it.
2. If the user agent doesn't start with a letter, reject it (this catches some bizarre ones)
3. If the user agent doesn't contain at least one space, reject it (This seems like a bad idea, eg "NokiaE66/UCWEB8.5.0.163/28/800" looks legit but has no space)
4. If user agent is 10 characters or less, reject it (This allows "Mozilla/4.0" but nothing shorter)
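
A minimal PHP version of those four checks might look like this (is_believable_ua is just my illustrative name):

<?php
// Sketch of rules 1-4 above; returns true if the UA looks "believable".
function is_believable_ua($ua)
{
    if ($ua === '' || strpos($ua, '/') === false) { // rule 1: must contain "/"
        return false;
    }
    if (!ctype_alpha($ua[0])) {                     // rule 2: must start with a letter
        return false;
    }
    if (strpos($ua, ' ') === false) {               // rule 3: must contain a space
        return false;
    }
    if (strlen($ua) <= 10) {                        // rule 4: longer than 10 chars ("Mozilla/4.0" is 11)
        return false;
    }
    return true;
}

var_dump(is_believable_ua('RAV1.23'));                                              // false
var_dump(is_believable_ua('Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)')); // true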

Will these rules reject any legitimate user agents? What I'm basically looking for is a set of rules that will detect unknown, suspicious user agents while not rejecting any legitimate ones.

Thanks.
9:49 am on July 18, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:7197
votes: 450


Your rules might be different from others'... There are few hard and fast rules (other than banning known server farms by IP... and even then some might disagree).

You do for your site what is best for your site.

That said, any deny is a potential deny to a real live human being (potential! is! key! word!). Allow all, or allow only what you want (called whitelisting), where you permit a specific set and reject the rest.

Whatever makes it easier to sleep at night is what you will do. Chasing bad actors and compiling a huge list is a lot of work... much easier to say who can come in the door than to let everybody in except this one, that one, that other one, that one other there, whoops, he's a friend of that one, and by golly there's another one on the same block as that first one... Blacklisting will make you nuts.

Looking at your 1, 2, 3, 4 above, that's close to whitelisting. Investigate that concept. See if it fits with your intended audience...
6:09 pm on July 18, 2012 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 10, 2004
posts: 408
votes: 14


Another factor in whitelisting's favor: properly implemented, it tends to have much less overhead.

I whitelist and then have a very small blacklist for those that manage to pass through the whitelist ruleset.
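
In PHP terms, the shape of that approach is roughly this (the patterns here are placeholders, not my actual rules):

<?php
// "Whitelist first, tiny blacklist second" - patterns are placeholders only.
$whitelist = array('~^Mozilla/~', '~^Opera/~');    // what gets in
$blacklist = array('~pretends-to-be-a-browser~i'); // known fakes that slip past

function ua_allowed($ua, $whitelist, $blacklist)
{
    foreach ($whitelist as $pattern) {
        if (preg_match($pattern, $ua)) {
            // Whitelisted; now apply the small blacklist of impostors.
            foreach ($blacklist as $bad) {
                if (preg_match($bad, $ua)) {
                    return false;
                }
            }
            return true;
        }
    }
    return false; // everything not whitelisted is rejected
}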
11:05 pm on July 18, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13531
votes: 403


3. If the user agent doesn't contain at least one space, reject it

I've seen more of the opposite: robotic UAs that contain runs of multiple spaces, like "   " *, in the middle. I can't remember a real human UA in that form.


* In the Forums, as in HTML, you keep the extra space from being eaten by judicious   use   of   nonbreaking   spaces ;)
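
If you wanted to turn that observation into a rule, a one-line check would do it (has_multi_space is my own hypothetical helper):

<?php
// Flag UAs with two or more consecutive spaces - a robotic tell.
function has_multi_space($ua)
{
    return preg_match('/ {2,}/', $ua) === 1;
}

var_dump(has_multi_space('Mozilla/5.0 (Windows NT 6.1)')); // false
var_dump(has_multi_space("SomeBot/1.0   fetcher"));        // true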
12:57 am on July 19, 2012 (gmt 0)

New User

10+ Year Member

joined:Nov 24, 2005
posts: 20
votes: 0


Thanks for the feedback, everyone :) lucy24, are you talking about multiple consecutive spaces? Yes, that does sound suspicious...

The reason I'm avoiding whitelisting is mobile user agents - there seem to be a lot of them. We may end up doing it, though... there are a lot of PHP libraries available that identify mobile UAs, and we could use those.

I definitely want to avoid explicit blacklisting, because bad user agents multiply every day.
6:14 pm on July 20, 2012 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14663
votes: 99


The reason I'm avoiding whitelisting is mobile user agents


That's no excuse not to whitelist, as there are way more bad bots to blacklist.

There are some pretty simple ways of detecting mobile agents, using a combination of things in the user agent and headers that IDs most of them, and some simple PHP scripts that already do it.
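
A rough sketch of the UA-plus-headers approach (the keyword list is illustrative only, nowhere near complete, and looks_mobile is my own name):

<?php
// Very rough mobile check: UA keywords plus headers some handsets send.
function looks_mobile($server)
{
    $ua = isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '';
    if (preg_match('/Mobile|Android|iPhone|Opera Mini|Nokia|UCWEB/i', $ua)) {
        return true;
    }
    // WAP profile headers are a strong mobile signal.
    if (isset($server['HTTP_X_WAP_PROFILE']) || isset($server['HTTP_PROFILE'])) {
        return true;
    }
    return false;
}

// Usage: looks_mobile($_SERVER);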

Worst case, just use browscap.ini to look them up.
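
PHP's built-in get_browser() reads browscap.ini for you; just point the browscap directive in php.ini at the file first:

<?php
// Requires browscap=/path/to/browscap.ini to be set in php.ini.
$info = get_browser($_SERVER['HTTP_USER_AGENT'], true); // true = return an array
if (!empty($info['ismobiledevice'])) {
    // treat this visitor as mobile
}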
 
