-- Search Engine Spider and User Agent Identification
---- how many bytes make a human?
lucy24 - 4:48 am on Feb 7, 2012 (gmt 0)
MSIE 6 and below can be rejected unless you think some of your punters really are dumb enough to be using MSIE 6 a couple of years after MS discontinued support for it.
Heh. There exist people on this planet who still use MSIE for Mac-- and that means 5. I originally thought it was because they had very very old computers that just couldn't use anything else. Never mind that even ten years ago, there were better choices.
But it turns out you can't tell from the UA string. MSIE 5 doesn't know from Intel-- it just knows 68K vs. PPC-- so it says PPC. (I tested on myself.) Like those elderly www sites whose html divides the universe into MSIE and Netscape.
I think the official length of a UA is 127 bytes but a LOT of browsers, especially MSIE, exceed that.
There doesn't seem to be any limit to the number of .NET CLR statements they will throw in. And the Google-Mobile UA I quoted in the first post is almost 500 characters. Doubly improbable because mobiles tend to have shorter UAs. Less room, haha.
Wonky spacing might be another good one: either missing spaces or too many. I recently blocked the plainclothes MSNbot-- the one that claims to be MSIE 7. Different thread. One of its distinguishing traits is that every .NET CLR is preceded by a double space.
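For anyone who wants to try these tells on their own logs, here is a rough Python sketch of the checks mentioned so far: overall length, doubled or missing spaces, and an MSIE version of 6 or below. The function name, the flag labels and the 127-byte threshold are all assumptions for illustration (127 is only the figure guessed at above, not a confirmed limit), and the missing-space test is a crude guess at what "missing" looks like. It flags candidates for a closer look rather than deciding anything.

```python
import re

# Assumed threshold: the 127 bytes guessed at above, not a confirmed limit.
MAX_UA_LENGTH = 127

def ua_red_flags(ua):
    """Return a list of reasons a User-Agent string looks odd (hypothetical helper)."""
    flags = []
    if len(ua) > MAX_UA_LENGTH:
        flags.append("long")
    # Too many spaces, e.g. a double space before each ".NET CLR".
    if "  " in ua:
        flags.append("double-space")
    # Missing space: a semicolon followed directly by the next token.
    if re.search(r";(?=[^ ;)])", ua):
        flags.append("missing-space")
    # MSIE 6 and below.
    match = re.search(r"MSIE (\d+)", ua)
    if match and int(match.group(1)) <= 6:
        flags.append("old-msie")
    return flags
```

Plenty of real browsers will trip the length check, so none of these flags is worth a block by itself; they are more useful in combination.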
One of my favorite recent UAs is
Mozilla/4.0 (compatible; MSIE 4.01; Digital AlphaServer 1000A 4/233; Windows NT; Powered By 64-Bit Alpha Processor)
I have no idea what that would be in real life. But it sounds a lot like my pocket calculator, which is 30 years old and still going strong.
And then there was the query that emerged from the logs as
I tried to disentangle those plusses, but gave up. Not sure about the full stops, either.
I get weird queries, so I can't mess with them. The ones that disencode to multi-byte characters in particular are definitely legit.
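If you want to see what a query "disencodes" to before deciding whether it is junk, a small Python sketch along these lines works. It assumes the query is percent-encoded UTF-8 with plus signs standing in for spaces, which is the usual form-encoding convention but not guaranteed for every log entry. The function names are made up for the example.

```python
from urllib.parse import unquote_plus

def decode_query(raw):
    """Decode a percent-encoded query string, treating '+' as a space.

    Assumes UTF-8; bytes that do not decode are replaced rather than raising.
    """
    return unquote_plus(raw, encoding="utf-8", errors="replace")

def has_multibyte(raw):
    """True if the decoded query contains any non-ASCII character."""
    return any(ord(ch) > 127 for ch in decode_query(raw))
```

So `decode_query("caf%C3%A9+menu")` gives `café menu`, and `has_multibyte` returns True for it, which would mark it as one of the queries to leave alone.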