Steve Souders offers help in user agent decoding for UA Profiler
Since Steve is a Google employee, formerly of Yahoo, I guess it's OK to link to this post?
When I started UA Profiler, I assumed I would be able to find a library to accurately parse the HTTP User-Agent string into its components. I need this in order to categorize the test results. Was the test done with Safari or iPhone? Internet Explorer or Maxthon? NetNewsWire or OmniWeb? My search produced some candidates, but none of them had the level of accuracy I wanted, unable to properly classify edge case browsers, mobile devices, and new browsers (like Chrome and Android).
So, I rolled my own.
I find that it’s very accurate - more accurate than anything else I could find. Another good site out there is SNIP, but even they misclassify some well known browsers such as iPhone, Shiretoko, and Lunascape. When I do my daily checks I find that roughly every 200-400 new user agents require a tweak to my code. And I’ve written some good admin tools to do this check - it only takes 5 minutes to complete. And the code tweaks, when necessary, take less than 15 minutes.
It’s great that this helps UA Profiler, but I’d really like to share this with the web community. The first step was adding a new Parse User-Agent page to UA profiler. You can paste any User-Agent string and see how my code classifies it. I also show the results from SNIP for comparison. The next steps, if there’s interest and I can find the time, would be to make this available as a web service and make the code available, too. What do people think?
* Do other people share this need for better User Agent parsing?
* Do you know of something good that’s out there that I missed?
* Do you see gaps or mistakes in UA Profiler’s parsing?
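The kind of classification the post describes can be sketched as ordered pattern matching: more specific tokens (iPhone, Chrome) must be tested before generic ones (Safari), because many UA strings contain several browser tokens. This is a simplified illustration of the general technique, not Steve's actual code; the rule names and patterns here are my own assumptions:

```python
import re

# Ordered rules: specific families first, since e.g. a Chrome UA
# also contains "Safari". These patterns are illustrative only.
RULES = [
    ("iPhone",  re.compile(r"iPhone.*Version/([\d.]+)")),
    ("Chrome",  re.compile(r"Chrome/([\d.]+)")),
    ("Safari",  re.compile(r"Version/([\d.]+).*Safari/")),
    ("Firefox", re.compile(r"Firefox/([\d.]+)")),
    ("MSIE",    re.compile(r"MSIE ([\d.]+)")),
]

def parse_ua(ua):
    """Return (family, version), or ("unknown", None) if no rule matches."""
    for family, pattern in RULES:
        m = pattern.search(ua)
        if m:
            return family, m.group(1)
    return "unknown", None
```

For example, `parse_ua("Mozilla/5.0 (iPhone ...) AppleWebKit/525.18.1 Version/3.1.1 Mobile/5A347 Safari/525.20")` returns `("iPhone", "3.1.1")` rather than misreporting Safari, which is exactly the ordering problem that trips up naive parsers.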
I found Steve Souders' project very interesting and worth researching.
We don't normally allow links to blog posts (Steve's) or self-promotion on WebmasterWorld but I feel this is worth an exception.
For those that may not know this, our member GaryK maintains an excellent list of user agent data called the Browser Capabilities Project [browsers.garykeith.com] which contains a wealth of useful information.
I would like to know what level of accuracy he is looking for. What details does he need exactly for his project? What software did he consider for parsing the user-agents? Is it still in active development? How long will he support this project of his? He says the software is open source, but he has not posted it as of this writing. How can it be open source and not be posted anyplace? I have plenty more questions, but I don't want Bill to show me the end of his boot.
I know from personal experience that keeping track of the different browsers, and their abilities or lack thereof, is hard considering the lack of accurate documentation on them. Even harder is trying to come up with standards for how to name them consistently over time and versions. Also, working with others who do the same thing takes time and a decent level of communication.
All I am going to say is let's check back in a year or so and see if he is still doing this project and how much it has grown or is used outside his own personal use.
I thought Brett's comment was interesting at the bottom of that post. Wow, mostly wrong reporting :(
Parsing the User-Agent alone does not tell you WHAT the agent is. Many robots pretend to be one of the basic MS UAs, and quite a few use FF or Opera UAs. You need secondary header information to make a decision, and even then it can be misleading. I wonder if the guy intends to take that into account?
The problem with using User-Agents alone to identify browsers is that they are abused a great deal, as was previously pointed out. User-Agents alone still leave a lot to be desired if you're truly after the true browser name/version, not just what the User-Agent reports.
Quite right, Owen. The user agent alone is only good for a starting point. It tells you this is the browser being reported. That's why I like bots that let me do a full round-trip DNS lookup on them to see if they're really from where they claim to be. But for most use, as in not bots, it's a great tool to help webmasters decide what sort of code to display. Then if the UA seems abusive you can go the extra mile and do the DNS lookup to see if it's for real.
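The full round-trip DNS lookup described above can be sketched roughly as follows: resolve the visitor's IP to a hostname, check that the hostname belongs to the claimed operator's domain, then resolve that hostname back and confirm it returns the same IP. This is a minimal illustration, not anyone's production code; the function name, parameters, and the idea of injectable resolvers (so the logic can be exercised without network access) are my own assumptions:

```python
import socket

def verify_bot(ip, allowed_suffixes,
               reverse=lambda ip: socket.gethostbyaddr(ip)[0],
               forward=socket.gethostbyname):
    """Round-trip DNS check: IP -> hostname -> IP.

    Accept the bot only if the reverse-DNS hostname ends in one of
    the expected domain suffixes AND resolving that hostname forward
    yields the same IP (otherwise the reverse record could be spoofed).
    """
    try:
        host = reverse(ip)
    except OSError:
        return False
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False  # claims a crawler UA but lives elsewhere
    try:
        return forward(host) == ip
    except OSError:
        return False
```

In real use you would call it as `verify_bot(remote_ip, (".googlebot.com",))` with the default resolvers, and only bother doing so for requests whose UA already looks like a crawler, since DNS lookups are far too slow to run on every hit.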
Hopefully he'll fix up the issues. Steve has done some great work with YSlow, and his book was very helpful too, if any of you have read it.