Hi,
I am looking for good, simple heuristics to check whether a user agent is "believable". For example, I want a user agent like "RAV1.23" to be rejected, but a normal one like "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" to be accepted. I'm considering these rules:
1. If the user agent doesn't contain "/", reject it.
2. If the user agent doesn't start with a letter, reject it. (This catches some bizarre ones.)
3. If the user agent doesn't contain at least one space, reject it. (This seems like a bad idea, e.g. "NokiaE66/UCWEB8.5.0.163/28/800" looks legitimate but has no space.)
4. If the user agent is 10 characters or fewer, reject it. (This allows "Mozilla/4.0", which is 11 characters, but nothing shorter.)
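To make the rules concrete, here is a rough sketch of them as a single check. The function name `is_believable_user_agent` and the `require_space` switch are hypothetical, added only for illustration; `require_space` lets rule 3 be turned off, since it is the one I'm least sure about. "Letter" is assumed to mean an ASCII letter.

```python
def is_believable_user_agent(ua: str, require_space: bool = True) -> bool:
    """Hypothetical helper: apply rules 1-4 above. True means accept."""
    # Rule 1: must contain at least one "/" (e.g. "Mozilla/4.0").
    if "/" not in ua:
        return False
    # Rule 2: must start with a letter (assumed: ASCII letters only).
    first = ua[:1]
    if not (first.isascii() and first.isalpha()):
        return False
    # Rule 3 (optional): must contain at least one space.
    if require_space and " " not in ua:
        return False
    # Rule 4: reject anything 10 characters or fewer.
    if len(ua) <= 10:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "RAV1.23",
        "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
        "NokiaE66/UCWEB8.5.0.163/28/800",
        "Mozilla/4.0",
    ]
    for ua in samples:
        print(ua, is_believable_user_agent(ua),
              is_believable_user_agent(ua, require_space=False))
```

Running this shows that with rule 3 enabled, both "NokiaE66/UCWEB8.5.0.163/28/800" and the bare "Mozilla/4.0" are rejected. So rule 4's length threshold only makes a difference once rule 3 is dropped.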
Will these rules reject any legitimate user agents? What I'm basically looking for is rules that will detect unknown, suspicious user agents without rejecting any legitimate ones.
Thanks.