Note space before middle semi-colon:
I'd always assumed, in a vague sort of way, that spurious space = useless robot. Shove in a BrowserMatch looking for a space followed by [;:,)] et cetera and you can forget about it. But after applying some brute force and a regular expression* I've had to conclude that 'tain't necessarily so.**
SV1) ;
Configuration/CLDC-1.1 )
U; ;
All three appear to be legitimate. (The third one shows up in some rare ex-Soviet-bloc UAs, but seems to be human.)
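For anyone who wants to see the problem for themselves, here is a minimal Java sketch of the naive rule. It assumes the rule boils down to "a space immediately followed by ; : , or )". A real BrowserMatch would live in Apache config, and the class and method names here are made up for illustration. Run against the three fragments above, it flags every one of them.

```java
import java.util.regex.Pattern;

class SpuriousSpaceCheck {
    // Hypothetical Java equivalent of a BrowserMatch on " [;:,)]":
    // a space immediately followed by a semicolon, colon, comma or close paren.
    static final Pattern NAIVE_RULE = Pattern.compile(" [;:,)]");

    // find() searches anywhere in the string, like an unanchored BrowserMatch.
    static boolean naiveRuleFlags(String userAgent) {
        return NAIVE_RULE.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            // fragments that turned out to belong to legitimate visitors
            "SV1) ;",
            "Configuration/CLDC-1.1 )",
            "U; ;",
            // an ordinary UA with no stray space, as a control
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)"
        };
        for (String ua : samples) {
            System.out.println(naiveRuleFlags(ua) + "  " + ua);
        }
    }
}
```

The first three lines print `true` and the control prints `false`. The rule catches the humans along with the robots.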
On the other hand, most of the rest are no-brainers:
"GeoHasher/Nutch-1.0 (GeoHasher Web Search Engine; geohasher.gotdns.org; geo_hasher at yahoo * com)"
(This only turned up because there was no reason to exclude the asterisk from the search.)
"Mozilla/5.0 (compatible; spbot/3.0; +http://www.seoprofiler.com/bot )"
(Really, I don't think we need the extra space to give us any information here!)
"^Mozilla/4.0 \\(compatible; MSIE 8.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727\\)$"
(I'm not kidding. That's from raw logs, not from an .htaccess file. Maybe they pasted it in from someone else's htaccess. Or even their own.)
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.1.4322; &id;
)"
(As above: I didn't exclude &. What is &id; anyway? It's not an HTML entity. No! BAD smiley! Get out of there!)
"Lotus-Notes/4.5 ( Windows-NT )"
(Really? You think it might be a robot?)
Phooey.
Haha. Another good idea down the drain.
* [\p{Punct}&&[^-/.{(\[quote]] (with leading space) applied to raw log files.
** Like those scientific surveys that investigate something everyone already knows. A huge waste of money if it turns out everyone was right all along, but infuriating when it turns out that "common knowledge" is wrong.
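The pattern in the first footnote uses Java's character-class intersection (`&&`). It matches a space followed by any `\p{Punct}` character except the ones in the negated class (- / . { ( [). Here is a minimal sketch that applies it to bare user-agent strings rather than to whole raw log lines, which is a simplification on my part. The letters of `quote` that appear in the footnote's class are left out. Letters are never in `\p{Punct}`, so dropping them does not change what the class matches. Class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctAfterSpace {
    // Space, then any POSIX punctuation character EXCEPT - / . { ( [
    // ([a&&[b]] is Java's character-class intersection syntax).
    static final Pattern SPACE_THEN_PUNCT =
            Pattern.compile(" [\\p{Punct}&&[^-/.{(\\[]]");

    // Returns the first "space + punctuation" pair found, or null if none.
    static String firstHit(String userAgent) {
        Matcher m = SPACE_THEN_PUNCT.matcher(userAgent);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)",
            "Lotus-Notes/4.5 ( Windows-NT )",
            "GeoHasher/Nutch-1.0 (GeoHasher Web Search Engine; geohasher.gotdns.org; geo_hasher at yahoo * com)"
        };
        for (String ua : samples) {
            System.out.println("[" + firstHit(ua) + "]  " + ua);
        }
    }
}
```

Excluding `(` and `.` is what keeps ordinary strings like " (compatible" and " .NET CLR" from matching. The MSIE line therefore reports no hit, while Lotus Notes trips on " )" and GeoHasher on " *".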