Odd User Agent

         

wilderness

5:09 pm on Jan 21, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This one arrived with a leading [' and a trailing '] wrapped around the UA string:

"['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.1.17 (KHTML, like Gecko) Version/7.1 Safari/537.85.10']"

trintragula

10:05 pm on Jan 21, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I've seen a similar pattern from ZumBot:

[('User-agent', 'Mozilla/5.0 (compatible; ZumBot/1.0; http://help.zum.com/inquiry)')]
though sometimes it's 'correct'.

They're from Korea Telecom.

wilderness

11:12 pm on Jan 21, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This came from GoDaddy on a site/Blog that I administer:

216.69.191.97 - - [13/Jan/2015:15:52:32 -0500] "GET /robots.txt HTTP/1.1" 200 2332 "-" "['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.1.17 (KHTML, like Gecko) Version/7.1 Safari/537.85.10']"
216.69.191.97 - - [13/Jan/2015:15:53:18 -0500] "GET /robots.txt HTTP/1.0" 200 2332 "-" "Python-urllib/1.17"
216.69.191.97 - - [13/Jan/2015:15:53:35 -0500] "GET / HTTP/1.1" 200 16199 "-" "['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.1.17 (KHTML, like Gecko) Version/7.1 Safari/537.85.10']"
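
A log search for entries like these can be scripted. What follows is a minimal sketch, assuming Apache's combined log format and a UA field with no embedded double quotes. LOG_LINE and wrapped_user_agents are hypothetical names chosen for illustration, and the sample data is just the three log lines quoted above.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

def wrapped_user_agents(lines):
    """Yield (ip, ua) for entries whose UA is wrapped in [' ... ']."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m["ua"].startswith("['") and m["ua"].endswith("']"):
            yield m["ip"], m["ua"]

SAMPLE = """\
216.69.191.97 - - [13/Jan/2015:15:52:32 -0500] "GET /robots.txt HTTP/1.1" 200 2332 "-" "['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.1.17 (KHTML, like Gecko) Version/7.1 Safari/537.85.10']"
216.69.191.97 - - [13/Jan/2015:15:53:18 -0500] "GET /robots.txt HTTP/1.0" 200 2332 "-" "Python-urllib/1.17"
216.69.191.97 - - [13/Jan/2015:15:53:35 -0500] "GET / HTTP/1.1" 200 16199 "-" "['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.1.17 (KHTML, like Gecko) Version/7.1 Safari/537.85.10']"
"""

hits = list(wrapped_user_agents(SAMPLE.splitlines()))
for ip, ua in hits:
    print(ip, ua)
```

Only the first and third entries are reported; the plain Python-urllib request in between is not wrapped, so it is skipped.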

trintragula

11:42 pm on Jan 21, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Ah, Python.
Apparently [] indicates a list. Single quotes surround strings. The parentheses in the example I saw are for tuples (kinda like records/structs in some other languages, I would guess). But I don't really know Python, so I may be wrong.
So we're probably looking at Python renderings of one of the bot's data structures, rather than the single string that should have been picked out of it as the UA to pretend to be.
Either a coding error or a cut-and-paste error, I would imagine.
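
For anyone who wants to see the effect, here is a minimal Python sketch of the suspected mistake. The variable names and the way the bot stores its UAs are assumptions; nobody in this thread has seen the bot's source.

```python
# Hypothetical bot code: a pool of UA strings to pick from.
user_agents = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/600.1.17 "
    "(KHTML, like Gecko) Version/7.1 Safari/537.85.10"
]

# Suspected bug: the whole list is converted to text and sent as the header.
buggy_header = str(user_agents)

# Presumably intended: pick one string out of the list.
correct_header = user_agents[0]

# The same slip with a list of (name, value) tuples, which is the shape
# Python's urllib openers use for their default headers.
headers = [("User-agent", "Mozilla/5.0 (compatible; ZumBot/1.0; http://help.zum.com/inquiry)")]
buggy_zum = str(headers)
correct_zum = dict(headers)["User-agent"]

print(buggy_header)
print(buggy_zum)
```

The first print shows the UA wrapped in [' and '], and the second shows the [('User-agent', '...')] form, which is consistent with both odd UAs reported above.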

lucy24

11:44 pm on Jan 21, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops! Overlapped

I've seen a similar pattern from ZumBot:

[('User-agent', 'Mozilla/5.0 (compatible; ZumBot/1.0; http://help.zum.com/inquiry)')]
though sometimes it's 'correct'.

I've seen the occasional robot whose UA string began with the literal text "User-Agent: ", demonstrating once again that botrunning does not require any particular intelligence. Most of them must have come from IPs that were already blocked, because I didn't realize how many there were until I did a log search. Maybe there's some generic robot script that contains this error, duly copied and pasted.

Edit:
Apparently [] indicates a list. Single quotes surround strings.

Plenty of languages, including php and javascript, will let you designate arrays that way.

trintragula

12:00 am on Jan 22, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Seems likely.
I've seen half a dozen variants that start that way.
I particularly liked this one:

User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)


EDIT:
Plenty of languages, including php and javascript, will let you designate arrays that way.

I know, but one of wilderness' examples featured Python-urllib/1.17...
The variant we're seeing with User-agent: is probably something else.

keyplyr

4:46 am on Jan 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RE: "User-Agent" or "User Agent"

There is a thread that discusses this. Black Hat (hacker) forums offer compromised IPs for sale w/ optional add-ons including bots containing a shell utility where the buyer just fills in the text fields.

One solution:
RewriteCond %{HTTP_USER_AGENT} agent [NC]
RewriteRule ^ - [F]

lucy24

5:41 am on Jan 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mine's more narrowly constrained:

BrowserMatch "User-Agent:" keep_out


(The quotation marks may not be necessary, but the colon made me uneasy.) Who knows? Some day there may be a legitimate "ReagentBot" or "SeaGentleman" or ... hm, where's that Scrabble dictionary anyway?

where the buyer just fills in the text fields

And sometimes does so incorrectly, ending up with the robotic equivalent of "I comma your name comma".

I particularly liked this one:

I've got an out-and-out block on anything that claims to be a Googlebot but doesn't come from an accredited Google IP. Also vice versa, with some exceptions.
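
One way to check a Googlebot claim is the reverse-then-forward DNS lookup that Google documents for verifying its crawlers. The sketch below only illustrates that idea and is not necessarily how the block described above is implemented. is_verified_googlebot and its injectable reverse/forward parameters are hypothetical, and the default lookups handle IPv4 only.

```python
import socket

# Hostname suffixes Google documents for Googlebot's reverse DNS names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse=None, forward=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip.

    reverse and forward are injectable lookup functions so the logic can be
    exercised without network access; by default real DNS is used.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The forward lookup must include the original address.
        return ip in forward(host)
    except OSError:
        return False
```

A request whose UA claims Googlebot but whose IP fails this check is a candidate for blocking. DNS lookups are slow, so in practice the result would need caching rather than being computed on every request.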

keyplyr

9:06 am on Jan 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lucy, just an FYI: the example I gave in my post is that broad because this UA often shows up without the hyphen.

trintragula

9:20 am on Jan 22, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



where's that Scrabble dictionary anyway?


"FrontPageNT" ? :) ... maybe not.

I've also seen:

User-Agent=Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1


which hit my robots.txt trap a long time ago.

I suppose we could match on 'user[^a-zA-Z]?agent[^a-zA-Z]'. I'm a bit old-fashioned with regex - maybe there's a neater way to do this now. When I started with regex I don't think the IBM PC had been invented... or \w for that matter.

Then there's this one:

* Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

keyplyr

10:54 am on Jan 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From my notes, there are also several bad actors that include "agent" in their UA, so just blocking "agent" catches a few more.

lucy24

8:24 pm on Jan 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I guess a wider-ranging compromise would be
[Uu]ser\W+[Aa]gent

or
\b[Aa]gent\b

with an outside option on [NC] (or BrowserMatchNoCase) if they're being really obnoxious. At least until the day an unspaced "UserAgent" shows up.

Any RegEx engine that recognizes \w should then also recognize \W and \b (and, if you must, \B). Variation [regular-expressions.info] generally applies to what is and isn't counted in non-ASCII ranges, which isn't likely to be an issue in www contexts.
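
To compare the patterns proposed so far, here is a rough sketch using Python's re module. Apache uses PCRE rather than Python's engine, but \W and \b behave the same way on plain ASCII input like this. Assumptions: re.IGNORECASE stands in for [NC], the first pattern is given that flag because it is written in lowercase, and the "UserAgent" and "ReagentBot" samples are made up for the test.

```python
import re

PATTERNS = {
    "user[^a-zA-Z]?agent[^a-zA-Z] + NC": re.compile(r"user[^a-zA-Z]?agent[^a-zA-Z]", re.IGNORECASE),
    r"[Uu]ser\W+[Aa]gent": re.compile(r"[Uu]ser\W+[Aa]gent"),
    r"\b[Aa]gent\b": re.compile(r"\b[Aa]gent\b"),
    "agent + NC": re.compile(r"agent", re.IGNORECASE),
}

SAMPLES = [
    # Seen in this thread:
    "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "User-Agent=Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1",
    # Hypothetical samples, invented for illustration:
    "UserAgent Mozilla/5.0",
    "ReagentBot/1.0",
    # A well-formed UA that none of the patterns should touch:
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]

def matches(ua):
    """Map each pattern's label to whether it matches anywhere in ua."""
    return {label: bool(p.search(ua)) for label, p in PATTERNS.items()}

for ua in SAMPLES:
    hit = [label for label, ok in matches(ua).items() if ok]
    print(ua[:40], "->", hit or "no match")
```

On these samples, all four patterns catch the two malformed UAs from the thread and none touches the well-formed one. The unspaced "UserAgent" is caught only by the first and last patterns, and the bare "agent" match is the only one that also hits the invented "ReagentBot".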

keyplyr

8:32 pm on Jan 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



... until the day an unspaced "UserAgent" shows up.

Yup, it shows up every so often (YMMV). Again, this is why I just block "agent".