Long User Agent

Useful or Privacy Concern?

         

JAB Creations

3:45 am on May 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's been a while since an SQL error occurred and this one happened because the user agent string was too long:

Mozilla/5.0 (iPhone; CPU iPhone OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E302 [FBAN/FBIOS;FBAV/173.0.0.65.96;FBBV/109978100;FBDV/iPhone9,3;FBMD/iPhone;FBSN/iOS;FBSV/11.3.1;FBSS/2;FBCR/VIVO;FBID/phone;FBLC/en_US;FBOP/5;FBRV/0]


What are all of these...strings? Could they be betraying a user's privacy somehow? Additionally, how long should the user agent column be in an SQL database so that legitimate strings fit? Do I really need to use substring to keep this sort of nonsense under control?

John

keyplyr

3:51 am on May 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Could they be betraying a user's privacy somehow?
Looks like a fake UA string compiled by someone who didn't know what they were doing... as such, it could be anything, and likely up to no good.

You have to look at the bigger picture. So look at the time stamp from your SQL error log, then go into your server's access log and find the event to see the other files requested from this visitor.

What range did they come from, and how did they behave? Did they request normal files at a human pace, or were the requests faster than a human could make them and for files a normal visitor would not ask for?

Hint: Malicious bots almost never request CSS files.
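
A rough sketch of that CSS check in Python, for anyone who wants to automate it. It assumes Apache common/combined log lines like the ones posted in this thread; the function name and the ".css" suffix test are my own choices, not anything standard. Treat the result as a hint only, in keeping with "almost never".

```python
import re
from collections import defaultdict

# Captures the client IP and the requested path from a common/combined log line.
REQUEST = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)')

def ips_without_css(lines):
    """Return the sorted IPs that made requests but never asked for a .css file."""
    paths_by_ip = defaultdict(list)
    for line in lines:
        m = REQUEST.match(line)
        if m:
            paths_by_ip[m.group(1)].append(m.group(2))
    return sorted(
        ip for ip, paths in paths_by_ip.items()
        # Drop any query string before testing the extension.
        if not any(p.split("?")[0].endswith(".css") for p in paths)
    )
```

Feed it the lines of the access log for the time window around the SQL error, then look up the returned IPs by hand.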

JAB Creations

5:31 am on May 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is what I found in the raw Apache log after a casual search (with the redundant user agents removed). There were some variations in the user agents, though not many.

88.130.48.200 - - [26/May/2018:07:44:14 -0400] "GET /index.htm HTTP/1.1" 500 0 "http://m.facebook.com"
88.130.48.200 - - [26/May/2018:07:44:14 -0400] "GET /index.htm HTTP/1.1" 500 0 "http://m.facebook.com"
88.130.48.200 - - [26/May/2018:07:44:15 -0400] "GET /index.htm HTTP/1.1" 500 0 "http://m.facebook.com"
98.150.253.168 - - [26/May/2018:11:34:12 -0400] "GET / HTTP/1.1" 200 0 "http://m.facebook.com"
23.242.136.26 - - [26/May/2018:13:54:42 -0400] "GET / HTTP/1.1" 200 0 "http://m.facebook.com"
189.61.75.237 - - [26/May/2018:20:46:18 -0400] "GET / HTTP/1.1" 200 0 "http://m.facebook.com"
72.234.62.25 - - [27/May/2018:01:03:58 -0400] "GET / HTTP/1.1" 200 0 "http://m.facebook.com"
212.170.79.214 - - [27/May/2018:01:04:12 -0400] "GET / HTTP/1.1" 200 0 "http://m.facebook.com"
67.60.80.32 - - [27/May/2018:01:06:02 -0400] "GET / HTTP/1.1" 200 0 "http://m.facebook.com"


None of those requests got through. Anyway, any suggestions on what the longest legitimate user agent might be or look like?

John

keyplyr

5:41 am on May 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not really, sorry.

Of the legit UA strings I've seen over the years, none were that long... usually about half that.

However, a UA can be as long as the developer wants, but what would be the point?

One restriction might be packet size. A request carries various headers, and if they grow too large the request no longer fits in a single packet. That would not be efficient.

There are several look-up websites for UAs; maybe look through them to get some ideas.

JAB Creations

5:53 am on May 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah, Internet Explorer 6 days, when every version of .NET was reported in the user agent for MSIE! Apparently 360 characters is legitimate. I suppose I can just bump it a bit higher than that and then manually crop it in the server code before logging it.
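
A minimal sketch of cropping in server code before the insert, in Python. The 512 limit is an assumed column width (something like VARCHAR(512)), not a number from this thread, and it assumes the column length is counted in characters.

```python
MAX_UA_LENGTH = 512  # assumed column width; pick whatever your schema uses

def clamp_user_agent(ua, max_length=MAX_UA_LENGTH):
    """Return a user agent string that is guaranteed to fit the column."""
    if ua is None:
        # A request may carry no User-Agent header at all.
        return ""
    return ua[:max_length]
```

Call it on the raw header value right before it goes into the query parameters, so an oversized string gets cut instead of raising an SQL error.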

An interesting list of likely legitimate user agents:
https://gist.github.com/jonelf/3743071

John

TorontoBoy

3:12 pm on May 27, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



88.130.48.200 88.130.0.0 - 88.130.127.255
descr: Versatel Deutschland

98.150.253.168 98.144.0.0 - 98.157.255.255
CIDR: 98.144.0.0/13, 98.152.0.0/14, 98.156.0.0/15
Organization: Time Warner Cable

23.242.136.26 23.240.0.0 - 23.243.255.255
CIDR: 23.240.0.0/14
Organization: Time Warner Cable

189.61.75.237 189.60.0.0/14
owner: CLARO S.A.

72.234.62.25 72.234.0.0 - 72.235.255.255
CIDR: 72.234.0.0/15
Organization: Hawaiian Telcom

212.170.79.214 212.170.72.0 - 212.170.79.255
role: Administradores Telefonica de Espana

67.60.80.32 67.60.0.0 - 67.61.255.255
CIDR: 67.60.0.0/15
OrgName: CABLE ONE

These all look like home internet service providers. Perhaps zombie computers? The Claro one makes me afraid, as last year I had a Semalt scraper attack from them and others from Latin America, called "1-99seo .com". That thing kept coming back like a bad rash. Eventually they gave up.
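
For anyone who wants to check addresses against ranges like these without a whois round trip each time, here is a small sketch using Python's standard ipaddress module. The helper names are mine; the ranges and CIDRs are the ones listed above.

```python
import ipaddress

def covering_networks(first, last):
    """Turn a 'first - last' whois range into the CIDR blocks that cover it."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))

def in_any(ip, cidrs):
    """True if ip falls inside at least one of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)
```

For example, covering_networks("88.130.0.0", "88.130.127.255") gives the block for the Versatel range, and in_any("98.150.253.168", ["98.144.0.0/13", "98.152.0.0/14", "98.156.0.0/15"]) checks the first Time Warner address.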

Samizdata

3:43 pm on May 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What are all of these...strings?

They appear to be tokens added by the crummy Facebook in-app mobile browser.

Could they be betraying a user's privacy somehow?

Some are there to enable Facebook to track what their members do on third-party websites (like yours).

So that's a "yes".

...

TorontoBoy

4:32 pm on May 27, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Mozilla/5.0 (iPhone; CPU iPhone OS 11_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E302 [FBAN/FBIOS;FBAV/173.0.0.65.96;FBBV/109978100;FBDV/iPhone9,3;FBMD/iPhone;FBSN/iOS;FBSV/11.3.1;FBSS/2;FBCR/VIVO;FBID/phone;FBLC/en_US;FBOP/5;FBRV/0]

There are 14 "FB"s in this UA? How could I miss that?
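
A quick sketch in Python that pulls the bracketed part apart, so the tokens are easier to see. It assumes the "[KEY/value;KEY/value]" layout of this one sample; I don't know of any published format for it, and the function name is made up.

```python
import re

SAMPLE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_3 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E302 "
    "[FBAN/FBIOS;FBAV/173.0.0.65.96;FBBV/109978100;FBDV/iPhone9,3;"
    "FBMD/iPhone;FBSN/iOS;FBSV/11.3.1;FBSS/2;FBCR/VIVO;FBID/phone;"
    "FBLC/en_US;FBOP/5;FBRV/0]"
)

def parse_bracket_tokens(ua):
    """Return the KEY/value pairs found inside the first [...] group of a UA."""
    match = re.search(r"\[([^\]]*)\]", ua)
    if not match:
        return {}
    tokens = {}
    for pair in match.group(1).split(";"):
        key, sep, value = pair.partition("/")
        if sep:  # skip anything that is not KEY/value
            tokens[key] = value
    return tokens
```

Running parse_bracket_tokens(SAMPLE_UA) gives 13 keys; the 14th "FB" is the value FBIOS on the FBAN key.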

lucy24

5:22 pm on May 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hint: Malicious bots almost never request CSS files.
Although Bots of Unknown Purpose occasionally do. I've currently got two that request pages + stylesheets + scripts, but no images or favicon.

Search for:
\.css .+?FB.+?FB.+?FB
Yup, there they are: and once I've got them, I can see the adjoining requests for images and piwik.* If I then look back, I see the page request with m.facebook.com referer, and perhaps even an immediately preceding request from facebookexternalhit. (In these situations it helps to be small, because something that happened several minutes earlier will still be easily findable in logs.)
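
The same search as a small Python sketch, for those who would rather script it than use an editor. The pattern is the one above, unchanged; the function name is mine, and it assumes one log entry per line with the user agent at the end (combined log format).

```python
import re

# A .css request followed, later on the same line, by at least three "FB" tokens.
PATTERN = re.compile(r"\.css .+?FB.+?FB.+?FB")

def find_fb_css_requests(lines):
    """Return the log lines that match the stylesheet + FB-tokens pattern."""
    return [line for line in lines if PATTERN.search(line)]
```

Pass it the lines of an open log file; the matching lines give the IPs and timestamps to look around for the adjoining requests.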

It's a quasi-browser built into the FB app, isn't it? Analogous to the quasi-browser built into the various search-engine apps.


* In other news, I recently got a mass mailing from {major search engine} complaining that the URL /piwik/ is indexed although roboted-out, and asking me to make it crawlable with noindex instead. Oh, hell no, {major search engine}, not going to happen.

Samizdata

5:46 am on May 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do I really need to use substring to keep this sort of nonsense under control?

On my personal site I intercept the Facebook in-app browser and feed it an interstitial.

Third parties tracking people while they are on my site is definitely a privacy concern.

Most visitors use the workarounds offered.
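
A rough sketch of the detection step in Python, not necessarily how it is done on the site described above. It assumes the in-app browser always sends an FBAN/ or FBAV/ token like the sample in this thread; the function and constant names are hypothetical.

```python
# Tokens seen in the sample UA from this thread; assumed to be present
# whenever the Facebook in-app browser makes the request.
FB_MARKERS = ("FBAN/", "FBAV/")

def is_facebook_in_app(ua):
    """Guess whether a User-Agent header comes from the Facebook in-app browser."""
    return bool(ua) and any(marker in ua for marker in FB_MARKERS)
```

When it returns True, the server code can send the interstitial instead of the requested page.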

...