TwitterCrawler

         

keyplyr

11:07 pm on Dec 30, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 (compatible; TwitterCrawler)
Protocol: HTTP/1.1
Robots.txt: No
Host: hetzner.de
136.243.0.0 - 136.243.255.255
136.243.0.0/16
188.40.0.0 - 188.40.255.255
188.40.0.0/16

It may be crawling Twitter, but this agent is not affiliated with Twitter AFAIK.
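
For anyone who wants to test a log entry against the details above, here is a minimal Python sketch. It assumes you already have the client IP and UA string in hand; the function name `matches_reported_twittercrawler` is made up for illustration, and the two networks are only the ones quoted in this post, not a complete list of the host's address space.

```python
import ipaddress

# The two ranges quoted in the post above. Assumption: nothing beyond these
# two networks is checked here.
REPORTED_RANGES = [
    ipaddress.ip_network("136.243.0.0/16"),
    ipaddress.ip_network("188.40.0.0/16"),
]


def matches_reported_twittercrawler(ip: str, user_agent: str) -> bool:
    """Return True when the UA contains 'TwitterCrawler' and the IP
    falls inside one of the reported ranges."""
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in net for net in REPORTED_RANGES)
    return in_range and "TwitterCrawler" in user_agent
```

A match only says the request fits the pattern reported here. What to do with it (block, rate-limit, ignore) is a separate decision.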

lucy24

2:25 am on Dec 31, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The "robots.txt: No" is pretty well a dead giveaway by itself, because the twitterbot always asks for robots.txt. I don't know if it makes exceptions when it's requesting the same page many times within a short period (like, uh, six different people independently and coincidentally tweeted the same page within five minutes of each other? oh happy day).

<topic drift>
:: detour to raw logs ::

The record is 1 minute, 53 seconds* for dual page requests on a single robots.txt. I also find one orphaned robots.txt request. (Did someone tweet one of my error documents, or maybe the Legal page? Haha.) But wtf is "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0"? Some kind of human addon?

:: wandering off in search of a thread that will explain the (real) Twitterbot's rare requests for an image ::


* Assuming my brain is correctly subtracting 7 seconds from 2 minutes.
</td>

keyplyr

4:53 am on Dec 31, 2016 (gmt 0)

But wtf is "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0"? Some kind of human addon?
First guess, since the string includes "Facebot" as well as "Twitterbot", would be an imposter. Did it come from an actual crawl range of Twitter or FB?
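
The "claims to be two different crawlers at once" test can be sketched in a few lines of Python. The operator grouping and the function names below are assumptions made for illustration; the tokens are just the ones visible in the quoted string. A hit means the UA names more than one operator, which is a reason to look closer, not proof of anything.

```python
# Assumption: grouping of crawler tokens by the company they claim to
# represent. Only tokens seen in the quoted UA string are listed.
OPERATOR_TOKENS = {
    "Facebook": ("facebookexternalhit", "facebot"),
    "Twitter": ("twitterbot",),
}


def claimed_operators(user_agent):
    """List the operators whose crawler tokens appear in the UA string."""
    ua = user_agent.lower()
    return [
        operator
        for operator, tokens in OPERATOR_TOKENS.items()
        if any(token in ua for token in tokens)
    ]


def claims_multiple_operators(user_agent):
    """True when the UA string names crawlers from more than one operator."""
    return len(claimed_operators(user_agent)) > 1
```

Pairing this with an IP-range check for each operator's published crawl ranges would answer the second question in the post; those ranges are not reproduced here.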

lucy24

7:59 am on Dec 31, 2016 (gmt 0)

Did it come from an actual crawl range of Twitter or FB?

No, it was the beginning of a--to all appearances--human page visit.

:: dragging up logs again ::

With the "Facebot Twitterbot" UA it requested the page and the apple-touch-icon. And then, continuing from the same IP (Sky, London area, visiting a page that's currently popular in the UK) but with an ordinary iPhone UA it requested the page--again--plus all supporting files, including piwik. That's what made me wonder if it's some kind of plugin/addon/app-thingy.

keyplyr

8:48 am on Dec 31, 2016 (gmt 0)

Yup, probably a Safari user with a User Agent add-on under the misguided belief that including those attributes may get them through to hidden goodies on the server: amateur foolishness.

Only other thought that comes to mind is a save-page-to-home-screen app that then no longer shows in the remaining thread, but why the Twitter & FB attributes?

Nope, back to my original assumption.

lucy24

8:05 pm on Apr 9, 2017 (gmt 0)

:: bump ::

Saw it again, and remembered earlier discussion.

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0
(the same UA string as above).

This time it came about 15 minutes after a human visit, from the US this time. The human UA was a Mac desktop (slightly different OS), using Chrome/56.

:: detour to logged headers ::

Hm, now that's interesting. Much later on the same day, the identical IP requested the identical page, this time with an iPhone UA. But the only thing that's even remotely anomalous on the part of the KitchenSinkBot is
Accept-Language: en-us
(only) which is generally more characteristic of mobiles.

:: final detour to raw logs ::

One other occurrence, this time with an image request, the human component being an iPhone. (No referer. Wonder how they ended up on that particular picture?)
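
The pattern described in these detours (one IP sending the combined "Facebot Twitterbot" UA and also an ordinary browser UA) can be pulled out of a log mechanically. This is a rough Python sketch under stated assumptions: the log has already been reduced to `(ip, user_agent)` pairs, and the function name `ips_with_mixed_identities` is hypothetical. It ignores timing, so the "15 minutes later" part still needs eyeballing.

```python
from collections import defaultdict

# Assumption: a UA counts as the "combined" string when it contains both
# of these tokens, as in the UA quoted in this thread.
COMBINED_MARKERS = ("Facebot", "Twitterbot")


def ips_with_mixed_identities(entries):
    """Map each IP that sent the combined UA to the other UAs it also used.

    entries: iterable of (ip, user_agent) pairs.
    IPs that only ever sent one kind of UA are left out.
    """
    seen = defaultdict(set)
    for ip, ua in entries:
        seen[ip].add(ua)

    result = {}
    for ip, uas in seen.items():
        combined = {ua for ua in uas if all(m in ua for m in COMBINED_MARKERS)}
        others = uas - combined
        if combined and others:
            result[ip] = sorted(others)
    return result
```

Called with pairs extracted from your own logs, it returns a dict keyed by IP, which makes it easy to see which ordinary UAs travel with the combined one.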

keyplyr

10:42 pm on Apr 9, 2017 (gmt 0)

Wonder how they ended up on that particular picture?
You wonder because you are thinking like a human. This is just code basically saying... open directory, get file.

This time it came about 15 minutes after a human visit, from the US this time. The human UA was a Mac desktop (slightly different OS), using Chrome/56.
I still think it's a script posing as one or more browsers.