

Looking for more info about facebookexternalhit/1.1;line-poker/1.0

Not sure if hits are legit

SumGuy

3:44 pm on Aug 17, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Yesterday I saw hits from about a dozen IPs in the range 147.92.179.105 to 147.92.179.119.

They are making direct HTTPS requests for the same PDF file on my site (one of many scientific research papers). Each request is for exactly 32 KB; the file is several MB in size, so I don't know if this is a tag-team effort or if they are all, individually, just requesting the first 32 KB of the file. There were 38 requests spread across those various IPs. The user-agent is:

facebookexternalhit/1.1;line-poker/1.0
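A quick way to see how requests like these are spread across the IPs is to tally them straight from the access log. This is a minimal Python sketch, assuming the standard Apache/nginx "combined" log format; the function name `count_by_ip` and any sample log lines fed to it are hypothetical.

```python
import re
from collections import Counter

# Apache/nginx "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def count_by_ip(lines, ua_fragment="line-poker"):
    """Count requests and bytes sent per client IP for UAs containing ua_fragment."""
    hits = Counter()
    sent = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if not m or ua_fragment not in m.group("ua"):
            continue  # unparseable line or a different user-agent
        ip = m.group("ip")
        hits[ip] += 1
        if m.group("bytes") != "-":  # "-" means no body was sent
            sent[ip] += int(m.group("bytes"))
    return hits, sent
```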

Twice the Accept-Language was en; all other times it was zh-TW.
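Whether this is a tag-team effort or repeated fetches of the same chunk can be settled from the Range request header, if the server logs it (the default combined format does not). The first 32 KB of a file is `bytes=0-32767` on every request, whereas a split download would show different offsets. A minimal sketch that handles only single-range headers; the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# A single byte range such as "bytes=0-32767"; multi-range headers are not handled.
RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")

def parse_range(header: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (start, end) for a single byte range; end is None if open-ended."""
    m = RANGE_RE.match(header.strip())
    if not m:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else None
    return start, end

def classify(range_headers) -> str:
    """'same-chunk' if every request asks for the identical range,
    'split' if the ranges differ (consistent with a tag-team download),
    'unknown' if there is nothing parseable to compare."""
    parsed = [parse_range(h) for h in range_headers]
    if not parsed or any(p is None for p in parsed):
        return "unknown"
    return "same-chunk" if len(set(parsed)) == 1 else "split"
```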

The IPs all seem to reverse-resolve to nio-pagepoker.line-apps.com. There appears to be no working website for that domain.
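The usual way to check whether a reverse-DNS name like this is consistent is forward-confirmed reverse DNS: look up the PTR name for the IP, then resolve that name and confirm the original IP is among the results. A minimal Python sketch follows; the resolvers are passed in as parameters so it can be exercised offline, and note that `socket.gethostbyname_ex` returns IPv4 addresses only. A pass shows the PTR and A records agree, not who is actually operating the crawler.

```python
import socket
from typing import Callable, List, Tuple

HostInfo = Tuple[str, List[str], List[str]]  # (hostname, aliases, addresses)

def forward_confirmed_rdns(
    ip: str,
    expected_suffix: str,
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """True if ip's PTR name ends with expected_suffix AND that name resolves back to ip."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.rstrip(".").endswith(expected_suffix):
        return False  # PTR points somewhere else entirely
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses  # the forward lookup must include the requesting IP
```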

The IP is owned by "LINE Corporation" in Japan (AS38631)

I wasn't thinking of blocking it because I was also seeing activity from Taiwan that looked like human activity requesting the same PDF, one of the IPs belonging to a .EDU. I was thinking that a bunch of people got interested in the file over Facebook and the line-apps.com stuff was somehow helping to facilitate this communication, but now I'm not so sure.

Any idea what this thing is doing?

lucy24

4:52 pm on Aug 17, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to raw logs to look up “line-poker” element ::

Oh, my. Back in February there was a total of 45 blocked requests for a single page, with a trickle of 4 more in March and May ... and just one in May for a different page. All from, in my case, the range 147.92.137.128/26 (i.e. last octet in the 128-191 range).
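For reference, a /26 is 64 addresses, and Python's standard `ipaddress` module confirms the bounds and makes membership tests trivial. The constant and function names below are hypothetical; note that the original poster's 147.92.179.x addresses fall outside this particular /26.

```python
import ipaddress

# The block seen in this log; other sites report neighbouring LINE ranges.
LINE_BLOCK = ipaddress.ip_network("147.92.137.128/26")

def in_block(ip: str, network=LINE_BLOCK) -> bool:
    """True if ip lies inside network."""
    return ipaddress.ip_address(ip) in network

print(LINE_BLOCK[0], LINE_BLOCK[-1], LINE_BLOCK.num_addresses)
```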

Cross-checking for the same page in the same time frame--I was looking for human visits and/or social media that might afford a clue--reveals an equal flurry of blocked requests from 36.110.147.abc (Chinanet) with an iPhone UA, starting a day or two earlier. This may, however, be a red herring, since this IP-and-UA combination turns out to be responsible for thousands (literally) of blocked requests this year alone, typically in batches of 3 or more. Perhaps they’re following up the same leads as the Line folks.

tl;dr version: No idea, but worthy of further attention.

Edit: Logged headers reveal some further points of interest:
X-Forwarded-For: 10.115.91.abc
Range: bytes=0-0
Looks as if they were blocked on charset grounds. (This is rare on my site, but a handful of Accept-Charset patterns seem to be limited to robots.) The exact X-Forwarded-For varies, but is always in that neighborhood. 10.0.0.0/8 is a private (RFC 1918) range, which could be absolutely anything, but is not calculated to inspire confidence.
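Spotting a private address in X-Forwarded-For can be automated with Python's `ipaddress` module. A sketch, assuming the header is the usual comma-separated list of addresses; the function name is hypothetical, and since the last octet is masked in the log excerpt, any concrete 10.115.91.x address used with it is illustrative only.

```python
import ipaddress

def forwarded_for_is_private(header_value: str) -> bool:
    """True if every address in an X-Forwarded-For value is private/non-routable."""
    addrs = [part.strip() for part in header_value.split(",") if part.strip()]
    if not addrs:
        return False
    try:
        return all(ipaddress.ip_address(a).is_private for a in addrs)
    except ValueError:  # e.g. "unknown", or an address with a port attached
        return False
```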

tangor

10:11 pm on Aug 17, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm still working out what "facebookexternalhit" is all about in the first place. Actual hits from fb have an appended query string/ID and these do not... and each one sucks down the full page/image/CSS every time it visits.

Half tempted to block... just not sure. Thoughts?

tangor

10:12 pm on Aug 17, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Also add "Cortex" to that as it also appears to be an fb drone/bot

lucy24

11:07 pm on Aug 17, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As I understand it, facebookexternalhit shows up when someone on FB puts up a link to your page. They first get all the images belonging to the page, and then--at least in theory--the FB user selects one of them. So if a FB post gets a lot of views, you'll see a lot of requests for some specific image.

cortex first showed up in
:: shuffling papers ::
February 2019. It seems to involve images only.

That’s setting aside the mysterious recent full crawls from FB, still using the externalhit UA but distinguishing itself by asking for--and apparently honoring--robots.txt.

tangor

12:35 am on Aug 18, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So if a FB post gets a lot of views, you'll see a lot of requests for some specific image.


And that's where I get puzzled, as that seems to be a sneaky method of hot-linking, SINCE THERE IS NO CORRESPONDING hit on the page where that image resides.

If fb cached the image on their servers, okay ... but the same images are hit time and time again.

REALITY: There ain't that dang many hits (after all, it is fb!) and that volume is not a threat to my bandwidth, just irksome.

tangor

12:38 am on Aug 18, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



On the other hand, the PDF files they routinely hit are NOT SMALL (2.5-5.7 MB in size), and these are copyrighted but free to the public, so why does fb need that many copies of the same material?

dstiles

8:37 am on Aug 18, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> facebookexternalhit shows up when someone on FB puts up a link to your page

I would dispute that. I've seen it hit the most unlikely sites/pages, ones that have no interest for anyone; they are almost obsolete.

wilderness

1:22 pm on Aug 18, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There may be more IPs, but here's what I have.
The first three are the most common.

173.252.64.0 - 173.252.127.255
66.220.144.0 - 66.220.159.255
31.13.64.0 - 31.13.127.255
204.15.20.0 - 204.15.23.255 (2009; rarely seen in recent years)

Facebook, Inc. (THEFA-3) (locals from 2014)
LLA2-05 Facebook 31.13.114.0 - 31.13.114.255
THEFA-3 129.134.0.0 - 129.134.255.255
THEFA-3 157.240.0.0 - 157.240.255.255
FACEBOOK-INC 173.252.64.0 - 173.252.127.255
TFBNET1 204.15.20.0 - 204.15.23.255
ZAYO-IPYX-077110-208-185-168-128-29 208.185.168.128 - 208.185.168.135
TFBNET3 66.220.144.0 - 66.220.159.255
TFBNET2 69.63.176.0 - 69.63.191.255
TFBNET3 69.171.224.0 - 69.171.255.255
TFBNET4 74.119.76.0 - 74.119.79.255
FACEBOOK-IPV6-BLOCK-1 2620:0:1C00:: - 2620:0:1CFF:FFFF:FFFF:FFFF:FFFF:FFFF
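The start-end ranges above can be turned into CIDR blocks with Python's `ipaddress.summarize_address_range`, which makes matching a logged IP straightforward. A sketch using wilderness's list as posted (it may be out of date; 31.13.114.0/24 is left out because it already falls inside 31.13.64.0/18). `FB_RANGES`, `build_networks` and `is_listed` are hypothetical names.

```python
import ipaddress

# (start, end) pairs copied from the list above; not an authoritative feed.
FB_RANGES = [
    ("31.13.64.0", "31.13.127.255"),
    ("66.220.144.0", "66.220.159.255"),
    ("69.63.176.0", "69.63.191.255"),
    ("69.171.224.0", "69.171.255.255"),
    ("74.119.76.0", "74.119.79.255"),
    ("129.134.0.0", "129.134.255.255"),
    ("157.240.0.0", "157.240.255.255"),
    ("173.252.64.0", "173.252.127.255"),
    ("204.15.20.0", "204.15.23.255"),
    ("208.185.168.128", "208.185.168.135"),
    ("2620:0:1c00::", "2620:0:1cff:ffff:ffff:ffff:ffff:ffff"),
]

def build_networks(ranges):
    """Convert inclusive start-end address pairs into a flat list of CIDR networks."""
    nets = []
    for start, end in ranges:
        nets.extend(ipaddress.summarize_address_range(
            ipaddress.ip_address(start), ipaddress.ip_address(end)))
    return nets

FB_NETS = build_networks(FB_RANGES)

def is_listed(ip: str) -> bool:
    """True if ip falls in any listed network of the same IP version."""
    addr = ipaddress.ip_address(ip)
    return any(addr.version == net.version and addr in net for net in FB_NETS)
```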

99.99% of FB users are one-dimensional (FB ONLY) and will not venture beyond that site.
In most instances, a FB user does not even have to click (or mouse-click) on the link for the referrer to be shown in visitor logs.
FB keeps an image in their cache (for lack of a better word) for a long time (I've had a couple of image-only requests (jpg or pdf) ongoing for more than a year).

tangor,
I let FB in for a short while; currently (and for a long while) I have them all denied.
Unless you receive some sort of revenue, the traffic is not really beneficial (one-dimensional).

Should you wish to utilize FB traffic there's an icon which you may add to each web page to show you are FB Friendly (same for twitter and some others).

lucy,
the FB user does not select the primary image, rather FB itself. I've had pages with multiple images where FB incorrectly selected one that is not relevant to the FB text (statement).

wilderness

1:26 pm on Aug 18, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



BTW, I don't have any such UA in the current month's logs.

facebookexternalhit/1.1;line-poker/1.0

lucy24

4:58 pm on Aug 18, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



2620:0:1C00:: - 2620:0:1CFF:FFFF:FFFF:FFFF:FFFF:FFFF
A delightful feature of FB's IPv6 range (visible on sites that have an IPv6 address) is that one of the later groups will always contain the string :face:. Incidentally, I see them only from 2a03:2880:blahblah
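lucy24's two observations can be combined into a quick check: the address sits inside 2a03:2880 (treated here as a /32, which is an assumption) and one of its 16-bit groups is `face`. The `:face:` pattern is an observation from logs, not something Facebook documents, so a match is a hint only. The function name and the sample address shape are hypothetical.

```python
import ipaddress

FB_V6 = ipaddress.ip_network("2a03:2880::/32")  # assumed prefix length

def looks_like_fb_v6(ip: str) -> bool:
    """True if ip is IPv6, inside FB_V6, and one of its groups is 'face'."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 6 or addr not in FB_V6:
        return False
    # exploded form pads every group to 4 hex digits, e.g. '0083'
    return "face" in addr.exploded.split(":")
```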

the FB user does not select the primary image, rather FB itself
Feh. How utterly pointless of them.

tangor

1:31 am on Aug 19, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, folks ... I suspected many of the above were the reality. Might as well invest a month (or more) to test all that out.

As noted, ACTUAL USER VISITS from fb include an absurd tracking =?etc string (fbclid). While a PITA in the logs, they are trackable.
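Since the fbclid parameter is what distinguishes these human visits, a small helper can flag it and strip it so log entries for the same page group together. A sketch using only Python's `urllib.parse`; the function names are hypothetical, and re-encoding the remaining query with `urlencode` may normalize its escaping.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def has_fbclid(target: str) -> bool:
    """True if the request target carries an fbclid query parameter."""
    query = urlsplit(target).query
    return any(k == "fbclid" for k, _ in parse_qsl(query, keep_blank_values=True))

def strip_fbclid(target: str) -> str:
    """Return the target with fbclid removed and other parameters kept in order."""
    parts = urlsplit(target)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != "fbclid"]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```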

The external hit and cortex stuff hits my LARGEST FILES a few hundred times each month and I get nothing back...

lucy24

6:27 pm on Aug 20, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Returning to the original topic... Found them in logs today, with slightly different headers & behavior:
IP: 147.92.179.abc
...
User-Agent: facebookexternalhit/1.1;line-poker/1.0
Accept-Language: zh-TW
X-Forwarded-For: 211.72.2.abc
X-Real-Ip: 211.72.2.abc
The “real” IP is, in fact, Taiwan. But unlike earlier blocked requests, these (a) didn’t have the anomalous header that resulted in blocking, and (b) included several requests for a single image associated with that page.

Further points of interest:
--The requested page was ... drumroll ... the self-same page from last May which was different from all other requests.
--The requests were immediately followed by a human request for the same page from the same “real” IP.

To reiterate:
Hmmmmm.

tangor

1:03 am on Aug 23, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Update ... since I blocked facebookexternalhit and cortex on the 18th, the number of REAL HITS from fb has blossomed! Makes me wonder if they were NOT sending folks my way because my content was cached on their systems.

Just a remark on a change of behavior.

lucy24

6:40 pm on Aug 23, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the number of REAL HITS from fb has blossomed
I am not absolutely certain this is to our advantage. It may simply mean that instead of displaying cached content, FB triggers a request from the human user’s IP and UA: still displayed in the same way within Facebook, just looking different in site logs. Do these new human visits result in people sticking around and investigating additional pages?

:: looking vaguely around for anyone who, unlike me, actually uses Facebook and is in a position to shed light ::

tangor

9:26 pm on Aug 23, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do these new human visits result in people sticking around and investigating additional pages?


Yes

tangor

9:28 pm on Aug 23, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That is to say, each of 12 =fbclid visits went on to request at least two additional pages.