facebookexternalhit

         

lucy24

7:25 pm on Nov 4, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



IP: 173.252.87 (IPv4), 2a03:2880: (IPv6, always with the element :face: somewhere in it)
UA: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Protocol: http only

And your point is ...?

From two different sites' logs:
173.252.87.5 - - [24/Oct/2018:00:35:27 -0700] "GET /robots.txt HTTP/1.1" 206 1029 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" 

2a03:2880:11ff:6::face:b00c - - [13/Oct/2018:12:41:25 -0700] "GET /robots.txt HTTP/1.1" 206 1020 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
I've been seeing this sporadically since mid-October on all sites; interestingly the earliest was my test site. (In theory, a human could blunder by and then post a link. I don't think it has ever happened, though I have seen the twitterbot a time or two.) Comprehensive search of raw logs confirms that I have never before seen FB asking for robots.txt. On one site they enthusiastically asked for it in batches of five. What they're planning to do with it remains unclear.
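
For anyone who wants to run the same check against their own raw logs, here is a minimal Python sketch. It assumes the logs are in Apache's Combined Log Format, as the two lines above appear to be. The function name `fb_robots_requests` and the regex are illustrative, not anything FB or Apache provides.

```python
import re

# Assumed layout (Combined Log Format):
# ip ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def fb_robots_requests(lines):
    """Yield parsed fields for facebookexternalhit requests for /robots.txt.

    Hypothetical helper: lines that do not match the assumed format are skipped.
    """
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        if m["path"] == "/robots.txt" and m["ua"].startswith("facebookexternalhit/"):
            yield m.groupdict()


# The two log lines quoted in the post above.
sample = [
    '173.252.87.5 - - [24/Oct/2018:00:35:27 -0700] "GET /robots.txt HTTP/1.1" 206 1029 "-" '
    '"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
    '2a03:2880:11ff:6::face:b00c - - [13/Oct/2018:12:41:25 -0700] "GET /robots.txt HTTP/1.1" 206 1020 "-" '
    '"facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
]

hits = list(fb_robots_requests(sample))
for hit in hits:
    print(hit["time"], hit["ip"], hit["status"], hit["size"])
```

Matching on the UA string alone only tells you what the visitor claims to be; pairing it with an IP check is the safer habit.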

<tangent>
While checking this site to see if the topic had been covered before, I found this 5-year-old thread [webmasterworld.com] started by me but featuring lots of entertaining input from incrediBill. Nice stroll down Memory Lane, there.
</tangent>

:: uneasily wondering why site thinks my current time is two hours removed from what it really is ::

TorontoBoy

8:45 pm on Nov 4, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



69.171.251.13 [12/Oct/2018:16:50:18 GET /robots.txt HTTP/1.1 206 1551 - facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

69.171.251.13 [15/Oct/2018:19:11:46 GET /example/wp-content/uploads/2013/07/example-vending-machine.jpg HTTP/1.1 200 34892 - facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

That IP was used only twice in October 2018, once for robots.txt. The other was a direct image request, so no header info.

Here is a neighbouring IP, but for a more normal call. FB seems to use the 69.171.251.11-25 range, flitting about like a butterfly, but rarely landing on my flowers.
2018-10-12:20:57:11
URL: /example/
IP: 69.171.251.17
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US
Authorization:
Connection: keep-alive
Host: example.com
Upgrade-Insecure-Requests: 1
User-Agent: facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

lucy24

10:34 pm on Nov 4, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"That IP was used only twice"

Huh, interesting. I know that FB has used a lot of different IPs--I don't generally keep track of which ones are currently active--but the two I listed were the only ones I'd ever seen asking for robots.txt. It's not unheard of for an operator to use one IP for robots.txt requests and a different one for pages. But apparently that isn't the case here.

I don't think I have a lot of the kind of content that people enthusiastically recommend to all their online friends :( Or, conversely, the people who like my content don't spend a lot of time on social media.

lucy24

10:16 pm on Nov 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Continuing the theme of FB behavioral oddities:

#1 The other day they made a flurry of HEAD requests (only) to my front page from the ordinary FB IP and UA. What's up with that?

#2 A to-all-appearances-human request with user-agent
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
came in from an established FB range. This site happens to be IPv6, so that's 2a03:2880:nnnn:nnnn::face:nnnn, where all of 2a03:2880 is FB, and then they cherry-pick addresses so the string :face: will be included somewhere in the rest. (Idle speculation: do they use the other IPs for internal stuff that won't show up in other people's logs? What luxury otherwise, to be able to disregard 65535/65536* of your IP space!) I looked up the exact IP (obfuscated here) to make sure I hadn't got the range wrong.

* 0.99998 and some more digits.
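
The range check described above can be sketched in Python with the standard `ipaddress` module. It takes this thread's observations at face value: that all of 2a03:2880 (treated here as a /32) is FB, and that crawler addresses carry a `face` group somewhere after the prefix. The helper name `looks_like_fb_face_address` is made up for illustration.

```python
import ipaddress

# Assumption from the thread: the whole 2a03:2880 prefix belongs to FB.
FB_V6 = ipaddress.ip_network("2a03:2880::/32")


def looks_like_fb_face_address(addr):
    """Return True if addr is IPv6, inside FB_V6, and has a 'face' group
    in one of the six 16-bit groups after the prefix."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    if ip.version != 6 or ip not in FB_V6:
        return False
    groups = ip.exploded.split(":")  # eight zero-padded hex groups
    return "face" in groups[2:]


# Share of values a single 16-bit group can take that are NOT 'face',
# i.e. the 65535/65536 figure from the footnote.
unused_fraction = 65535 / 65536

print(looks_like_fb_face_address("2a03:2880:11ff:6::face:b00c"))
print(unused_fraction)
```

This only says an address fits the observed pattern; it is not a substitute for looking up the exact IP, as was done above.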

aristotle

1:48 am on Dec 9, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've noticed that facebookexternalhit will oftentimes show up an instant before a real visitor from facebook arrives. It's like they're checking to make sure that the link is good before they let the person leave facebook.

lucy24

2:08 am on Dec 9, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Heh, I always pictured it the other way around: someone posts an FB link and then they themselves verify that they linked to what they meant to link to.

aristotle

12:19 pm on Dec 9, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lucy, your explanation must certainly account for some of it, but sometimes one of my sites will get a flurry of visitors from Facebook throughout the day, and facebookexternalhit will show up just before MOST of those visitors. I don't know why it doesn't happen every single time, but it does happen most of the time.
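
One way to put a number on "most of the time" is to count how many Facebook-referred visits were preceded by a facebookexternalhit request for the same URL within a few seconds. The sketch below is only an illustration: the tuple layout, the 5-second window, the function name `preceded_by_bot` and the sample data are all assumptions, not taken from anyone's actual logs.

```python
from datetime import datetime, timedelta


def preceded_by_bot(hits, window_seconds=5):
    """hits: iterable of (when, path, user_agent, referer), sorted by time.

    Returns (preceded, total): how many visits with a Facebook referer came
    within window_seconds after a facebookexternalhit request for the same
    path, and how many such visits there were overall.
    """
    window = timedelta(seconds=window_seconds)
    last_bot = {}  # path -> time of the most recent bot request
    preceded = total = 0
    for when, path, ua, referer in hits:
        if ua.startswith("facebookexternalhit/"):
            last_bot[path] = when
        elif "facebook.com" in referer:
            total += 1
            seen = last_bot.get(path)
            if seen is not None and when - seen <= window:
                preceded += 1
    return preceded, total


# Made-up sample data, purely to show the shape of the input.
BOT = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
HUMAN = "Mozilla/5.0 (hypothetical browser)"
t0 = datetime(2018, 12, 9, 12, 0, 0)
sample_hits = [
    (t0, "/page/", BOT, "-"),
    (t0 + timedelta(seconds=1), "/page/", HUMAN, "https://www.facebook.com/"),
    (t0 + timedelta(minutes=30), "/page/", HUMAN, "https://www.facebook.com/"),
]

result = preceded_by_bot(sample_hits)
print(result)
```

A high ratio would fit either explanation offered in this thread, so it measures how often the pattern occurs, not why.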

SumGuy

2:54 pm on Dec 10, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I started blocking FB IP addresses about a month ago. For one thing, there's no way to know which users are posting links to my site, or to read anything they're writing about us in association with the posted links. I don't know what the FB link-checking bots do when a user posts a link that the bot can't check (presumably the bot is checking to see if the URL is valid or working). As for FB's bots crawling my site for other reasons unrelated to user activity, again I don't understand, based on what I know about FB's business model, how they are using whatever they get. Those reasons alone weren't what tipped the scale and got me to block their bots, though. It was the banning / cancelling of various people and groups from their platform that was the last straw. My way to tell FB to go to hell.