what's facebook doing on that platform?

         

lucy24

11:51 pm on Apr 3, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can we give this UA a thread of its own? It shows up as a by-the-way in other facebook-related threads like

[webmasterworld.com...]

What does facebookplatform do? I met two of 'em today. Didn't recognize the name, so I detoured to raw logs and found none, nil, zero within a chunk that contains at least 90 facebookexternalhotlink hits.

That is: Spotlight found 90 files. Probably several separate visits in each, given their devotion to beating their heads on the 403 door. Their recent record-- from just a few weeks ago-- is 45 attempts over the course of 4 hours.

To be exact:

66.220.156.0 - - [03/Apr/2012:08:44:46 -0700] "GET /hovercraft/images/hover_before.jpg HTTP/1.1" 200 30731 "-" "facebookplatform/1.0 (+http://developers.facebook.com)"

and

69.171.234.7 - - [03/Apr/2012:11:16:24 -0700] "GET /silence/silence.html HTTP/1.1" 206 2126 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.234.0 - - [03/Apr/2012:11:16:25 -0700] "GET /silence/images/smallhome.jpg HTTP/1.1" 200 5747 "-" "facebookplatform/1.0 (+http://developers.facebook.com)"

The image belongs to the html page. The 206 response is normal for externalhotlink requests for .html files; they're only blocked from images.
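For anyone who would rather not rely on Spotlight, here is a minimal Python sketch that splits combined-format log lines like the three above and tallies the Facebook user agents by status code. The regex and the helper names (`parse_line`, `facebook_agent`) are assumptions made for illustration, not part of any log tool.

```python
import re
from collections import Counter

# Apache "combined" log format, matching the lines quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def facebook_agent(ua):
    """Return the UA product token if it starts with 'facebook', else None."""
    token = ua.split("/", 1)[0].lower()
    return token if token.startswith("facebook") else None

# The three log lines from the post above.
lines = [
    '66.220.156.0 - - [03/Apr/2012:08:44:46 -0700] "GET /hovercraft/images/hover_before.jpg HTTP/1.1" 200 30731 "-" "facebookplatform/1.0 (+http://developers.facebook.com)"',
    '69.171.234.7 - - [03/Apr/2012:11:16:24 -0700] "GET /silence/silence.html HTTP/1.1" 206 2126 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"',
    '69.171.234.0 - - [03/Apr/2012:11:16:25 -0700] "GET /silence/images/smallhome.jpg HTTP/1.1" 200 5747 "-" "facebookplatform/1.0 (+http://developers.facebook.com)"',
]

# Tally (agent, status) pairs.
counts = Counter()
for line in lines:
    rec = parse_line(line)
    if rec:
        agent = facebook_agent(rec["ua"])
        if agent:
            counts[(agent, rec["status"])] += 1

for (agent, status), n in sorted(counts.items()):
    print(agent, status, n)
```

Keying on the product token before the slash keeps the two UAs apart even if the version number changes.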

Punch line: I recently caved in and decided that externalhotlink can snuffle around in the "grownup" directories, so long as it stays away from the hot-link fodder in /ebooks and /fun. Meanwhile, in approved French-farce fashion, facebook was busy running up a new costume. That 66.220. above was a pickup of the same file as their externalhotlink record-holder, dating from 2-3 weeks ago. They could have asked for it under the old name.

Oh. Right. As an afterthought I went and looked up the link in the UA. Could have saved myself the trouble. Unless it means that FB thinks it's google, and we can now expect to find any and all kinds of unwanted visits from vaguely familiar names in vaguely familiar IP ranges.

keyplyr

10:07 am on Apr 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've described this more than once, but here it is again.

What Facebook was doing also puzzled me. I had suspicions they were hotlinking my image files or scraping my articles/content. Actually, both are true - but in a good way. Unless you have an account with Facebook, you can't really tell, so I got an account and discovered...

When a Facebook user likes your web site, or a particular page on your web site, they sometimes share it with their friends (those who follow them) by posting your URL. Then the Facebook utility visits your site and grabs an image from that page to use as an icon, as well as a short snippet (one or two sentences) from the content as a preview. If the page has a META DESCRIPTION tag, Facebook will sometimes use that along with the TITLE of the page. That's when you'll see this hit:

69.171.234.0 - - [03/Apr/2012:11:16:25 -0700] "GET /silence/images/smallhome.jpg HTTP/1.1" 200 5747 "-" "facebookplatform/1.0 (+http://developers.facebook.com)"


Then everyone who follows that Facebook user will see the link, with image and snippet, to your page. Some users will click through and visit your web page. That's when you'll see this hit:

69.171.234.7 - - [03/Apr/2012:11:16:24 -0700] "GET /silence/silence.html HTTP/1.1" 206 2126 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"


And, if you're lucky, sometimes other Facebook users will re-post that link to your site and it may even become viral, resulting in thousands of visitors. Since I started posting my own links to my own web pages, I've been getting triple-digit visits every day. Nice traffic getter :)

And because Facebook is a "closed" system, it does not generate the swarm of bots that Twitter does. In 3 months of using Facebook, I have not seen one bad bot come from my links there.

lucy24

9:08 pm on Apr 4, 2012 (gmt 0)




Yes, I've seen that description applied many times to facebookexternalhit-- but it never does lead to human visits, just the one hotlinked image over and over. Or occasionally a human visit with the dreaded "leaving facebook" page as referer. (I think these are people who said "Check out this great picture book!" but didn't avail themselves of FB's hotlinking opportunities.)

The way you describe it, it makes it sound as if you would get lots of "platform" hits to a single image, followed by the occasional "externalhit".

Here I'm seeing the opposite: a single "externalhit" visit to a page (just the html, always returning 206 even if the page has in fact changed since their last visit), immediately followed by a single "platform" visit to a selected image belonging to the page. It's the "platform" that's new on me; I'm accustomed to "externalhit" doing both parts.
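The pairing described here (an "externalhit" request for the page, then a "platform" request for one of its images a moment later) can be checked mechanically. A rough sketch, assuming the records have already been reduced to (timestamp, path, user agent) tuples; the 5-second window and the "same directory" test are arbitrary assumptions for illustration:

```python
from datetime import datetime, timedelta

def parse_time(stamp):
    """Parse an Apache log timestamp such as '03/Apr/2012:11:16:24 -0700'."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

def pair_hits(records, window=timedelta(seconds=5)):
    """Pair each externalhit page request with nearby platform requests.

    records: iterable of (timestamp, path, user_agent) tuples.
    Returns (page_path, image_path, seconds_between) tuples.
    """
    pages = [(parse_time(t), p) for t, p, ua in records
             if ua.startswith("facebookexternalhit")]
    images = [(parse_time(t), p) for t, p, ua in records
              if ua.startswith("facebookplatform")]
    pairs = []
    for page_time, page_path in pages:
        # Treat anything under the page's directory as belonging to the page.
        directory = page_path.rsplit("/", 1)[0] + "/"
        for img_time, img_path in images:
            if img_path.startswith(directory) and abs(img_time - page_time) <= window:
                pairs.append((page_path, img_path,
                              (img_time - page_time).total_seconds()))
    return pairs

# The two /silence/ requests from the original post.
records = [
    ("03/Apr/2012:11:16:24 -0700", "/silence/silence.html", "facebookexternalhit/1.1"),
    ("03/Apr/2012:11:16:25 -0700", "/silence/images/smallhome.jpg", "facebookplatform/1.0"),
]
pairs = pair_hits(records)
print(pairs)
```

A positive number of seconds means the image request came after the page request.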

Matter of fact I goofed in my original post, because both "platform" visits to an image were immediately preceded by an "externalhit" 206 to the associated page. (The first one was hiding behind a file that I would normally delete by default, so I missed it.)

As they used to say on Homicide: No Human Involvement.

keyplyr

11:01 pm on Apr 4, 2012 (gmt 0)




"I've seen that description applied many times to facebookexternalhit-- but it never does lead to human visits, just the one hotlinked image over and over."

It won't lead anywhere. It's a forward, but it *is* a real person. And AFAIK Facebook will not "hotlink" an image. It copies the image to the Facebook server and displays that, not your copy on your server.


"The way you describe it, it makes it sound as if you would get lots of "platform" hits to a single image, followed by the occasional "externalhit"."

No, just the opposite. As described above, the Facebook utility will get the image once and copy it to its own server. Then you'd see lots of "externalhit" requests, depending on how many Facebook users clicked through the link.

lucy24

1:17 am on Apr 5, 2012 (gmt 0)




I think what we have here is a failure to communicate :(

"...the Facebook utility visits your site and grabs an image from that page to use as an icon, as well as a short snippet (one or two sentences) from the content as a preview. If the page has a META DESCRIPTION tag, Facebook will sometimes use that along with the TITLE of the page. That's when you'll see this hit:"

69.171.234.0 - - [03/Apr/2012:11:16:25 -0700] "GET /silence/images/smallhome.jpg HTTP/1.1" 200 5747 "-" "facebookplatform/1.0 (+http://developers.facebook.com)"

Except that I won't. As noted in the OP, I have never seen the "platform" UA before yesterday. But I have seen the "externalhit" UA many, many times.

"Then everyone that follows that Facebook user will see the link w/ image & snippet to your page. Some users will click-through and visit your web page. That's when you'll see this hit:"

69.171.234.7 - - [03/Apr/2012:11:16:24 -0700] "GET /silence/silence.html HTTP/1.1" 206 2126 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

One second before facebookplatform picked it up, which is a pretty nifty trick. (But may not mean anything, because my logs have been known to hiccup this way before.) And all those externalhit humans will visit the page without picking up the stylesheets, images, backgrounds and favicon that one would expect a human to get. Remember, this is from raw logs. I'm recording the entire contents of the visit.

I went back and dug up that record-breaking 45-hit streak. Turns out I'd already noticed that day's visits for other reasons, having to do with page content + geographical location of IP.

#1 human in highly desirable (for me) location stumbles across page.
#2 Facebook pays usual visit to html only, picking up its standard 206.
#3 A few humans visit the page, with facebook (with or without /l) given as referer.
#4 In the course of the day, there is a grand total of 45 (thank you, text editor) failed attempts at one image from this page. Not the css. Not the other images. Not the piwik files. Just the one image, over and over.
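A tally like the one in #4 doesn't need a text editor. A small sketch, assuming the log has already been reduced to (agent, path, status) tuples; the sample records and paths below are hypothetical placeholders, not the real log:

```python
from collections import Counter

def failed_attempts(records, status="403"):
    """Count requests per (agent, path) that received the given status."""
    counts = Counter()
    for agent, path, code in records:
        if code == status:
            counts[(agent, path)] += 1
    return counts

# Hypothetical sample: one page fetch, then three blocked image requests.
sample = [("facebookexternalhit", "/example/page.html", "206")]
sample += [("facebookexternalhit", "/example/images/picture.jpg", "403")] * 3

worst = failed_attempts(sample).most_common(1)
print(worst)
```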

So... if /small_before.jpg is already on FB's server, what does a human gain by clicking a link that will simply show them the identical image all over again?

Incidentally, I have no idea why FB was snuffling around /silence/. Normally their visits start immediately after a human ("Ooh! ooh! I gotta share this page with my 87 closest friends!"), but this time there wasn't one. Maybe they happened to check their database and found a leftover shopping list from last year.

keyplyr

5:24 am on Apr 5, 2012 (gmt 0)







I don't really understand what point you're trying to make, but Facebook does not crawl on its own.

lucy24

8:37 am on Apr 5, 2012 (gmt 0)




No, but they must have a long memory. Like the googlebot in this respect ;) If they've asked for something once, they will continue asking for it periodically over the months to come. Not just the hotlinked image but the underlying page-- always with that mysterious 206 response. (What is a 206 anyway? I've only found one intelligible explanation, and it doesn't fit these circumstances.)

I went back to a random bunch of logs from last year, before I was locking anyone out, and asked Spotlight to find me some clumps of facebookexternalhit visits so I could see how they behave in the wild.

The pattern is:

Some human visits a page and presumably likes it. (Or, ahem, hates it so wildly that they want to show all their friends how much it stinks?) Within seconds, facebookexternalhit shows up and collects the entire page, just like the human did: html, all images etc.

Then, over the coming hours and days, there are repeated facebookexternalhit requests for one image from that group. There may also be human visits giving "facebook" in some form as referer. But the only function of externalhit itself is to keep dispensing that image.

And I am still no wiser about what the platform is for.

dstiles

7:58 pm on Apr 5, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From rfc-2068 (http://www.rfc-editor.org/rfc/rfc2068.txt)...

-----
206 Partial Content

The server has fulfilled the partial GET request for the resource.
The request must have included a Range header field (section 14.36)
indicating the desired range. The response MUST include either a
Content-Range header field (section 14.17) indicating the range
included with this response, or a multipart/byteranges Content-Type
including Content-Range fields for each part. If multipart/byteranges
is not used, the Content-Length header field in the response MUST
match the actual number of OCTETs transmitted in the message-body.

A cache that does not support the Range and Content-Range headers
MUST NOT cache 206 (Partial) responses.
-----

In other words, the visitor was only asking for a sub-section of your page.
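To make that concrete, here is a simplified sketch of how a server might turn a single-range `Range` header into a 206 response. It is an illustration only: `serve_range` is a made-up name, it handles just the `bytes=a-b`, `bytes=a-` and `bytes=-n` forms, and it says nothing about which range Facebook actually requested.

```python
def serve_range(range_header, body):
    """Return (status, content_range, payload) for a single-range request.

    Anything this sketch cannot parse is ignored and answered with a
    full 200 response.
    """
    total = len(body)
    if (not range_header or not range_header.startswith("bytes=")
            or "," in range_header):
        return 200, None, body          # no Range header, or multi-range
    first, _, last = range_header[len("bytes="):].partition("-")
    try:
        if first == "":                 # suffix form: the last N bytes
            suffix = int(last)
            start, end = max(total - suffix, 0), total - 1
        else:
            start = int(first)
            if last and int(last) < start:
                return 200, None, body  # invalid spec: ignore the header
            end = min(int(last), total - 1) if last else total - 1
    except ValueError:
        return 200, None, body          # malformed numbers: ignore the header
    if start > end:
        return 416, f"bytes */{total}", b""   # range not satisfiable
    return 206, f"bytes {start}-{end}/{total}", body[start:end + 1]

page = b"0123456789"
print(serve_range("bytes=0-3", page))
print(serve_range("bytes=-3", page))
print(serve_range(None, page))
```

The 206 case carries a `Content-Range` value of the form `bytes start-end/total`, which is the header the quoted RFC text requires.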

keyplyr

8:43 pm on Apr 5, 2012 (gmt 0)




"There may also be human visits giving "facebook" in some form as referer. But the only function of externalhit itself is to keep dispensing that image."

Not really. Being a closed system, if a FB user likes your page, he/she can enter your URL into the "share" text field and the FB utility comes and grabs several images from the referred page. The user can then choose one of several images to best represent your web page, which may get hit a few times in this process. All this is done on the FB server, so the FB user doesn't have to actually visit your page at that time.
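As a rough illustration of what any link-preview fetcher has to do with a shared page (this is a guess at the general technique, not Facebook's actual code), the sketch below pulls the TITLE, the META DESCRIPTION and the candidate image URLs out of some HTML. The class name `PreviewParser` and the sample page are hypothetical.

```python
from html.parser import HTMLParser

class PreviewParser(HTMLParser):
    """Collect the title, meta description and <img> sources of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])   # candidate preview images

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical page, for illustration only.
sample_html = """<html><head>
<title>Example page</title>
<meta name="description" content="A short summary.">
</head><body>
<img src="images/one.jpg"><img src="images/two.jpg" />
</body></html>"""

preview = PreviewParser()
preview.feed(sample_html)
print(preview.title, preview.description, preview.images)
```

Each URL in `images` would then be requested separately, which would fit with one page share producing several image hits in the log.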

As I said earlier, I didn't understand this until I opened an account. I just automatically figured FB was hot-linking images and scraping content like so many other pests. I have since changed my opinion in that respect. I use it as a traffic getter and don't waste much time on FB.

So open a free FB account and everything will become clear :)