
Forum Moderators: Ocean10000 & incrediBILL


Facebook scraper? I can't figure this out.

     

kahuna

2:14 pm on Apr 27, 2014 (gmt 0)

10+ Year Member



Facebook scraper? I can't figure this out...

Did a FB crawler/bot come by, or did someone post links to my pages?

I have a "folder" with an index page, and on that index page are links to 10 pages (subtopics). This had been up for a couple of months.

Then one morning...
At 7:30 am I added 20 more subtopic pages...
and then later that evening...
All 20 new subtopic pages were hit (multiple times) by the bot that identifies itself with facebook.com/externalhit_uatext.php.

I can't find in my logs a "human" accessing the pages.

There were the typical media-type bots (cyberalert and trendiction), but they didn't target those pages uniquely; they crawled other pages too.

Searching my logs... no human had hit my index page or the subsequent new 20 pages I had posted.

I didn't think Facebook had a search engine type bot... just the link verification tool they use.

So did somebody use a media scraper such as trendiction.de to post to Facebook somewhere I can't find?
Is this malicious?

I never saw this before.

lucy24

7:55 pm on Apr 27, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You forgot to give the IP. Although Googlebot is by far the most commonly spoofed, it doesn't hold a monopoly.
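For what it's worth, here is a minimal sketch of the kind of check I mean: compare the requesting IP against address blocks you believe belong to the claimed bot, instead of trusting the user-agent string. The two CIDR blocks below are my assumption for illustration only, and `ip_in_ranges` is a made-up helper name; look up the current ranges yourself before relying on this.

```python
import ipaddress

# ASSUMPTION: illustrative blocks only, not an authoritative or complete list.
# Replace them with ranges you have verified yourself (e.g. via whois).
ASSUMED_FB_RANGES = ["173.252.64.0/18", "69.171.224.0/19"]


def ip_in_ranges(ip, cidrs):
    """Return True if the address falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    # A hit that claims to be facebookexternalhit but comes from outside the
    # verified ranges would be a candidate for a spoofed user agent.
    for candidate in ["173.252.100.113", "69.171.248.3", "75.98.9.249"]:
        print(candidate, ip_in_ranges(candidate, ASSUMED_FB_RANGES))
```

The ranges are passed in as an argument so the list can be swapped for a verified one without touching the check itself.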

kahuna

12:38 am on Apr 28, 2014 (gmt 0)

10+ Year Member



Thanks for your message...

I don't think this will tell you much... but here are the bots that came by... and then the flurry of Facebook "external" bots.
Many, many FB "bot" hits.
----------------------------------
crawl-66-249-79-153.googlebot.com - - [24/Apr/2014:07:20:41 -0500]
crawl-66-249-79-185.googlebot.com - - [24/Apr/2014:07:21:30 -0500]
crawl-66-249-79-121.googlebot.com - - [24/Apr/2014:07:21:48 -0500]

75.98.9.249 - - [24/Apr/2014:07:37:06 -0500] (compatible; NetSeer crawler/2.0; +http://www.netseer.com/crawler.html; crawler@netseer.com)
msnbot-131-253-24-80.search.msn.com - - [24/Apr/2014:10:04:01 -0500]
msnbot-131-253-24-94.search.msn.com - - [24/Apr/2014:10:13:01 -0500]
msnbot-65-55-213-42.search.msn.com - - [24/Apr/2014:10:39:20 -0500]
msnbot-131-253-24-47.search.msn.com - - [24/Apr/2014:11:20:49 -0500]

netdisk.cyberalert.com - - [24/Apr/2014:20:48:53 -0500] "GET /xxxxxxxxx/index.shtml HTTP/1.1" 200 10935 "-" "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)"
-----------------------
    then it goes down through the 20 new pages I uploaded this day...

netdisk.cyberalert.com - - [24/Apr/2014:20:49:27 -0500] "GET /xxxxxxxx/yyyyyyyyyyyy.htm HTTP/1.1" 200 17491 "-" "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)"
----------------------
    and the same for this... through the 20 new pages I uploaded this day...

p16n11.trendiction.de - - [24/Apr/2014:23:31:26 -0500] "GET /xxxxxxxxx/yyyyyyy.htm/ HTTP/1.1" 200 10935 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.0; trendictionbot0.5.0; trendiction search; [trendiction.de...] please let us know of any problems; web at trendiction.com) Gecko/20071127 Firefox/3.0.0.11"

    AND then I get the massive flurry of Facebook hits...

173.252.100.113 - - [24/Apr/2014:23:32:08 -0500] "GET /oxxxxx/iyyyy.htm HTTP/1.1" 200 17746 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
173.252.100.116 and the numerous Facebook hits with the --- HTTP/1.1" 206 76383 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
173.252.100.112
173.252.100.119 and more of the same... for each of the individual new "20" pages
69.171.248.0
69.171.248.1
69.171.248.3 and more of the same... for each of the individual new "20" pages
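
In case anyone wants to tally hits like these rather than eyeball them, here is a rough sketch that counts requests per path and status code for one user-agent token. It assumes full combined-log-format lines like the complete ones above. The regex and the `summarize` helper are my own illustration, and the second sample line uses a made-up image path, since I trimmed the real 206 lines.

```python
import re
from collections import Counter

# Matches: host ident user [time] "METHOD path PROTO" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines, agent_token="facebookexternalhit"):
    """Count hits per (path, status) for lines whose user agent contains agent_token."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and agent_token in match.group("agent"):
            hits[(match.group("path"), match.group("status"))] += 1
    return hits


FB_AGENT = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
SAMPLE_LINES = [
    '173.252.100.113 - - [24/Apr/2014:23:32:08 -0500] '
    '"GET /oxxxxx/iyyyy.htm HTTP/1.1" 200 17746 "-" "' + FB_AGENT + '"',
    # Hypothetical path: stands in for one of the trimmed 206 (partial content) hits.
    '173.252.100.116 - - [24/Apr/2014:23:32:09 -0500] '
    '"GET /oxxxxx/example.jpg HTTP/1.1" 206 76383 "-" "' + FB_AGENT + '"',
    'netdisk.cyberalert.com - - [24/Apr/2014:20:49:27 -0500] '
    '"GET /xxxxxxxx/yyyyyyyyyyyy.htm HTTP/1.1" 200 17491 "-" '
    '"Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)"',
]

if __name__ == "__main__":
    for (path, status), count in sorted(summarize(SAMPLE_LINES).items()):
        print(path, status, count)
```

Lines that don't match the full format (such as the hostname-only entries I pasted for Googlebot and msnbot) are simply skipped.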

===========================

not2easy

5:22 am on Apr 28, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



The Facebook crawler seems to hit sites as it pleases. With all the other bots crawling, there is no telling where the FB bot found those links. Those IPs are definitely the Facebook crawler. It does not mean anyone has posted the links on Facebook. Since msn and google crawled earlier the same day and found your new content, the others may have noticed new links in their rounds.

kahuna

7:26 pm on Apr 28, 2014 (gmt 0)

10+ Year Member



Thanks group.

I still don't "get" this situation.

I was under the impression that Facebook really didn't have its own crawler/bot...
Except the link checker that we see as facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

And that only occurs when someone posts a link on their Facebook page.

I've tested it... and I'm sure you guys/gals already know this.

I've never seen Facebook do a crawl of my site, like other bots. Only the traffic when somebody posts a link to my site.

Thanks for your comments and taking the time to post.
K.

keyplyr

11:18 pm on Apr 30, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





I was under the impression that Facebook really didn't have its own crawler/bot...
Except the link checker that we see as facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

That's correct.

However, since it uses an image from your page in the link, it will periodically check that the image files still exist. It will usually grab several from each linked page, then offer them to the poster to choose from. This may amount to lots of hits bunched together in a short amount of time. Then the FB bot will come back in the future and do the same thing as FB users continue to follow the link to your site.
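
If you want to see that bunching in your own logs, a rough sketch along these lines groups hit timestamps into bursts. The 60-second gap and the `group_bursts` name are arbitrary choices of mine, and only the first sample timestamp comes from the log posted above; the other two are made up for illustration.

```python
from datetime import datetime, timedelta

LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 24/Apr/2014:23:32:08 -0500


def group_bursts(timestamps, gap_seconds=60):
    """Group log timestamps into bursts separated by more than gap_seconds."""
    times = sorted(datetime.strptime(t, LOG_TIME_FORMAT) for t in timestamps)
    bursts = []
    for current in times:
        # Extend the latest burst if this hit follows closely enough.
        if bursts and current - bursts[-1][-1] <= timedelta(seconds=gap_seconds):
            bursts[-1].append(current)
        else:
            bursts.append([current])
    return bursts


if __name__ == "__main__":
    sample = [
        "24/Apr/2014:23:32:08 -0500",  # from the posted log
        "24/Apr/2014:23:32:10 -0500",  # hypothetical
        "24/Apr/2014:23:40:00 -0500",  # hypothetical
    ]
    for burst in group_bursts(sample):
        print(burst[0].isoformat(), len(burst))
```

Feed it only the timestamps of facebookexternalhit requests and each burst should roughly correspond to one round of page and image fetches.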
 
