Increase in Facebook bot hits

         

dstiles

9:30 am on Apr 28, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is anyone else seeing an increase in Facebook bot activity? I usually get a few hits per day, but yesterday I got 1500 and they are still coming! Most of yesterday's were to one site, as far as I can tell, and to different pages. Today has seen an expansion into other sites, quite possibly from links discovered in the first site.

Is Facebook building up to a proper search engine?

lucy24

5:06 pm on Apr 28, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



YES ... to the point where, earlier this year, I simply blocked Facebook. (They ask for, but ignore, robots.txt.) The excessive requests--as in, every single file on the whole site, over and over again--started somewhere around last November.

:: quick run to logs, because I’ve been ignoring them since March ::

The sheer over-the-top number of hits seems to have let up in mid-February--guess they've moved on to other people's sites--but they continue to ask-and-ignore, prompting an obvious “What part of
User-Agent: facebookexternalhit
Disallow: /
didn’t you understand?”

Final log search in the form /directoryname/\w+/ (space) reveals that, whether blocked or not, they like to request the same page 4-6 times in a single day, and then come back later to request a different page.
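For anyone who wants to run the same kind of check, here is a minimal sketch of that log search in Python. It assumes the Apache "combined" log format and a hypothetical helper named repeat_requests; the directory name is the same placeholder used above, so adjust both to your own setup. It counts how often the named user-agent requested each page under one directory on each day.

```python
import re
from collections import Counter

# Assumption: logs are in Apache "combined" format. Adjust the pattern otherwise.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '       # client, identity, user, [day:time zone]
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" '          # request line
    r'\d{3} \S+ '                                   # status and response size
    r'"[^"]*" "(?P<agent>[^"]*)"'                   # referer and user-agent
)

def repeat_requests(lines, directory="/directoryname/", agent="facebookexternalhit"):
    """Count requests per (day, path) by one user-agent under one directory.

    `directory` and `agent` are illustrative defaults, not fixed values.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        if agent in m.group("agent") and m.group("path").startswith(directory):
            counts[(m.group("day"), m.group("path"))] += 1
    return counts
```

Pass it an open log file (it only needs an iterable of lines) and look at counts.most_common() to see which pages were requested repeatedly on the same day.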

Edit: My YES refers to the first question. The answer to the last question must be conjectural, though it did occur to me too.

not2easy

5:32 pm on Apr 28, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Just about a week ago FB announced their new Llama 3 LLM, so it may need wider training. They are adding it into WhatsApp, Messenger, Instagram and FB, and offering their API on various platforms.
[webmasterworld.com...]

dstiles

10:22 am on Apr 30, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks, both. No problem with it, just wondering. :)

mrgood

1:46 pm on May 6, 2024 (gmt 0)



From what I see in the logs, Facebook is slurping whole sites. The problem is that they use the same user agent string, "facebookexternalhit/1.1", that is used when someone posts a link on Facebook. This creates a dilemma: anyone who wants proper shares on Facebook cannot block this new Facebook crawler based on the user agent string. And I suppose the data is used to feed their AI models. So Facebook ignores whatever good practices exist on the web, simply because it can.

SumGuy

1:45 pm on May 12, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



I've been blocking fecebook IPs for a long time; I see no reason to allow that hideous company to access my website. I have noticed an increase in activity over the past month or two.

dstiles

8:09 am on Jun 1, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My customers use facebook so it's for them I allow their pages to be collected. It was a bit much, though, about 50,000 hits across a dozen or so sites within a few days.

universenet

11:09 am on Sep 8, 2024 (gmt 0)

Top Contributors Of The Month



I just saw that one of my websites got around 250,000 hits from facebookexternalhit. Even if someone posted a link on Facebook, I'm not sure why there would be so many hits back to the website (20GB).

jmccormac

9:05 pm on Sep 21, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FB may have been using Web content to train its AI models. It still requests pages despite getting no content.

Regards...jmcc

stephen22

3:43 pm on Oct 23, 2024 (gmt 0)

Top Contributors Of The Month



Hi, new member - I came here because I have been getting bombarded from facebookexternalhit and now meta-externalhit... My forum software is getting crawled by like 250 of these spiders at the moment. If they ignore robots.txt, what is the best way to completely block these two crawlers? Thanks for any help!

lucy24

4:03 pm on Oct 23, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If they ignore robots.txt, what is the best way to completely block these two crawlers?
Are you asking how to block an unwanted visitor? That’s veering into server-specific territory; we have separate subforums for Apache and IIS. FB uses several different IPs, so it’s easiest to block by user-agent. In Apache I say
BrowserMatch facebookexternalhit bad_agent=facebook
which feeds into a general
Require env bad_agent
(within, of course, a RequireNone envelope) in an htaccess file that covers multiple sites. If it’s a single site, you could also do it in mod_rewrite.

stephen22

8:29 pm on Oct 23, 2024 (gmt 0)

Top Contributors Of The Month



That’s veering into server-specific territory;

Thanks for the reply! I'm not really much of a developer...

I run on Apache and have a single site to protect. Can you give me a specific example of the code to put in my .htaccess to do what you are talking about?

lucy24

8:47 pm on Oct 23, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's continue this discussion in the Apache subforum.

stephen22

8:53 pm on Oct 23, 2024 (gmt 0)

Top Contributors Of The Month



OK, I'll start a new thread.