User-Agent: facebookexternalhit
Disallow: /
didn’t you understand?”
If they ignore robots.txt, what is the best way to completely block these two crawlers?

Are you asking how to block an unwanted visitor? That’s veering into server-specific territory; we have separate subforums for Apache and IIS. FB uses several different IPs, so it’s easiest to block by user-agent. In Apache I say
BrowserMatch facebookexternalhit bad_agent=facebook
which feeds into a general
Require env bad_agent
(within, of course, a RequireNone envelope) in an htaccess file that covers multiple sites. If it’s a single site, you could also do it in mod_rewrite.
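
For anyone who wants to see the pieces assembled, here is a rough sketch of how the two directives above could fit together. It assumes Apache 2.4 with mod_setenvif and mod_authz_core loaded, and an AllowOverride setting that permits FileInfo and AuthConfig in htaccess. The outer RequireAll with "Require all granted" is my addition, since a RequireNone block cannot grant access by itself; adjust it to whatever access rules you already have.

```apache
# Tag any request whose User-Agent contains "facebookexternalhit".
# BrowserMatch is case-sensitive; BrowserMatchNoCase is the case-insensitive form.
BrowserMatch facebookexternalhit bad_agent=facebook

# Assumed envelope: allow everyone, except requests carrying the bad_agent variable.
<RequireAll>
    Require all granted
    <RequireNone>
        Require env bad_agent
    </RequireNone>
</RequireAll>
```

The single-site mod_rewrite version might look like the following, assuming mod_rewrite is enabled. It answers any matching request with 403 Forbidden.

```apache
RewriteEngine On
# [NC] makes the user-agent match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC]
# "-" means no substitution; [F] returns 403 Forbidden.
RewriteRule ^ - [F]
```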