
Banning facebookexternalhit bot

facebookexternalhit

     
5:08 am on May 27, 2014 (gmt 0)

New User

5+ Year Member

joined:Oct 31, 2013
posts:8
votes: 0


Hello,

I want to ban Facebook because the traffic I get from Facebook is worse than useless to me. In fact, it's just a source of major problems (bandwidth, hotlinking, trolls, spam, copyright, etc.).

Banning Facebook is not easy, and I get ignored when I try to contact them for help. I'm NOT very good at .htaccess, so I need a little help.

This is what I have so far:

First, I target the user agent:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit.*$ [OR]

Second, I ban the user agent:
RewriteRule ^forbid/(.*)$ / [R=403,L]

Obviously I'm groping around in the dark here, because when I add these lines I get a 500 error.
6:16 am on May 27, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15247
votes: 691


You've got too many anchors. Just
facebookexternalhit
without any anchors. No [OR] unless it's one of several conditions in some pre-existing rule. If you append [OR] to the last condition in a list-- i.e. to the only condition, if there's just one-- that's your 500 error.
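The reason the anchors are unnecessary: a RewriteCond pattern is a regex search, so it succeeds if it matches anywhere in the user-agent string. Python's re.search behaves the same way for simple patterns like these, so it makes a quick offline check. This is a sketch; the full user-agent string is the one quoted later in this thread, and the second test string is hypothetical.

```python
import re

# The full facebookexternalhit user-agent string quoted later in this thread.
UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

anchored = re.compile(r"^facebookexternalhit.*$")  # pattern from the first post
plain = re.compile(r"facebookexternalhit")          # the simpler pattern

# re.search, like a RewriteCond pattern, succeeds if the regex matches
# anywhere in the string, so no anchors or .* padding are needed.
print(bool(anchored.search(UA)))  # True
print(bool(plain.search(UA)))     # True

# A hypothetical UA with the name mid-string: only the unanchored form matches.
other = "Mozilla/5.0 (compatible; facebookexternalhit/1.0)"
print(bool(anchored.search(other)), bool(plain.search(other)))  # False True
```

So the plain pattern is both shorter and slightly broader than the anchored one.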

R=403 is not technically wrong-- that is, it will have the desired effect-- but it looks silly. That's what the [F] flag is for. An [F] carries an implied [L] (so does R=403, or any R outside the 3xx range) so you don't need the [L] either. Again, it won't hurt, it just isn't needed.

RewriteRule ^forbid/(.*)$ / [R=403,L]

This rule would apply only to requests for material in the /forbid/ directory. Have you got one? What's special about it?
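To see why ^forbid/(.*)$ only catches one directory: in an .htaccess file, mod_rewrite strips the directory prefix (including the leading slash) before testing the pattern, so /forbid/page.html is tested as forbid/page.html and the site root as an empty string. A small Python sketch comparing that pattern with a match-everything pattern; the sample paths are made up:

```python
import re

forbid_only = re.compile(r"^forbid/(.*)$")  # pattern from the first post
whole_site = re.compile(r".*")              # matches every request

# Paths as mod_rewrite sees them in .htaccess: no leading slash, root = "".
paths = ["forbid/page.html", "index.html", "images/photo.jpg", ""]

for p in paths:
    print(repr(p), forbid_only.search(p) is not None, whole_site.search(p) is not None)
# 'forbid/page.html' True True
# 'index.html' False True
# 'images/photo.jpg' False True
# '' False True
```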
6:31 am on May 27, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4054
votes: 249


Lucy is right: the [OR] means "or" and is only used if you have several lines of UAs in a list. If you want the bot to be booted from the entire site and not just the one directory, the rule would be a short, simple:
RewriteRule .* - [F]
6:55 am on May 27, 2014 (gmt 0)

New User

5+ Year Member

joined:Oct 31, 2013
posts:8
votes: 0


Ok I think I'm getting it.

So I'm going to put it together something like this:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit.*$
RewriteRule ^forbid/(.*)$ / [R=403,L]

As for the forbid directory:
I'm not sure I'm understanding what you mean. Do I need a special directory? I'm just forbidding the entire site to Facebook.
7:10 am on May 27, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4054
votes: 249


No, the anchors are still there, and you have said it is NOT just for a single directory named /forbid/, so it is:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit
RewriteRule .* - [F]
8:23 am on May 27, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15247
votes: 691


As for the forbid directory. I'm not sure I'm understanding what you mean.

And I'm not understanding what you mean. If you're excluding the entire site, why are you constraining the rule to just one directory? Does the directory even exist?

I suspect you're confused about the structure of a RewriteRule, and possibly also about how Regular Expressions work. There are four pieces, separated by blank space (the space acts as punctuation):

RewriteRule .? - [F]

is structurally the same as
RewriteRule ^blahblah http://example.com/otherblahblah.html [R=301,L]


#1 "RewriteRule": this part says what will be happening in the rest of the line. The other possible content is "RewriteCond", leading to a different set of pieces.
#2 "pattern" = if a request matches this pattern, evaluate the Conditions, if any; if all conditions are met, then apply the rule. The form .? means "all requests of all kinds, including requests for the root".
#3 "target" = take this action if the pattern fits and any conditions are met. A "-" means no substitution: the URL is passed through unchanged.
#4 optional "flags" = extra information that mod_rewrite uses.
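The four-piece structure can be made concrete with a tiny Python helper that splits a rule on blank space. split_rewrite_rule is a hypothetical name for illustration; real mod_rewrite also accepts quoted arguments and escaped spaces, which this sketch ignores.

```python
def split_rewrite_rule(line):
    """Split a simple RewriteRule line into its space-separated pieces.

    Hypothetical helper for illustration only: real mod_rewrite also handles
    quoted arguments and escaped spaces, which this sketch ignores.
    """
    parts = line.split(None, 3)  # split on runs of blank space, at most 4 pieces
    if len(parts) < 3:
        raise ValueError("a RewriteRule needs at least a pattern and a target")
    return {
        "directive": parts[0],                           # #1 RewriteRule
        "pattern": parts[1],                             # #2 regex tested against the request
        "target": parts[2],                              # #3 substitution, or "-" for none
        "flags": parts[3] if len(parts) == 4 else None,  # #4 optional [flags]
    }

print(split_rewrite_rule("RewriteRule .? - [F]"))
print(split_rewrite_rule(
    "RewriteRule ^blahblah http://example.com/otherblahblah.html [R=301,L]"))
```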

If you're messing about with mod_rewrite, you'll need to learn some basics about Regular Expressions. They're a powerful tool, but you can seriously hurt yourself with a carelessly applied regex.

Edit:
Going back to the first post
1st. I attack the user agent

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit.*$ [OR]

2nd I ban the user agent
RewriteRule ^forbid/(.*)$ / [R=403,L]

Yup, some misunderstandings there. A RewriteCond isn't a separate animal. It belongs to the immediately following RewriteRule. Each rule can (optionally) have one or more conditions. The whole package-- rule with preceding conditions, if any-- is called a ruleset.
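One way to picture a ruleset is as a single unit: the rule's pattern is tested first, and only if it matches are the conditions above it checked. Below is a simplified Python model of the facebookexternalhit ruleset, not Apache's actual engine; it assumes the conditions are ANDed (no [OR] handling) and that the only action is the 403 produced by [F]. The names RULESET and apply_ruleset are hypothetical.

```python
import re

# Simplified model of one ruleset (not Apache's actual engine): the
# RewriteCond lines belong to the RewriteRule that follows them, the rule's
# pattern is checked first, and the conditions are ANDed (no [OR] support).
RULESET = {
    "conditions": [("HTTP_USER_AGENT", re.compile(r"facebookexternalhit"))],
    "pattern": re.compile(r".?"),  # every request, including the site root
    "action": 403,                 # what [F] produces
}

def apply_ruleset(path, server_vars, ruleset=RULESET):
    """Return the ruleset's status code if it fires, else 200 (request passes)."""
    if ruleset["pattern"].search(path) is None:
        return 200  # pattern didn't match: conditions are never looked at
    for var, regex in ruleset["conditions"]:
        if regex.search(server_vars.get(var, "")) is None:
            return 200  # one condition failed: the whole rule is skipped
    return ruleset["action"]

fb = {"HTTP_USER_AGENT": "facebookexternalhit/1.1"}
browser = {"HTTP_USER_AGENT": "Mozilla/5.0"}
print(apply_ruleset("index.html", fb))       # 403
print(apply_ruleset("index.html", browser))  # 200
print(apply_ruleset("", fb))                 # 403 (the root request)
```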

Here you have a rule:
"if the request is such-and-such, deny the request"
with preceding condition:
"take this action if the user-agent is facebookexternalhit".

Unlike most unwanted visitors, facebook rules have to cover all requests; it isn't enough to write a rule for requests in .html. Especially not if they already know your image files exist. You might, however, consider an alternative route:

RewriteRule \.(jpg|png|gif) /pictures/onedot.gif [L]

where "onedot.gif" is a single-pixel transparent gif that you've made for this purpose. (It can also be used for other things; I call it an administrative gif.) This is less work for the server than sending out the full 403 response. And it's just as effective, because they never get hold of a real picture that people can then click on or hotlink to.
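In practice that image rule would sit under the facebookexternalhit RewriteCond, so only the crawler gets the placeholder. A Python sketch of the combined behaviour, reusing the pattern and /pictures/onedot.gif path from the post; rewritten_path is a hypothetical helper and the sample paths are made up.

```python
import re

# Sketch of the "administrative gif" idea: for facebook's crawler, image
# requests are answered with a one-pixel placeholder instead of the real file.
FB_UA = re.compile(r"facebookexternalhit")  # the condition's pattern
IMAGE = re.compile(r"\.(jpg|png|gif)")      # rule pattern from the post above
PIXEL = "/pictures/onedot.gif"              # placeholder path from the post above

def rewritten_path(path, user_agent):
    """Return the path that would actually be served for this request."""
    if FB_UA.search(user_agent) and IMAGE.search(path):
        return PIXEL
    return path

print(rewritten_path("/photos/cat.jpg", "facebookexternalhit/1.1"))  # /pictures/onedot.gif
print(rewritten_path("/photos/cat.jpg", "Mozilla/5.0"))              # /photos/cat.jpg
print(rewritten_path("/index.html", "facebookexternalhit/1.1"))      # /index.html
```

Human visitors and non-image requests pass through untouched; only the crawler's image fetches are swapped.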

Final caution: sometimes they'll pull a different user-agent. Lately I've found a few "visionutils/0.2"-- so far, always from the 173.252. range-- mixed in with the two versions of facebookexternalhit.
3:19 pm on May 27, 2014 (gmt 0)

New User

5+ Year Member

joined:Oct 31, 2013
posts:8
votes: 0


Oh lord, this stuff is fudge complicated.

OK, how about this:
I have a lot fewer problems with a simple Order Deny and banning IP addresses. Is there a place where I can find all of Facebook's IPs and just ban them all?
3:46 pm on May 27, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4054
votes: 249


Sure - at the top of the page here, there is a search link. Go there and try a term like facebookexternalhit or Facebook bot and you can find what other people do to deal with the bot via IP.
7:41 pm on May 27, 2014 (gmt 0)

New User

5+ Year Member

joined:Oct 31, 2013
posts:8
votes: 0


I found the IPs for Facebook's bot.

I'll be back when I work out the .htaccess file.
7:52 pm on May 27, 2014 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


Look in the spiders forum; there will probably be a list of them there. When you get an IP from Facebook, do a WHOIS lookup on it to get the whole range it belongs to.
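WHOIS typically reports a block as a first-last address pair (ARIN records label it NetRange), while Apache's IP-based directives accept CIDR notation. A Python sketch of the conversion using the standard ipaddress module; range_to_cidrs is a hypothetical helper name, and the 10.0.x.x range is a made-up example of a range that needs two blocks.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Convert an inclusive IPv4 range, as WHOIS reports it, into CIDR blocks."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    # summarize_address_range yields the CIDR networks that exactly cover the range
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

print(range_to_cidrs("66.220.144.0", "66.220.159.255"))  # ['66.220.144.0/20']
print(range_to_cidrs("10.0.0.0", "10.0.2.255"))          # ['10.0.0.0/23', '10.0.2.0/24']
```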

What I have on FB is 2 user agents:

facebookexternalhit
Facebook share follower

and these IP ranges:

66.220.144.0-66.220.159.255
69.171.224.0-69.171.255.255
69.63.176.0-69.63.191.255
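For the "simple order deny" approach asked about earlier, these three ranges can be turned into Apache 2.2-style access lines. The Python sketch below generates them; deny_block is a hypothetical helper. On Apache 2.4, Order/Allow/Deny only work through mod_access_compat, and the native equivalent uses Require directives instead.

```python
import ipaddress

# IP ranges quoted above (first address, last address).
FB_RANGES = [
    ("66.220.144.0", "66.220.159.255"),
    ("69.171.224.0", "69.171.255.255"),
    ("69.63.176.0", "69.63.191.255"),
]

def deny_block(ranges):
    """Build Apache 2.2-style Order/Allow/Deny lines that block the given ranges."""
    lines = ["Order Allow,Deny", "Allow from all"]
    for first, last in ranges:
        nets = ipaddress.summarize_address_range(
            ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
        lines.extend(f"Deny from {net}" for net in nets)
    return "\n".join(lines)

print(deny_block(FB_RANGES))
# Order Allow,Deny
# Allow from all
# Deny from 66.220.144.0/20
# Deny from 69.171.224.0/19
# Deny from 69.63.176.0/20
```

With Order Allow,Deny, the Deny lines are evaluated last, so any matching address is refused even though everyone else is allowed.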

I don't even block Facebook because they're only doing link checks for content or links to your site.

It's free Social Media traffic potential, and if you're getting so much (define "much") that it's driving you to block it, I'd first check the referrers to see whether Facebook is sending you any traffic.

If they're hotlinking images, then based on what I know of how Facebook works they have probably already copied them, so blocking will at best stop future hotlinks. And if the image is copied via the browser rather than the server, you're out of luck.

Personally, I would welcome all the social media attention; it's free branding, unless you didn't put your domain in the images, etc.

Instead of blocking facebook, turn it to your advantage.

Either way, good luck!
9:32 pm on May 27, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15247
votes: 691


and these IP ranges:

66.220.144.0-66.220.159.255
69.171.224.0-69.171.255.255
69.63.176.0-69.63.191.255

Also
173.252.103.x
Technically the range is
173.252.64.0/18
but I've only ever seen .103. in crawling. Conversely, I haven't seen the 69.63. range since 2012. A slightly annoying quirk of fb is that a single visit, to a single page-plus-images, varies randomly among the three IP ranges. (You see the same thing in the plainclothes bingbot.)
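For reference, a /18 spans 64 consecutive values of the third octet, so 173.252.64.0/18 runs from 173.252.64.0 to 173.252.127.255 and includes the .103. addresses seen crawling. A quick Python check with the standard ipaddress module; 173.252.103.7 and 173.252.128.1 are made-up sample addresses.

```python
import ipaddress

fb_block = ipaddress.ip_network("173.252.64.0/18")

print(fb_block.network_address, fb_block.broadcast_address)  # 173.252.64.0 173.252.127.255
print(ipaddress.ip_address("173.252.103.7") in fb_block)     # True
print(ipaddress.ip_address("173.252.128.1") in fb_block)     # False
```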

It's free Social Media traffic potential

Sure, if you've got a site that's attractive to social-media users. Otherwise it's just an unwanted hotlink. How often do humans look at an image, say "Ooh, that must be an interesting site", and physically type in the domain name from the watermark? Doesn't happen.

I don't think I've ever met a facebook spoofer. I haven't blocked the IP, though, because there are a couple of directories I want to leave open.
3:16 am on May 29, 2014 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


I haven't seen the 69.63. range since 2012


You might be right, the site where I got those hasn't been visited in a while.
8:12 pm on June 17, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2040
votes: 1


From my notes, circa January 2014:

IPs
69.63.189.247
69.63.189.248

UAs (exact)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Mozilla/5.0 (compatible; FriendFeedBot/0.1; +Http://friendfeed.com/about/bot)
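Since several names turn up (facebookexternalhit, FriendFeedBot, and the visionutils/0.2 mentioned earlier), one option is a single RewriteCond whose pattern uses regex alternation, which has the same effect as stacking one condition per name with [OR]; adding the [NC] flag makes it case-insensitive. A Python sketch testing such a pattern. The combined pattern is a hypothetical example to adjust against your own logs, and the Firefox string is a made-up ordinary browser UA.

```python
import re

# Hypothetical combined pattern covering the crawler names mentioned in this
# thread; adjust to whatever actually shows up in your own logs.
FB_FAMILY = re.compile(r"facebookexternalhit|FriendFeedBot|visionutils")

SEEN = [
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
    "Mozilla/5.0 (compatible; FriendFeedBot/0.1; +Http://friendfeed.com/about/bot)",
    "visionutils/0.2",
    "Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0",
]

for ua in SEEN:
    print(FB_FAMILY.search(ua) is not None, ua)
# True, True, True for the three crawlers; False for the ordinary browser
```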

69.63. may have been around since then; I've not noticed. Will keep an eye out.
 
