How to block .ru traffic while allowing google.ru and pinterest.ru?

6:24 am on Dec 1, 2016 (gmt 0)

Senior Member (joined Oct 24, 2003; posts: 741; votes: 74)


My site is getting bombarded with referral traffic from .ru and .ua spam sites, and I used this code in .htaccess to block it completely. The problem is that it also blocks all traffic from Google.ru and Pinterest.ru, which I would like to avoid. Is there a way to add excepted sites to the rules?

## Deny RU and UA traffic
RewriteCond %{HTTP_REFERER} \.(ru|ua)(/|$) [NC]
RewriteRule .* - [F]
ErrorDocument 403 "Access Denied"
6:34 am on Dec 1, 2016 (gmt 0)

graeme_p, Senior Member from GB (joined Nov 16, 2005; posts: 2950; votes: 192)


I think adding another RewriteCond with a negated pattern matching the sites you want to except will work. Prefix the pattern with an exclamation mark to negate it; something like the sketch below.

[httpd.apache.org...]
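
Applied to the rules from the question, that might look like this (a sketch only; the negated condition is the addition, and the domain names are just the two from the question):

## Deny RU and UA referrers, excepting named sites
RewriteCond %{HTTP_REFERER} \.(ru|ua)(/|$) [NC]
RewriteCond %{HTTP_REFERER} !(google|pinterest)\. [NC]
RewriteRule .* - [F]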
7:03 am on Dec 1, 2016 (gmt 0)

keyplyr, Senior Member from US (joined Sept 26, 2001; posts: 12913; votes: 893)


As graeme_p suggested:

RewriteCond %{HTTP_REFERER} \.(ru|ua)
RewriteCond %{HTTP_REFERER} !(google|pinterest)
RewriteRule .* - [F]

You do know, of course, that perps from Russia & Ukraine can lease servers anywhere in the world.
5:50 pm on Dec 1, 2016 (gmt 0)

Senior Member (joined Oct 24, 2003; posts: 741; votes: 74)


Thank you, that seems to have worked fine! I am blocking everything except about ten of the most popular Russian sites. Does anyone have an idea why these sites send referral traffic, and what the goal is? Months ago I found that my cPanel account had been hacked and a list of Russian sites had been entered into the list of sites excluded from hotlink protection on my site. I had to disable the hotlink-protection feature in cPanel entirely.

My hosting company said it was probably an .htaccess code-injection hack, but their scan and all my previous scans of the site files and plugins revealed no hacks (it's a WordPress site). The original list of Russian sites entered into my cPanel hotlink panel matched the referral-traffic sites, but I banned those long ago and the number of sites keeps multiplying. Would that indicate that there might still be a back door somewhere on my site? My hosting company says it's typically in one of the .htaccess files.
8:31 pm on Dec 1, 2016 (gmt 0)

lucy24, Senior Member from US (joined Apr 9, 2011; posts: 15705; votes: 812)


RewriteCond %{HTTP_REFERER} \.(ru|ua)

That's potentially precarious with no anchor. I'd say
\.(ua|ru)(/|$)
just to exclude the rare legitimate sites that might have a literal .ru or .ua elsewhere in the URL. I used to have a very similar rule; mine went

RewriteCond %{HTTP_REFERER} \.(ru|ua)(/|$)
RewriteCond %{HTTP_REFERER} !(google|yandex|\.mail)\.
RewriteRule (^|\.html|/)$ - [F]

It's "used to" because after I went to header-based lockouts, this particular rule was no longer needed.

My hosting company says that it's typically in one of the htaccess files.

That seems a safe guess, though not very helpful. Check the timestamp on your htaccess file periodically, even if you don't change it yourself very often. If you use a CMS such as WordPress, the timestamp will change whenever the host moves to a new version--but you'd expect them to give you plenty of advance notice. (And if they either don't tell you, or don't update periodically, it's new-host-shopping time.)
10:18 pm on Dec 1, 2016 (gmt 0)

Senior Member (joined Oct 24, 2003; posts: 741; votes: 74)


It's "used to" because after I went to header-based lockouts, this particular rule was no longer needed.


Thanks Lucy, I will update the code. Perhaps you can elaborate on header-based lockouts... is it something easy to set up?
1:38 am on Dec 2, 2016 (gmt 0)

keyplyr, Senior Member from US (joined Sept 26, 2001; posts: 12913; votes: 893)


That's potentially precarious with no anchor... just to exclude the rare legitimate sites that might have a literal . elsewhere in the URL...

In this case I kept the rule as simple as possible.
7:49 pm on Dec 2, 2016 (gmt 0)

lucy24, Senior Member from US (joined Apr 9, 2011; posts: 15705; votes: 812)


Perhaps you can elaborate on header-based lockouts...is it something easy to set up?

Without going into too much detail (on the admittedly far-fetched chance that botrunners will universally reprogram their robots to make them look human, which they won't because most robots are, frankly, quite stupid):

Part 1:
Set environment variables based on the existence and/or content of assorted header fields, such as
SetEnvIf Accept ^$ noaccept
(meaning: if the “Accept” header is absent or empty, set this variable)

Part 2:
Unset these same environment variables for known authorized robots, such as
BrowserMatch Googlebot !noaccept !nolang
(There is a separate rule, elsewhere, for robots that claim to be the Googlebot but come from the wrong IP.)

Part 3:
Block what's left, such as
Deny from env=noaccept
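
Put together, a minimal .htaccess sketch of the three parts might look like this (the Accept-Language test behind "nolang" and Google's 66.249 crawl range are assumptions for illustration, not the literal rules from any one site):

# Part 1: flag requests whose headers look robotic
# (SetEnvIf matches ^$ when the header is absent or empty)
SetEnvIf Accept ^$ noaccept
SetEnvIf Accept-Language ^$ nolang

# Part 2: unset the flags for known authorized robots
BrowserMatch Googlebot !noaccept !nolang

# (hypothetical form of the separate "wrong IP" rule: re-flag
# UAs that claim to be Googlebot but come from outside Google's
# published crawl range)
SetEnvIf User-Agent Googlebot fake_google
SetEnvIf Remote_Addr ^66\.249\. !fake_google

# Part 3: block whatever is still flagged (Apache 2.2 syntax;
# on 2.4, use "Require not env ..." inside a <RequireAll> block)
Order Allow,Deny
Allow from all
Deny from env=noaccept
Deny from env=nolang
Deny from env=fake_google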

Using this approach, I've almost entirely eliminated old rules of the form
Deny from 11.22.33
with their endless whack-a-mole of having to add IP ranges constantly as new server farms are created or discovered--or delete them as old ranges are taken over by human mobile users.

Old htaccess: 29K. New htaccess: 6K. Some robots do get in--also some botnets from places like Brazil and Russia that run on infected human browsers--but the number is definitely no higher, and probably lower, than with the old IP-based system. And it requires much less maintenance.