How to block .ru traffic while allowing google.ru and pinterest.ru?

6:24 am on Dec 1, 2016 (gmt 0)

Senior Member (joined Oct 24, 2003; posts: 741; votes: 74)


My site is getting bombarded with referral traffic from .ru and .ua spam sites, and I used this code in .htaccess to block it completely. The problem is that it also blocks all traffic from Google.ru and Pinterest.ru, which I would like to avoid. Is there a way to add excepted sites to the rules?

## Deny RU and UA traffic
RewriteCond %{HTTP_REFERER} \.(ru|ua)(/|$) [NC]
RewriteRule .* - [F]
ErrorDocument 403 "Access Denied"
6:34 am on Dec 1, 2016 (gmt 0)

graeme_p, Senior Member from GB (joined Nov 16, 2005; posts: 2950; votes: 192)


I think adding another RewriteCond with a negated pattern matching the sites you want to except will work. Prefix the pattern with an exclamation mark to negate it; something like the sketch below.

[httpd.apache.org...]
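
Applied to the rules from the question, that might look like this (a sketch only; the negated condition is the addition, and the domain names are just the two from the question):

## Deny RU and UA referrers, excepting named sites
RewriteCond %{HTTP_REFERER} \.(ru|ua)(/|$) [NC]
RewriteCond %{HTTP_REFERER} !(google|pinterest)\. [NC]
RewriteRule .* - [F]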
7:03 am on Dec 1, 2016 (gmt 0)

keyplyr, Senior Member from US (joined Sept 26, 2001; posts: 12913; votes: 893)


As graeme_p suggested:

RewriteCond %{HTTP_REFERER} \.(ru|ua)
RewriteCond %{HTTP_REFERER} !(google|pinterest)
RewriteRule .* - [F]

You do know, of course, that perps from Russia & Ukraine can lease servers anywhere in the world.
5:50 pm on Dec 1, 2016 (gmt 0)

Senior Member (joined Oct 24, 2003; posts: 741; votes: 74)


Thank you, that seems to have worked fine! I am blocking everything except about ten of the most popular Russian sites. Does anyone have an idea why these sites send referral traffic, and what the goal is? Months ago I found that my cPanel account had been hacked and a list of Russian sites had been entered into the list of sites excluded from hotlink protection on my site. I had to disable the hotlink-protection feature in cPanel entirely.

My hosting company said it was probably an .htaccess code-injection hack, but their scan and all my previous scans of the site files and plugins revealed no hacks (it's a WordPress site). The original list of Russian sites entered into my cPanel hotlink panel matched the referral-traffic sites, but I banned those long ago and the number of sites keeps multiplying. Would that indicate that there might still be a back door somewhere on my site? My hosting company says it's typically in one of the .htaccess files.
8:31 pm on Dec 1, 2016 (gmt 0)

lucy24, Senior Member from US (joined Apr 9, 2011; posts: 15705; votes: 812)


RewriteCond %{HTTP_REFERER} \.(ru|ua)

That's potentially precarious with no anchor. I'd say
\.(ua|ru)(/|$)
just to exclude the rare legitimate sites that might have a literal .ru or .ua elsewhere in the URL. I used to have a very similar rule; mine went

RewriteCond %{HTTP_REFERER} \.(ru|ua)(/|$)
RewriteCond %{HTTP_REFERER} !(google|yandex|\.mail)\.
RewriteRule (^|\.html|/)$ - [F]

It's "used to" because after I went to header-based lockouts, this particular rule was no longer needed.

My hosting company says that it's typically in one of the htaccess files.

That seems a safe guess, though not very helpful. Check the timestamp on your htaccess file periodically, even if you don't change it yourself very often. If you use a CMS such as WordPress, the timestamp will change whenever the host moves to a new version--but you'd expect them to give you plenty of advance notice. (And if they either don't tell you, or don't update periodically, it's new-host-shopping time.)
10:18 pm on Dec 1, 2016 (gmt 0)

Senior Member (joined Oct 24, 2003; posts: 741; votes: 74)


It's "used to" because after I went to header-based lockouts, this particular rule was no longer needed.


Thanks Lucy, I will update the code. Perhaps you can elaborate on header-based lockouts... is it something easy to set up?
1:38 am on Dec 2, 2016 (gmt 0)

keyplyr, Senior Member from US (joined Sept 26, 2001; posts: 12913; votes: 893)


That's potentially precarious with no anchor... just to exclude the rare legitimate sites that might have a literal . elsewhere in the URL...

In this case I kept the rule as simple as possible.
7:49 pm on Dec 2, 2016 (gmt 0)

lucy24, Senior Member from US (joined Apr 9, 2011; posts: 15705; votes: 812)


Perhaps you can elaborate on header-based lockouts...is it something easy to set up?

Without going into too much detail (on the admittedly far-fetched chance that botrunners will universally reprogram their robots to make them look human, which they won't because most robots are, frankly, quite stupid):

Part 1:
Set environment variables based on the existence and/or content of assorted header fields, such as
SetEnvIf Accept ^$ noaccept
(meaning: if the “Accept” header is absent or empty, set this variable)

Part 2:
Unset these same environment variables for known authorized robots, such as
BrowserMatch Googlebot !noaccept !nolang
(There is a separate rule, elsewhere, for robots that claim to be the Googlebot but come from the wrong IP.)

Part 3:
Block what's left, such as
Deny from env=noaccept
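
Put together, a minimal .htaccess sketch of the three parts might look like this (the Accept-Language test behind "nolang" and Google's 66.249 crawl range are assumptions for illustration, not the literal rules from any one site):

# Part 1: flag requests whose headers look robotic
# (SetEnvIf matches ^$ when the header is absent or empty)
SetEnvIf Accept ^$ noaccept
SetEnvIf Accept-Language ^$ nolang

# Part 2: unset the flags for known authorized robots
BrowserMatch Googlebot !noaccept !nolang

# (hypothetical form of the separate "wrong IP" rule: re-flag
# UAs that claim to be Googlebot but come from outside Google's
# published crawl range)
SetEnvIf User-Agent Googlebot fake_google
SetEnvIf Remote_Addr ^66\.249\. !fake_google

# Part 3: block whatever is still flagged (Apache 2.2 syntax;
# on 2.4, use "Require not env ..." inside a <RequireAll> block)
Order Allow,Deny
Allow from all
Deny from env=noaccept
Deny from env=nolang
Deny from env=fake_google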

Using this approach, I've almost entirely eliminated old rules of the form
Deny from 11.22.33
with their endless whack-a-mole of having to add IP ranges constantly as new server farms are created or discovered--or delete them as old ranges are taken over by human mobile users.

Old htaccess: 29K. New htaccess: 6K. Some robots do get in--also some botnets from places like Brazil and Russia that run on infected human browsers--but the number is definitely no higher, and probably lower, than with the old IP-based system. And it requires much less maintenance.