
Forum Moderators: rogerd & travelin cat


Wordpress Sites Under Bot Attack

Dealing with a huge number of bots on a WordPress site

     
6:20 am on Jan 22, 2015 (gmt 0)

New User

Top Contributors Of The Month

joined:Jan 22, 2015
posts: 4
votes: 0


I had a business call me last week because their site was shut down for the second time in as many months. They wanted me to investigate why and fix it.

I found the site was down because of TOS bandwidth-usage violations on their hosting account. The reason: the site was being visited by bots from thousands of unique IPs all over the world.

The bots were a mix of spam bots and brute-force login bots. I beefed up security to stop the commenters (there were already 75,000 spam comments, most of them left over a 3-day period last November). Then I started blocking IPs, but they just kept coming and coming, so I added the Wordfence plugin and set the access controls pretty tight. Even so, a lot were still getting through, so I blocked whole ranges of IPs -- thousands of individual addresses.

Then I noticed that a new site I was building for myself was getting similar bot traffic. I had only created one page on the site, but I thought it would be fun to experiment with BuddyPress, so I had that installed as well.

It was only up for about a week, and in that time nearly 500 spammers had registered and created user groups full of spam. They then took those newly generated spam group pages and distributed links to them across the web, as part of spam comments and article-farm and link-farm content on hundreds of sites. It is like they are trying to drive traffic to the spam pages they created on my site.

I thought it was interesting to watch and experiment with, so I am playing around more with the bot traffic on that site to better understand how they work and how to effectively stop them without hurting site performance or, potentially, the site's SEO.

I am just wondering if anyone would like to talk about this kind of bot attack.
6:35 am on Jan 22, 2015 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:June 5, 2014
posts:154
votes: 0


It is like they are trying to drive traffic to the spam pages they created on my site.

And/or link juice(?)

I am very interested, please keep us updated
10:27 am on Jan 22, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7577
votes: 4


Welcome to WebmasterWorld enki09!
Love to talk about WordPress security.

Please do keep us updated.
4:27 pm on Jan 22, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member planet13 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 16, 2010
posts: 3828
votes: 31


Thanks for the news, enki09.

I wonder if it might be a good idea to move your client's site to something like CloudFlare, which has pretty good security against bot attacks (at least from what I've heard).

Also, at a minimum, it would save SOME bandwidth.
11:44 pm on Jan 22, 2015 (gmt 0)

New User

Top Contributors Of The Month

joined:Jan 22, 2015
posts: 4
votes: 0


Hi,

Thanks for the responses. The client site is on a kind of crappy web host to start with, and the designer had skipped some really rudimentary setup steps, so those were the first things I addressed. I activated Akismet and deleted the nearly 75,000 existing spam comments. Since they were using a third-party comment system and had no need for WordPress comments, I also deactivated the WordPress comment system and deleted all of the subscribers who had left the spam comments.

So far in January, Akismet says it has intercepted 75,000 more spam comments.

The traffic was a little harder to deal with. I installed Wordfence so I could see the live traffic and selectively block individual IPs or ranges of IPs, but that was like trying to plug 1,000 leaks with one cork -- just not very effective.

Then I found out that watching live traffic in Wordfence uses up a lot of CPU, and the site got shut down again for a TOS violation (haha). So I deleted Wordfence and started using the CloudFlare CDN instead. It cut down on the traffic some, but a lot is still getting through because the bots are on thousands of IPs from all over the world.

It got easier when I found CloudFlare's IP blocking tool, which lets you just type in the name of a country and block it that way :-)

IP blocking seems to be the most effective way to get rid of the bots, but because this is a botnet, a lot of the IPs could be shared, so you end up blocking potential real traffic too.

I also added more disallows to the robots.txt file, as well as a longer crawl delay between visits for crawlers, though compliance with robots.txt is voluntary, I believe.
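For reference, the additions looked roughly like this (paths and numbers here are illustrative, not my actual file):

```apache
User-agent: *
# Keep crawlers out of areas they have no business in
Disallow: /wp-admin/
Disallow: /wp-login.php
# Ask compliant crawlers to wait this many seconds between requests;
# some crawlers (e.g. Bing, Yandex) honor Crawl-delay, Google ignores it
Crawl-delay: 10
```

Of course, the bad bots are exactly the ones that never read it.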

I think I have the client site pretty much stabilized, so I will leave off talking about that one and post a separate reply about my own site, because I am more free to play around with it :-)
9:08 am on Jan 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10567
votes: 1123


Interesting case study! What steps have you done via .htaccess directives (there are several), and have you visited the SSID forum here at WebmasterWorld? [webmasterworld.com...]
12:48 pm on Jan 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7577
votes: 4


Good progress enki09. That's about as far as you can reasonably go. Were the bots after a particular page or just hitting anything?
7:32 pm on Jan 23, 2015 (gmt 0)

New User

Top Contributors Of The Month

joined:Jan 22, 2015
posts: 4
votes: 0


tangor - I blocked the semalt and buttons-for-website crawlers in .htaccess, but that is a somewhat different issue (although I cannot help but wonder why Semalt is building its bot army...). IP blocking in .htaccess would only work if I blocked ranges of IPs, as there were just too many unique ones. I am now subscribed to the SSID forum, thanks :-)
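For anyone wanting to do the same, the rules looked roughly like this (a sketch, not my exact file; [NC] makes the match case-insensitive, and RewriteEngine on is assumed to be set once earlier in the file):

```apache
# Refuse any request whose referrer names a known spam crawler
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
RewriteRule .* - [F]
```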

lorax - Thanks. There was a range of bot functions. Most were pulling specific pages, but some were attempting brute-force admin logins and others were probing for backdoor files. The theme used on the site has a known TimThumb vulnerability, and it does not look like the theme is going to be updated to fix it.

I didn't notice it as much on this site as on my own, because I have not been analyzing what the bots were doing as closely here. However, it seems like they are, to some extent, driving bot traffic to specific pages on sites where they have left spam comments -- like they are trying to SEO their own spam. I know it sounds weird, but on my site they were posting links on link farms and content farms, and as comment spam, pointing to the group pages they had created on my site. Again, like they were trying to promote their own spam.
8:48 pm on Jan 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15934
votes: 887


IP blocking in htaccess would only work if I blocked ranges

?! Of course you have to block ranges. Blocking individual IPs is a pointless waste of time. The rare exception is when you're being targeted by an infected human machine; those go away after a while, and the blocks can be cleared.
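For example, range blocks in .htaccess look like this (the CIDR ranges below are the reserved documentation networks, standing in for whatever ranges your logs actually implicate):

```apache
# Apache 2.2 syntax
order allow,deny
deny from 192.0.2.0/24
deny from 198.51.100.0/24
allow from all

# Apache 2.4 equivalent:
# <RequireAll>
#   Require all granted
#   Require not ip 192.0.2.0/24
#   Require not ip 198.51.100.0/24
# </RequireAll>
```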
9:11 pm on Jan 23, 2015 (gmt 0)

New User

Top Contributors Of The Month

joined:Jan 22, 2015
posts: 4
votes: 0


lucy24 - Yes, I meant that blocking enough ranges via the .htaccess file can be a pain. I did it on my own site, though, and pretty much blocked the whole world aside from the US. That cut down on the bots nicely, although US-based bots continue to come.

On the client site I could technically do the same thing, since they are a local business and not interested in global traffic. For sites that do want a global audience, though, if you just keep blocking ranges of IPs you are bound to eventually block potential real visitors as well.
5:19 pm on Jan 24, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7577
votes: 4


Brute force login attempts can be stopped cold by only allowing your own IP addy. You can use the following:


<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_URI} ^(.*)?wp-login\.php(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*)?wp-admin$
RewriteCond %{REMOTE_ADDR} !^xxx\.xxx\.(.*)\.(.*)$
RewriteRule ^(.*)$ - [R=403,L]
</IfModule>


where the REMOTE_ADDR line is your IP, or whatever portion of it you want to filter down to.
6:08 pm on Jan 24, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4558
votes: 363


if you just keep blocking ranges of IPs then you are bound to be blocking potential, real site visitors as well eventually.
    1. You don't block IPs without knowing who they are; it's quite rare to see legitimate visitors come from server farm, colo, and hosting server ranges.
    2. IP is not the only way to block unwanted traffic; there is a forum here dedicated to sharing that information: [webmasterworld.com...]
    3. The code above is a good start for limiting any chance of alien login abuse but:
    a. It will not do anything to prevent them from repeated attempts, they will still show in the logs.
    b. It does nothing to address harvesters and spambots.
    c. Wait for a code repair post from lucy24 shortly. The code is not ideal.
8:09 pm on Jan 24, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15934
votes: 887


^(.*)?

Ahem, cough-cough.

Wait for a code repair post from lucy24 shortly.

Oh, so now she's a mind reader too :-P
3:32 pm on Jan 25, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7577
votes: 4



a. It will not do anything to prevent them from repeated attempts, they will still show in the logs.
b. It does nothing to address harvesters and spambots.
c. Wait for a code repair post from lucy24 shortly. The code is not ideal.


a. My experience is that they give up after the code has been added to .htaccess, if they've hit the login page before. I've never had anyone whose IP was denied this way try to find the login page unless they had already been to it.

b. It was only intended to protect the login page.

c. ? Excuse my ignorance but what's the issue? It works. :o
4:09 pm on Jan 25, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4558
votes: 363


c. ? Excuse my ignorance but what's the issue? It works. :o
Yes, but it could be more efficient, less work for the server.
BTW, I can show you 40 and more lines an hour all receiving a 403, but still trying. Poorly programmed bots only know how to do one thing. At some point in the future their masters may stop wasting their time, but if the issue is wasting bandwidth, a custom block should consider redirecting to a custom (zero bytes) "goaway.php" or similar page that contains only the 403 header:
<?php
header("HTTP/1.1 403 Forbidden");
?>
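To wire that up in .htaccess (assuming goaway.php sits in the web root):

```apache
# Serve the near-empty script for every 403 instead of the default error page
ErrorDocument 403 /goaway.php
```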

Oh, so now she's a mind reader too :-P
Not at all, I see you were participating here and was pretty sure you would object to the unnecessary use of the
<IfModule
envelope and
RewriteEngine on
and the
^(.*)?wp-login\.php(.*)$
at a minimum.
3:19 pm on Jan 26, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7577
votes: 4


...unnecessary use of the <IfModule envelope and RewriteEngine on and the ^(.*)?wp-login\.php(.*)$
What would you propose in their place?
5:03 pm on Jan 26, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15934
votes: 887


The line
RewriteEngine on

only occurs once per htaccess, unless you're deliberately switching lines off and on to avoid massive commenting-out.

You don't need an <IfModule> envelope at all, ever. It's a feature of prefabricated htaccess files-- most often from a CMS-- where the author doesn't know anything about the target server, so they have to make sure nothing breaks. Once you're on your own server, you know whether you have a given module or not, and can code accordingly. (Keep the ones inside CMS packages-- only-- because the software may look for them when installing.)

In locutions like
^(.*)
it depends whether you're capturing. Since you're not, simply leave off the whole thing and fast-forward to
wp-login\.php
There's no need to say anything about the rest of the request-- the (.*)$ part-- at all.

The anchors ^ and $ don't have any syntactic meaning. (I'm pretty sure you, lorax, know this, but I've seen questions from people who turned out to think they were essential mod_rewrite punctuation marks.) In patterns, they just mean "at the very beginning of the test string" and "at the very end of the test string" for situations where the exact location of matched text is important.

When you're matching a specific type of content, such as an IP, set up the pattern to match only the desired characters. So
xxx\.xxx\.(.*)\.(.*)
becomes
xxx\.xxx\.\d+\.\d+
Captures consume some infinitesimal part of server resources, so don't use parentheses when you don't need them.

Matter of fact, names like /wp-admin/ always come at a particular point in the request, either first or second. And the first directory name-- if present-- never contains hyphens. So the quoted rule

RewriteCond %{REQUEST_URI} ^(.*)?wp-login\.php(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*)?wp-admin$
RewriteCond %{REMOTE_ADDR} !^xxx\.xxx\.(.*)\.(.*)$
RewriteRule ^(.*)$ - [R=403,L]

could be collapsed to

RewriteCond %{REMOTE_ADDR} !^xxx\.xxx\.\d+\.\d+$
RewriteRule ^(\w+/)?wp-(login|admin) - [F]


although, for that matter, does anyone but the site administrator ever use any file in /wp- ? If not, you could express the pattern as an anchorless
wp-

and that's all.

And, finally: Although [R=403,L] is legal-- you can give any number after R, not just 3xx -- the flag [F] exists for this purpose. It carries an implied [L], for a savings of six bytes total.
5:12 pm on Jan 26, 2015 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4558
votes: 363


You don't need an <IfModule> envelope at all, ever.
Very true - except that in WordPress you DO need that envelope to contain the WP-generated snippet of .htaccess. Remove it at your own peril, because that section is what WP rewrites when you change settings, and without its container it may not do what you intended. I found out the hard way that WP does seem to need that container to identify the part it writes to.
9:58 pm on Jan 26, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7577
votes: 4


Excellent explanation lucy24! Thank you!

And while I happen to know what the ^ and $ are for, I would say that my general knowledge of Apache & regex is dangerous at best - as evidenced by the code block you so eloquently and politely tore apart. :)