Thousands of Spambot IPs Hitting my Site

spiritualseo · msg:4396671 · 6:19 pm on Dec 11, 2011 (gmt 0)

Hey guys, I need some help here. Over the past week or so, my site's bandwidth usage has jumped from 1GB a month to 12GB a day!

AWStats shows a range of unique IPs hitting my site and requesting thousands of pages on every visit. Most of these IPs seem to originate from within the US, which is odd. I have blocked China, Brazil and some other countries in my .htaccess, but the hits continue.

Please take a look at these IPs; they are just a few of the thousands that hit my site almost every second, and each one requests around 1,000 pages. My site has only around 100 pages, so perhaps they request the same pages over and over:

24.181.178.3
216.6.134.27
70.182.254.242
99.98.188.110
75.134.95.208
65.35.111.110
71.197.69.88
124.123.51.38
173.198.98.134
198.138.135.123
24.159.55.211
50.40.131.171
174.16.100.103
74.131.129.17
75.94.108.222
98.218.136.190
69.244.107.77
68.185.252.101

The funny thing is that all of them look unique, and all of them have verified DNS. How can this be?

Can anyone please explain what is happening, and what I can possibly do to stop it? If this continues, my site will go offline within a week or so.

dstiles · msg:4396734 · 9:34 pm on Dec 11, 2011 (gmt 0)

Botnets are very active at the moment, at least on my server. Most botnet hits look unique: they switch IPs per page (in my experience).

And yes, a vast number of them come from the USA, either from compromised "broadband" computers or from compromised server farms.

I'm also getting a lot of gootkit scans, which probe for a long list of specific PHP URLs (I blocked a full /16 Egyptian range yesterday!). Although gootkit is a botnet "device", it seems to stay on a single IP until it gets fed up or firewalled.

Look through other recent postings in this forum for more comments on these phenomena.

wilderness · msg:4396745 · 10:00 pm on Dec 11, 2011 (gmt 0)

I just realized that most requests are direct and have no referrer; the referrer is blank. I have put the following in my .htaccess for now, and the spam seems to have completely stopped:

RewriteCond %{HTTP:Accept-Language} ^$ [OR]
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* - [F,L]

I know this will block quite a few legit users as well, but is there any other solution? Also, will this hinder search engine bots from crawling my site?

I also noticed that all these bots are requesting only one page on my site, the largest page my site has. Is it possible to apply this .htaccess rule to that single page alone and not the whole site?

Some of the most consistent UAs are as follows:

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; FunWebProducts)

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0) w:PACBHO60

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; FunWebProducts; GTB7.0; SLCC2; .NET CLR 2.0.50727; .NET CL

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; BTRS28059; SearchToolbar 1.2; GTB7.2; SLCC2; .NET CLR 2.0.50727;

I am not exactly sure if referring pages and requested pages are the same, but I think they are.

PS: the last three UAs are truncated. I am not able to copy the full text for some reason.

wilderness · msg:4396752 · 10:11 pm on Dec 11, 2011 (gmt 0)

You have located a solution to stop the bleeding; now you should spend some time learning .htaccess and how to implement methods that will prevent other types of abuse to your site(s) in the future.

As for the innocents caught by the referrer rule?
Generally speaking, it's a bad idea to block blank referrers.
Blank UAs, on the other hand, should always be blocked.

If you wish to make exceptions for particular IPs in your referrer rule, just add IP lines to the conditions:

RewriteCond %{HTTP:Accept-Language} ^$ [OR]
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{REMOTE_ADDR} ^123\.456\. [OR]
RewriteCond %{REMOTE_ADDR} ^456\.567\. [OR]
RewriteCond %{REMOTE_ADDR} ^567\.678\.
RewriteRule .* - [F]

1) Use the [OR] flag to combine multiple IP lines, and remember to omit the [OR] from the last IP line.
2) Once you have this band-aid in place, you'll need to go back and learn how to use regex for IP ranges as opposed to specific IPs.
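
To illustrate the range idea, here is a minimal sketch (the addresses are placeholders, not reported ranges; adjust the octet arithmetic to the ranges you actually see in your logs):

# Match a whole /16, i.e. 123.45.0.0 - 123.45.255.255:
RewriteCond %{REMOTE_ADDR} ^123\.45\. [OR]
# Match third octet 64-95, i.e. 123.45.64.0 - 123.45.95.255:
RewriteCond %{REMOTE_ADDR} ^123\.45\.(6[4-9]|[78][0-9]|9[0-5])\.
RewriteRule .* - [F]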

tangor · msg:4396772 · 11:03 pm on Dec 11, 2011 (gmt 0)

Do a quick check to see how many are from Amazon... look at this thread: [webmasterworld.com...] and this thread: [webmasterworld.com...]

lucy24 · msg:4396781 · 11:23 pm on Dec 11, 2011 (gmt 0)

Can you explain about the "Accept-Language" element? I'm used to seeing it in discussions of redirecting visitors to language-specific pages. What's its role in robot blocking?

Incidentally, I recently added a "Files" exemption for favicon.ico. I did it to make it easier to identify humans in log-processing. (Robots who ask for the favicon do exist, but they're rare, so it alerts me to places where I may have blocked too enthusiastically.) It has the side effect of letting in g###'s faviconbot even though they refuse to put clothes on it :)
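
For reference, that kind of exemption might look like this in Apache 2.2 syntax (a sketch; lucy24's actual rules aren't shown, and mod_rewrite blocks are unaffected by it):

# Let everyone fetch the favicon even if their IP is denied elsewhere
<Files "favicon.ico">
Order Allow,Deny
Allow from all
</Files>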

wilderness · msg:4396785 · 11:40 pm on Dec 11, 2011 (gmt 0)

Can you explain about the "Accept-Language"


lucy,
I don't believe it's relevant; it was probably just something included in the lines (i.e., the thread) that he located and copied and pasted from.

wilderness · msg:4396786 · 11:44 pm on Dec 11, 2011 (gmt 0)

Do a quick check to see how many are from Amazon...


tangor,
I think you're biting off more than you're able to chew here.
Please see his initial thread in the Apache forum.

Your inquiry might turn up enough IPs to keep you, and everybody else here at SSID, busy with IP recognition for quite a while.

tangor · msg:4396792 · 12:06 am on Dec 12, 2011 (gmt 0)

tangor,
I think you're biting off more than you're able to chew here.
Please see his initial thread in the Apache forum.


Dumping AmazonAWS into .htaccess takes one line... little chewing involved. :)

Pick and choose battles, of course, but if one is facing real money costs from exceeding bandwidth, one responds as necessary.
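
For instance, one of Amazon's published EC2 netblocks of the era could be denied in a single line (illustrative only; verify against the address ranges Amazon currently publishes before relying on any specific block):

# Deny one EC2 netblock wholesale (Apache 2.2 mod_authz syntax)
deny from 174.129.0.0/16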

wilderness · msg:4396793 · 12:10 am on Dec 12, 2011 (gmt 0)

tangor,
I was referring to his huge list of accumulated IPs.

tangor · msg:4396800 · 12:27 am on Dec 12, 2011 (gmt 0)

wilderness... so was I! :) Some server farms are just icky, and it is far easier to deal with a group ban than fiddle-pharting with narrow ranges. Most of my IP bans are either nnn or nnn.nnn; I think I might have two that are nnn.nnn.nnn.

Should (human) traffic fall off, I can always undo or fine-tune any of those ranges, but for the most part I haven't had to. Again, pick and choose which battles; what works for me may not work for anyone else.
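
In .htaccess terms those coarse bans look like this (the addresses are placeholders; Apache's partial-IP syntax treats them as prefixes):

# one octet: denies 203.0.0.0 - 203.255.255.255
deny from 203
# two octets: denies 203.0.0.0 - 203.0.255.255
deny from 203.0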

spiritualseo · msg:4397122 · 5:12 pm on Dec 12, 2011 (gmt 0)

I did check for the Amazon thing, but I am not able to find any. I cannot establish any kind of pattern in the IPs, or in the user agents for that matter; all of them seem unique. That has made it very difficult to block anything with certainty, apart from blocking empty referrers.

Here are a few user agents that hit me recently when I removed my 'blank referrer' block for a few seconds:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; GTB7.2; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30618; WinNT-PAI 26.08.2009; AskTbLMW2/5.13.2.19379)

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.24) Gecko/20111103 Firefox/3.6.24 GTB7.1

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; BO1IE8_v1;ENUS; InfoPath.1)

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E; MS-RTC EA 2)

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SearchToolbar 1.2; GTB7.2; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; .NET4.0C; AskTbPF/5.13.1.18107)

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30618)

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; WinNT-PAI 13.07.2009; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET4.0E; BRI/2)

spiritualseo · msg:4397133 · 5:25 pm on Dec 12, 2011 (gmt 0)

One thing I noticed is that this bot is requesting only one page on my site. Would it be possible to apply the 'blank referrer' rule to this one page alone? The page is located inside a sub-folder.

Will something like this work, provided that I want to apply this rule to the page:

http://www.example.com/foldername/page.php

<IfModule mod_rewrite.c>
#Options +FollowSymlinks
RewriteEngine On
RewriteBase /
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^$
RewriteRule ^foldername/page\.php$ - [F]
</IfModule>

lucy24 · msg:4397138 · 5:43 pm on Dec 12, 2011 (gmt 0)

You can apply any RewriteRule to a single page. In fact that's good for your site, because then mod_rewrite doesn't have to slow down and evaluate the rule plus conditions for every single request it receives.

And you can throw out the <IfModule> container: you're dealing with a specific site, right? So either you've got mod_rewrite or (shudder) you haven't. Also, whoops, you only need to turn the RewriteEngine on once. And / is the default base, so you don't need that line.

Express the referrer as ^-?$ because null referrers (also blank UAs) generally come through as a single -

spiritualseo · msg:4397142 · 5:58 pm on Dec 12, 2011 (gmt 0)

Thanks Lucy24, so will this be it:

#Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^-?$
RewriteRule ^foldername/page\.php$ - [F,L]

By the way, will blocking blank referrers also block genuine bots like Googlebot? I tried fetching my site under Google Webmaster Central, but Googlebot could not fetch the page; it was 403 Forbidden. Is that an indication that Google cannot crawl my site?

agent_x · msg:4397168 · 6:55 pm on Dec 12, 2011 (gmt 0)

It will indeed; Googlebot very seldom presents a referrer string.
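
If you want to keep the page-level block but let the majors through, one option is to exempt their user agents (a sketch; UA strings are trivially spoofed, so rDNS verification of the claimed crawler is the robust approach):

RewriteCond %{HTTP_REFERER} ^-?$
# don't apply the block to requests claiming to be major crawlers
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|bingbot|Slurp) [NC]
RewriteRule ^foldername/page\.php$ - [F]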

incrediBILL · msg:4397170 · 7:07 pm on Dec 12, 2011 (gmt 0)

One thing I noticed is that this bot is requesting only one page on my site.


If you're on Linux and have SSH access, this quick command will generate a temporary IP block list in seconds that you can drop into .htaccess until they go away:

grep -i "/attacked_page.php" access_log | egrep -o "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" | sort | uniq | awk '{print "deny from "$1}'

That'll spit out a deny list you can drop into .htaccess to shut down the known bad IPs while you figure out the problem and get it sorted.

henry0 · msg:4397225 · 9:35 pm on Dec 12, 2011 (gmt 0)

@incrediBILL
Are you referring to spiritualseo's problem, or speaking generally?
Thanks.

Chico_Loco · msg:4397229 · 9:42 pm on Dec 12, 2011 (gmt 0)

incrediBILL,

Won't that command also pick up the IPs of users who navigate from "/attacked_page.php" to some other page on the site, since the path will appear in the HTTP_REFERER field of the log?

Perhaps: grep "GET /attacked_page.php" ?

incrediBILL · msg:4397264 · 12:50 am on Dec 13, 2011 (gmt 0)

Perhaps: grep "GET /attacked_page.php" ?


You're right, that's safer!

I just hammered out that line on my way out the door; forgive me ;)

Are you referring to spiritualseo's problem, or speaking generally?

I use that approach any time I need a quick list of IPs out of access_log.

In this case he has an immediate need to stop a group hitting a certain page, and it works like a charm. If they were crawling all over the site, it wouldn't work so well.
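
Putting both refinements together, the one-liner might read (a sketch assuming a standard combined log where the client IP is the first field on each line):

grep "GET /attacked_page.php" access_log | egrep -o "^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" | sort -u | awk '{print "deny from "$1}'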

Hope_Fowl · msg:4397280 · 2:15 am on Dec 13, 2011 (gmt 0)

Maybe you should hide your site behind CloudFlare.com, because they block spammers. You do have to tinker with your logs if you want to track the original visitor IPs (they have a WordPress plugin).

keyplyr · msg:4397283 · 2:38 am on Dec 13, 2011 (gmt 0)

you should hide your site behind CloudFlare.com

Exactly why many of us block them. Stop hiding.

ns1.cloudflare.com
173.245.48.0 - 173.245.63.255
173.245.48.0/20

Hope_Fowl · msg:4397284 · 2:48 am on Dec 13, 2011 (gmt 0)

you should hide your site behind CloudFlare.com

Exactly why many of us block them. Stop hiding.


What good does it do to block a service that blocks incoming spammers? Your server isn't likely to be bothered by CloudFlare talking to the web servers that hold its clients' content. Unless one of their clients is stealing your images.

keyplyr · msg:4397286 · 2:53 am on Dec 13, 2011 (gmt 0)

What good does it do to block a service which blocks incoming spammers? Your server isn't likely to be bothered by CloudFlare's talking to the web servers with CloudFlare clients' content. Unless one of their clients is stealing your images.

It has nothing to do with the "blocking spammers" feature of CloudFlare. It's the unaccountability of cloud services in general: they attract plenty of nefarious agents who wish to hide behind the cloud while they cause mayhem on our servers. Do some reading; there is plenty of documentation in this forum and others here at WW.

Note: This is getting off topic. Please start a new thread to continue.

Hope_Fowl · msg:4397287 · 2:56 am on Dec 13, 2011 (gmt 0)

How does one use the CloudFlare CDN for cloud computing? Maybe I need to learn how to do arithmetic with DNS servers.

Hope_Fowl · msg:4397538 · 7:53 pm on Dec 13, 2011 (gmt 0)

OK, back to the original topic.

Maybe you should hide your site behind CloudFlare.com because they block spammers. You do have to tinker with log if you want to track IPs (they have a WordPress plugin).


The free CloudFlare CDN service hides web servers from spambots, so it is another possible solution to the original poster's bandwidth problem.

Seb7 · msg:4397734 · 10:33 am on Dec 14, 2011 (gmt 0)

I had this last year on an IIS server: thousands of requests for the same page every second, from exactly ten random IP addresses at a time.

Getting the server not to send a page helps. If you can get the server to not reply at all (at the TCP/IP level), it makes quite a difference to the server load.

The bot I had, I discovered, would follow a 301 redirect, so I redirected it to its own IP, which reduced the server load to practically zero.

These sorts of bots are getting very common; any large website needs some sort of load protection against this kind of activity.

Since then, I slow down or block any IP that goes over a request-rate threshold well above that of a very active human user. If a single user is making your site unresponsive, don't let all your other users suffer.
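
A crude version of that threshold can be scripted straight from the access log (a sketch; the 1000-requests cutoff is an arbitrary placeholder to tune against your own traffic):

# print a deny line for every IP with more than 1000 requests in the log
egrep -o "^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" access_log | sort | uniq -c | awk '$1 > 1000 {print "deny from "$2}'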

lucy24 · msg:4397748 · 11:05 am on Dec 14, 2011 (gmt 0)

The bot I had, I discovered, would follow a 301 redirect, so I redirected it to its own IP, which reduced the server load to practically zero.

Ooh, must try that. Generally I just send 'em to 127.0.0.1 if they put me in a bad mood.

Quick look at htaccess suggests that a lot of bots have been making me grumpy in recent months.
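
The bounce-it-back trick might look like this in mod_rewrite (a sketch; 203.0.113. is a documentation prefix standing in for whatever range is misbehaving):

RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
# redirect the bot to its own address instead of serving a page
RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]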

spiritualseo · msg:4397804 · 2:43 pm on Dec 14, 2011 (gmt 0)

I don't have SSH access, but thanks for the suggestion. I am wondering whether there is any way to automatically block IPs (on Apache/Linux servers) that access a large number of files within seconds, which clearly indicates they are not regular users, or legit bots for that matter?

Sgt_Kickaxe · msg:4397848 · 4:20 pm on Dec 14, 2011 (gmt 0)

Attempting to trap and block offenders will always be an ongoing game in which you are forever catching up. Get pro-active about limiting the potential damage. The best way to do that is to minimize the impact a botnet can have by reducing the overall size of your pages and serving properly cached versions where required.

A side benefit is that your site will load faster, which is always a good thing.
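
For the caching half, mod_expires headers are one starting point, so repeat fetches of static assets can be answered from caches (a sketch; the lifetimes are placeholders to tune, and a botnet that ignores cache headers gains nothing from this, though browsers and proxies do):

<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"
ExpiresByType text/css "access plus 1 week"
ExpiresByType application/javascript "access plus 1 week"
</IfModule>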
