botnet with same user agent targeting specific pages
wilderness
msg:4459459 - 2:23 pm on May 30, 2012 (gmt 0)

I have a page that has captivated the attention of a botnet.

The IPs are too many to list, and unless somebody is interested, I'm not going to make the effort to edit out my folder-page names.

Most of the IPs are RIPE and APNIC, however there have been a few ARIN (likely compromised machines).

If interested?

Don

 

incrediBILL
msg:4459608 - 8:20 pm on May 30, 2012 (gmt 0)

Any chance you can just dump the IPs alone?

How many IPs are we talking about, hundreds? thousands?

wilderness
msg:4459612 - 8:36 pm on May 30, 2012 (gmt 0)

Twenty-eight, however I'm sure more will appear.
Most of the IPs appeared 2-3 times.
A handful were used repeatedly, a dozen times or more.

24.52.67.6
41.89.211.5
46.38.0.164
46.252.244.178
62.141.34.136
66.189.187.133
67.187.17.28
69.42.126.103
69.175.64.183
72.18.139.244
75.75.52.0
76.73.130.43
78.192.196.117
81.30.174.12
82.41.8.183
90.182.182.154
93.115.204.208
93.99.5.23
110.142.196.145
125.162.254.233
146.185.22.84
173.16.217.84
173.217.38.79
173.78.5.41
174.63.57.194
176.9.209.120
188.227.189.69
189.90.127.24
190.151.111.202
190.116.35.20
190.36.171.143
190.220.250.34
202.182.124.12
202.46.67.195
203.76.106.67
210.211.109.144
210.211.109.147
222.124.191.186

keyplyr
msg:4459637 - 10:20 pm on May 30, 2012 (gmt 0)

I'm also currently being hit daily by a botnet (various IPs) using the Googlebot UA string. They're going after only one page, a contact page; ironic since the page only contains images of contact info, no actual email addresses or forms. They're all being 403d but they've been at it all week, sometimes 40-50 per day.

These are from today so far:

78.179.154.110
157.56.95.142
177.38.114.248
189.16.113.242
189.70.60.47
201.33.235.91

(Sorry if I'm hijacking the thread)

Future
msg:4459638 - 10:34 pm on May 30, 2012 (gmt 0)

Sorry if this is inappropriate here,
but we have been finding a lot of such IPs and IP ranges, regularly, for many weeks/months now.

How do you keep track of them?
Our site software limits our ability to identify bad/good IPs/bots as discussed here.

Any suggestions will help us survive.

Regards,

Regards,

wilderness
msg:4459659 - 11:14 pm on May 30, 2012 (gmt 0)

keyplyr, I just provided a "heads up", and you've done the same.

Future,
You need a pattern to associate the IPs.
In my instance it was easy, as all the IPs requested the same initial page and then immediately followed up with a second request for the root.
In addition, the SAME UAs were used by different IPs for each series (group) of requests.

A pattern.

incrediBILL
msg:4459675 - 12:30 am on May 31, 2012 (gmt 0)

Basically the same thing I recently reported here:
[webmasterworld.com...]
[webmasterworld.com...]

Seems if we tracked enough of this nonsense we'd eventually uncover the entire network of IPs and be able to publish a block list that might get the attention of the infected people.

wilderness
msg:4459688 - 1:25 am on May 31, 2012 (gmt 0)

Bill and keyplyr,
Did your IPs have a few that were more active (in quantity) than the others?

Don

incrediBILL
msg:4459693 - 1:37 am on May 31, 2012 (gmt 0)

I usually see the IPs hit 1-2 pages max with a few trying the same page 4-5 times per IP.

Got something claiming to be "ia_archiver" relentlessly hammering from China daily with the same IPs all the time. No clue who it really is or what they want, but they seem to be happy munching on their daily dose of 403s.

Wonder if using 410s would work better instead?

keyplyr
msg:4459694 - 1:43 am on May 31, 2012 (gmt 0)


I usually see the IPs hit 1-2 pages max with a few trying the same page 4-5 times per IP.

ditto


Got something claiming to be "ia_archiver" relentlessly hammering from China daily with the same IPs all the time.

Saw them a couple weeks ago, then they stopped. No sightings since.

lucy24
msg:4459725 - 4:14 am on May 31, 2012 (gmt 0)

I finally took the advice everyone has been giving all along:

# Deny page requests that claim a Googlebot UA but come from outside Google's 66.249 range
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteRule (\.html|/)$ - [F]

ymmv, but I don't have enough images to make it worth checking every single request. Keep 'em out of the pages, and let the rest fall where it may.

:: not to be confused with the neighboring rule that says "If it DOES come from a Bing URL but DOESN'T identify itself as the Bingbot..." ::

wilderness
msg:4459742 - 4:43 am on May 31, 2012 (gmt 0)

lucy,
FWIW, you should change that IP to:

!^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.

The 66.249.0-63 range is some kind of hosting, and fake Googlebots have also appeared from it.
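
With that tighter pattern dropped in, lucy24's rule would read (an untested sketch, otherwise identical to her original):

RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteRule (\.html|/)$ - [F]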

g1smd
msg:4459760 - 6:30 am on May 31, 2012 (gmt 0)

but they seem to be happy munching on their daily dose of 403s.
Wonder if using 410s would work better instead?

I serve a random selection of 410, 403, 502, 504 and several other 5xx codes for stuff that should 'go away'.
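
One rough way to approximate that in plain mod_rewrite (an untested sketch; "BadBot" is only a placeholder for whatever pattern you already match, and the "randomness" is just the clock):

# Even-numbered seconds get a 410 (Gone), everything else a 403 (Forbidden);
# 5xx responses could be added the same way with [R=503] and friends.
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteCond %{TIME_SEC} [02468]$
RewriteRule ^ - [G]

RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule ^ - [F]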


FWIW, you should change that (Google) IP to:
!^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.

I have been using:
!^(64\.(68\.[89][0-9]|233\.1[6-9][0-9])|66\.249\.[6-9][0-9]|72\.14\.[12][0-9][0-9]|74\.125|209\.85\.[12][0-9][0-9]|216\.239\.[3-6][0-9])\.
which allows all ranges registered to Google. Guess that needs a bit of further tightening up now.

wilderness
msg:4459901 - 1:35 pm on May 31, 2012 (gmt 0)

More IP ranges.
Same page request, and then a subsequent root request, duplicated by all.

41.220.30.3
64.191.116.19
67.250.27.240
68.41.37.245
68.97.186.191
69.246.159.188
71.231.171.237
71.62.192.163
71.194.3.84
75.131.175.22
76.111.249.166
79.129.17.76
89.201.51.206
98.113.201.5
108.20.152.202
109.163.233.201
109.163.233.205
118.102.27.216
125.16.69.114
142.54.162.2
184.77.135.247
188.227.189.69
200.110.82.230
202.43.183.60
203.62.1.59
210.211.109.143
220.162.14.114

wilderness
msg:4459903 - 1:40 pm on May 31, 2012 (gmt 0)

g1smd,
Don't recall when I omitted all the g-tools IPs, however it has been some years, and it has not affected site indexing.

All I allow is the 66.249.64-95.

wilderness
msg:4459912 - 2:07 pm on May 31, 2012 (gmt 0)

FWIW,
The first list of IPs that I added did NOT include the first five IPs that began this botnet's visits. I added those five IPs into the second group.

The very first request was 203.62.1.59, which was denied based upon both IP and UA.

This latest group is the largest one-day session of IPs that has attacked, and it's still A.M. here.

dstiles
msg:4460120 - 9:15 pm on May 31, 2012 (gmt 0)

It's a waste of time listing botnet IPs. A large proportion of the computers using those IPs will (or should) clean their machines as soon as they realise they are compromised; a new set will, by then, have been compromised.

In addition, IPs will, in many cases, switch from computer to computer on dynamic (DHCP) ISPs, so a compromised IP one day will be clean the next. I appreciate this is not so likely in the USA, but it certainly happens in many parts of the world.

Apart from that, the number runs into millions per botnet. I doubt any of us have enough patience to list them all. :)

I got the currently prevalent googlebot scan on an HTTPS webmail server this week, as well as on my ordinary sites. All hits failed to fetch anything but it didn't stop them repeating the effort.

incrediBILL
msg:4460136 - 9:57 pm on May 31, 2012 (gmt 0)

We don't need the patience, we just need scripts to collect them and block them automatically.

Let the IP owner get themselves off the list when they fix the problem but if they're still infected it'll put them right back on the list.

Fighting automation with automation, it's the only way to even stand a chance.

lucy24
msg:4460167 - 10:59 pm on May 31, 2012 (gmt 0)

How many separate IPs can you block before it starts noticeably slowing down your server? Say you keep piling on the 403s until you're blocking something in every other /16. 256 * 128 = over 30,000 lines that the server has to plow through on every single request. How much time will that add?

g1smd
msg:4460168 - 11:03 pm on May 31, 2012 (gmt 0)

You don't block this stuff using htaccess, you use some other method to store the list of IP addresses.

One starting point could be AlexK's PHP script with a load of extra features added - limited only by your imagination.
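
One illustration of keeping the list outside the .htaccess file itself (not necessarily how AlexK's script does it, and RewriteMap only works in the server/vhost config, not per-directory):

# httpd.conf / vhost context only -- RewriteMap is not available in .htaccess.
# badips.txt holds one entry per line, e.g.:  93.115.204.208 deny
RewriteMap badip txt:/etc/apache2/badips.txt
RewriteCond ${badip:%{REMOTE_ADDR}|ok} !^ok$
RewriteRule ^ - [F]

A script can then append new offenders to badips.txt without ever touching the Apache config; a dbm: map does the same job faster once the list grows large, and ranges (rather than single IPs) need a prg: or dbd: map.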

So, why return only 403 to nasty bots? There's a whole range of useful 5xx codes that can also be used.

wilderness
msg:4460197 - 12:40 am on Jun 1, 2012 (gmt 0)

It's a waste of time listing botnet IPs


How many separate IPs can you block before it starts noticeably slowing down your server?


Perhaps I may chill the soothsayers ;)

Of the two groups of IPs that I listed, there were two additions to my IP denials, both of which were of US origin. Additionally, both were denied to the Class D, which I don't normally do.

Bill certainly doesn't need me to speak on his behalf, however he white-lists and denies most by default.

wilderness
msg:4460206 - 1:21 am on Jun 1, 2012 (gmt 0)

FWIW, many of us within this group are reviewing raw access logs.
A few of us have reviewed raw access logs for many years.

A primary flag for me (utilizing black-listing) has always been the lone page request (absent the page's supporting files).
Generally, by exploring that lone page request's IP and UA, and additionally adding compiled references into the mix, I can make a decision on whether that lone page request was an automated request or a valid visitor.

If automated?
Then a denial is likely added, and NOT restricted to the Class D range.

The result is that over time, the door has been shut on many of these pests before they even get started.

keyplyr
msg:4460222 - 2:41 am on Jun 1, 2012 (gmt 0)

How many separate IPs can you block before it starts noticeably slowing down your server?
Say you keep piling on the 403s until you're blocking something in every other /16. 256 * 128 = over 30,000 lines that the server has to plow through on every single request. How much time will that add?

It's not so much the sheer number of IPs or the number of lines as the method used. The easiest (fastest) on *my* Apache server for denying IPs is mod_authz_host. The resource hog is mod_rewrite, which often reads a single line many times to match variables. I use it sparingly, and only for blocking a few UAs and filtering IPs for bots.

Also, it is wise to have a linear strategy. Put your blocks up first to diminish what has to be processed later. In other words, block the bad guys (by whatever means) prior to the rewrites, redirects, parameter resolves, error directives, etc.
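
A bare-bones sketch of that layering (Apache 2.2 syntax; the ranges and UA are only placeholders drawn from examples earlier in the thread):

# 1) IP/CIDR denials handled by mod_authz_host
Order Allow,Deny
Deny from 93.115.204.0/24
Deny from 210.211.109.128/25
Allow from all

# 2) A handful of UA blocks handled by mod_rewrite
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC]
RewriteRule ^ - [F]

# 3) ...then redirects, error documents, etc.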

g1smd
msg:4460269 - 6:38 am on Jun 1, 2012 (gmt 0)

The htaccess file is processed in "per module" order. Group the rules "by module name" so that they are clear to you the reader. The overall order is not so important because the whole file is read by each module for each request (exception, the [L] flag in mod_rewrite rules halts processing).

Within the rules that mod_rewrite uses you should order them by "rules that block access" (as it is pointless to redirect a request only to then block it) followed by "rules that externally redirect" (redirects must always be before rewrites to avoid exposing a previously rewritten internal path back out on to the web as a new URL) followed by "rules that internally rewrite". Within each of those blocks order the rules from "most specific" to "most general".
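
A skeleton of that ordering, with throwaway example patterns (example.com and the widget rule are purely illustrative):

RewriteEngine On

# Rules that block access come first
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule ^ - [F]

# Then rules that externally redirect
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Then rules that internally rewrite
RewriteRule ^widgets/([0-9]+)$ /widgets.php?id=$1 [L]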

lucy24
msg:4460509 - 7:05 pm on Jun 1, 2012 (gmt 0)

block the bad guys (by whatever means) prior to the rewrites, redirects, parameter resolves, error directives, etc.

Problem is, the cleanest and simplest way to block is by CIDR numbers. That's core, so it comes after everything else.

Does it make any difference if you put your most-likely-to-be-blocked ranges first, or does the server still have to read the whole rest of the list even after it has met a "Deny from" directive?

dstiles
msg:4460522 - 7:20 pm on Jun 1, 2012 (gmt 0)

Bill - my observation on required patience was about listing the IPs on this forum, not about blocking them on the server. I do that anyway.

Lucy - I used to add blocked IPs to a text file until it became far too large - thousands of the little B's taking ages to load and process. Nowadays I add them to a MySQL database which returns a yes/no based on that and on UAs, headers etc. in 30-40 milliseconds - almost zero if the bot accepts cookies and revisits. The whole thing is controlled by an ASP script (MS IIS) included in every page.

My database currently has around 33,600 records, some on the order of /10, many at /15 or /16 and others single IPs, the latter being automatically blocked by UA, header or other misdemeanour. Many of the /19 and larger are there as "broadband range place markers", merely to identify singles when they become blocked. Yet others are server farms or badly-behaved regions.

keyplyr
msg:4460650 - 1:07 am on Jun 2, 2012 (gmt 0)

Problem is, the cleanest and simplest way to block is by CIDR numbers. That's core, so it comes after everything else. Does it make any difference if you put your most-likely-to-be-blocked ranges first, or does the server still have to read the whole rest of the list even after it has met a "Deny from" directive?

I block by CIDR (about 500 ranges) with mod_authz_host almost at the very top. I use numerical sequence for easy management.

Then I use mod_rewrite to block a couple dozen UAs, then UA-to-IP filtering, then redirects, then error doc redirects.

grandma genie
msg:4461804 - 9:51 pm on Jun 5, 2012 (gmt 0)

I've had the same incidents, even with my tiny 1 MB logs. This stood out like a sore thumb.

94.140.235.2 - - "GET /home HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
74.115.1.91 - - "GET /home HTTP/1.1" 301 249 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
178.45.9.54 - - "GET /home/ HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
115.241.34.16 - - "GET /home/ HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
93.124.57.110 - - "GET /home/ HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"
76.105.245.96 - - "GET /home/ HTTP/1.1" 200 26488 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"

These were all single hits, one right after the other. Interesting thing about the ua was the SLCC2. I checked the logs again for just that segment and noticed all the visitors coming in with it were from forums or blogs. Thought that was kinda odd. Most of them were the result of a hotlinked image, though I've blocked that type of activity in htaccess.

Here's one:

78.93.53.nnn - - "GET /image.jpg HTTP/1.1" 403 - "www.qahtaan.com/vb/showthread.php?t=nnn" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; .NET4.0E)"

Is that just a coincidence?

wilderness
msg:4461829 - 10:52 pm on Jun 5, 2012 (gmt 0)

gg,
Nobody has determined any rhyme or reason; rather, we're just reaching and hoping that someone else will see something we are not seeing.
