
Apache Web Server Forum

welcome to the web?
Dan99
msg:4681886 - 5:17 pm on Jun 22, 2014 (gmt 0)

OK, this is a "welcome to the web" question, I know.

I run a small-scale server that serves about a thousand document copies per week to my colleagues around the world. Apache 2.2 on Mac OS 10.8.

In my logs, I see a "malicious user" who, a few weeks ago, started downloading the SAME 20 MB file, four times in a row, every 20-40 minutes. The requesting IP is always different. Sometimes it's an IP that I've already denied service to, being on a standard China blacklist. But sometimes it's an IP with no registered complaints. So how do I know it's the same malicious user? Because the request is ALWAYS for the same file, and ALWAYS four times. That is his or her hacker "signature".

So, OK, I just changed the filename slightly. My regular users will figure that out. But the requests keep coming with the old name. So instead of a 200 code, and a lot of megabytes, they're now getting a 404, and a few tens of bytes. Bandwidth-wise, there is no problem anymore.

But the requests are kind of littering my log. Any suggestions for mitigation? Is this a case where someone has infected machines around the world and commanded them to bang on me? Is he/she likely to get bored and go away? If the goal is to use up my bandwidth, they don't seem to have noticed that it isn't costing me any bandwidth anymore. Is there any way to notify the managers of these various IPs that their machines have been hijacked?

I can handle malicious users, by banning their IP. No sweat. But this guy/gal is using LOADS of IPs to do the job. No way I can ban them all. I've been webserving for years, but this is the first time I've seen this.

 

Dan99
msg:4682133 - 10:48 pm on Jun 23, 2014 (gmt 0)

Yes, I completely agree that a redirect to something uninformative, like the home page, would drive me bonkers as well. As in, you're getting shunted here, but we're not going to tell you why!

And yes, the custom 404 page is exactly what I have. It says politely that there is no file here by that name, and points to an index page where one can go find the right name. It doesn't redirect to that page; it just links to it.
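For anyone wanting the same setup, it's a single directive in the config or htaccess (sketch only; the path here is made up, not my actual one):

# Serve a polite custom page on 404s (path is made up)
ErrorDocument 404 /errors/notfound.html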

Dan99
msg:4682135 - 11:02 pm on Jun 23, 2014 (gmt 0)

OK, here are my results for the user agents from these attacks. I had two "attacks" in four hours after I started logging these user agents. That's a bit odd in itself, as they had been coming in semi-regularly every 20-40 minutes before that. Does the attacker *know* that you're logging these? Curious that the timing changed right around when I started logging.

Both attacks were from blacklisted Chinese IPs (218.204.154.212 and 123.7.57.169), so they just got 403'd. But the user agents for both were

"Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)"

I need to see if they're all like that, henceforth. I gather this is a fairly archaic user agent, no? Doing some web searching, I am led to believe that this is a popular user agent for annoying bots. Of course, as noted, these can be fabricated, so it may not mean much.

Now, I can't imagine that any of my colleagues are using such a primitive system, so I would not hesitate to block such user agents entirely if I knew how.

aristotle
msg:4682141 - 11:42 pm on Jun 23, 2014 (gmt 0)

You should show the entire log entries for each visit. There's other information in them, such as the referrer string, that might be useful.

Also, it might be too soon to reach any conclusions about the UA. I see different UAs for this on my site every day, and there have been a large number of them altogether.

Dan99
msg:4682153 - 12:09 am on Jun 24, 2014 (gmt 0)

OK, here are the entire log entries for the last three "attacks". All from blacklisted IPs. Not much else here to see.

123.7.57.169 - - [23/Jun/2014:13:49:17 -0600] "GET /mysite/docs/myfolder/mydoc.pdf HTTP/1.1" 403 249 "http://mycomputer.com/mysite/docs/myfolder/" "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)"

218.204.154.212 - - [23/Jun/2014:16:37:11 -0600] "GET /mysite/docs/myfolder/mydoc.pdf HTTP/1.1" 403 249 "http://mycomputer.com/mysite/docs/myfolder/" "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)"

122.141.220.137 - - [23/Jun/2014:18:02:44 -0600] "GET /mysite/docs/myfolder/mydoc.pdf HTTP/1.1" 403 249 "http://mycomputer.com/mysite/docs/myfolder/" "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)"

Again, as I mentioned earlier, these come in quadruplicate. Four identical requests each.

Since I started recording these user agents, I haven't seen anyone else using this one.

Indeed, the delay in between them is now longer than it used to be. Peculiar. Maybe they're getting tired?

aristotle
msg:4682154 - 12:23 am on Jun 24, 2014 (gmt 0)

I don't have time to analyze it right now, and there are other members here who know a lot more about this than I do, but if that referer stays the same, you might be able to find a simple solution to the whole thing by blocking it. I mention this only as a possibility, because it still needs to be analyzed.
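If it did turn out to be a fixed external referer, the block itself would be short. Something like this (an untested sketch with a made-up referer, assuming mod_rewrite is enabled):

# Sketch: deny any request carrying one specific external referer (name made up)
RewriteCond %{HTTP_REFERER} ^http://badsite\.example/ [NC]
RewriteRule .* - [F]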

wilderness
msg:4682155 - 12:30 am on Jun 24, 2014 (gmt 0)

Now, I can't imagine that any of my colleagues are using such a primitive system, so I would not hesitate to block such user agents entirely if I knew how.



# Deny any request whose UA string ends in "Windows NT 5.0)"
RewriteCond %{HTTP_USER_AGENT} Windows\ NT\ 5\.0\)$
RewriteRule .* - [F]

Dan99
msg:4682158 - 12:47 am on Jun 24, 2014 (gmt 0)

Thank you. I'll keep my eye on these, and if it seems the right thing to do, I'll impose that block.

Um, what "referer" are we talking about? As far as I can see, my document folder is referring to my document.

wilderness
msg:4682171 - 1:38 am on Jun 24, 2014 (gmt 0)

Um, what "referer" are we talking about?


In this log line!

122.141.220.137 - - [23/Jun/2014:18:02:44 -0600] "GET /mysite/docs/myfolder/mydoc.pdf HTTP/1.1" 403 249 "http://mycomputer.com/mysite/docs/myfolder/" "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)"

The "referer" is "http://mycomputer.com/mysite/docs/myfolder/"

Dan99
msg:4682178 - 1:56 am on Jun 24, 2014 (gmt 0)

That's what I meant. Aristotle was saying that if the referer stays the same, I can block it. But I'm not going to block my own document folder. Anyone can use that. From a defensive perspective, an external referer would be more useful.

lucy24
msg:4682206 - 4:34 am on Jun 24, 2014 (gmt 0)

Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)

That looks awfully familiar. I think it's one of the UA strings that my host blocks with mod_security, so it comes through in logs as 503 instead of 403.

Everyone has an Oldest Allowable Browser cutoff. Mine's very conservative-- and there's a further range where requests get redirected to a custom "old browsers" page with the usual blahblah about je suis désolée but the server thinks you are a robot. Human users don't need to have their faces rubbed in the fact that the server thinks only what I tell it to think ;)

an external referer would be more useful

Robots and referers are a whole new issue. Most robots don't send one at all. Some send a generic fake referer such as your site's front page. (This allows me to block a fair number of requests.) Some use each request-- whether admitted or not-- as the referer for the next request. Some are purely referer spam. Can't imagine how this would work on anyone who's been reading logs for more than a few months, but they keep trying.*
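For the front-page flavor, the block is only safe on URLs the front page doesn't actually link to. A sketch, with made-up names:

# Sketch: deep requests claiming the front page as referer, aimed at
# a directory the front page never links to (all names made up)
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com/?$
RewriteRule ^unlinked-dir/ - [F]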

But, uhm, how does your own personal computer get to be anyone's referer? That is, other than maybe yourself when testing.

Maybe they're getting tired?

Most robots actually do get tired and go away after a while.


* I have some weird links, but I'm morally certain there is nothing on my site that would appeal to an x-rated site in Kazakhstan.

not2easy
msg:4682214 - 4:57 am on Jun 24, 2014 (gmt 0)

I see MSIE 6.0 frequently on visits from China. That IP is for China Unicom.

wilderness
msg:4682319 - 11:19 am on Jun 24, 2014 (gmt 0)

That's what I meant. Aristotle was saying that if the referer stays the same, I can block it. But I'm not going to block my own document folder. Anyone can use that. From a defensive perspective, an external referer would be more useful.


lucy has a thread, either in this forum or the SSID Forum, which explains how to deny these false referers when they do NOT come from your own website(s).

Dan99
msg:4682332 - 11:37 am on Jun 24, 2014 (gmt 0)


But, uhm, how does your own personal computer get to be anyone's referer? That is, other than maybe yourself when testing.


Pretty simple. I have a document index that points to folders. People go to a folder and, once there, select a document to download. That document download then shows my folder as the referer. Not sure how else I would do it.

Dan99
msg:4682334 - 11:39 am on Jun 24, 2014 (gmt 0)

Everyone has an Oldest Allowable Browser cutoff.


That's pretty interesting, but I don't have one. How do you implement that?

aristotle
msg:4682335 - 11:41 am on Jun 24, 2014 (gmt 0)

I'm not sure about the significance of all these Chinese IPs you're seeing. In the activity against my site, nearly all of the IPs are for U.S. locations. It must depend on which sites are distributing the malware.

Anyway, for your next step, you should start comparing the log entries for these bot visits with log entries for visits by real humans, and look for consistent differences.

wilderness
msg:4682339 - 11:49 am on Jun 24, 2014 (gmt 0)

Everyone has an Oldest Allowable Browser cutoff.


That's pretty interesting, but I don't have one. How do you implement that?


There are examples in the SSID Forum.

aristotle
msg:4682363 - 1:17 pm on Jun 24, 2014 (gmt 0)

According to wikipedia, here's what botnets can be used for:
Botnets are exploited for various purposes, including denial-of-service attacks, creation or misuse of SMTP mail relays for spam (see Spambot), click fraud, mining bitcoins, spamdexing, and the theft of application serial numbers, login IDs, and financial information such as credit card numbers.

It appears to me that out of this list, the only one that involves numerous repeated requests for the same file(s) from a particular website would be a DDoS attack against that site. That's why I think the requests that some of us are seeing now are tests being done as part of a preparatory stage for an eventual all-out attack.

lucy24
msg:4682434 - 7:01 pm on Jun 24, 2014 (gmt 0)

How do you implement that?

That's partly up to your personal coding style. I use a two-pronged approach. Simple lockouts use mod_setenvif in conjunction with mod_authzzz (exact name depends on Apache version):

BrowserMatch icky-UA-string-here keep_out
BrowserMatch nasty-UA-string-here keep_out
BrowserMatch foul-UA-string-here keep_out

using Regular Expressions as appropriate, and then you go on to

Deny from env=keep_out

Deny from 1.2.3.4
Deny from 11.22.33

et cetera listing all blocked IP ranges in whatever order is convenient for you. Mine are currently grouped into ARIN, RIPE etc, and then numerically within each group, with a separate block for China because there are so ### many of them.

That's where you put in things like "MSIE [1-4]\." or "Opera [3-9]". (See that . period in MSIE? Hasty addition after MSIE 10 was introduced and I accidentally blocked an early adopter.)

Then the second prong is the more complicated lockouts. These go in mod_rewrite-- generally a last resort because it's more server-intensive than a simple IP block. These are the if/then sets, like "if it claims to be Googlebot but isn't from 66.249.blahblah" or "if it's got a .ru/.ua referer other than Google or Yandex" and so on. A very, very simple referer block, like semalt-- very vexatious in recent weeks-- can go in mod_setenvif.
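The Googlebot one, roughly (a sketch; 66.249 is Google's crawl range, and the rest is generic):

# Sketch: claims to be Googlebot but doesn't come from 66.249.x.x
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]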

a DDoS attack against that site

In the case of tiny sites like mine, possibly a DDoS attack on some other site using the same or neighboring server. (If you've choked 11.22.33.44 then you've probably blocked the rest of 11.22.33 as well, because in the end it's the same connection.)

But I do wonder if there are standard new-robot scripts: "Here's a set of requests to try while testing whether your botnet is working properly." The one that particularly strikes me is how quickly the "contact" botnet showed up after I created the contact.html page. That's a detail I hadn't even realized until I looked up some details earlier in, I think, this very thread.

Dan99
msg:4682436 - 7:28 pm on Jun 24, 2014 (gmt 0)

Nice. Thanks.

I presume this means, in your example, that the keep_out expression ends up corresponding to all cases matched, as in, the icky, nasty, and foul browser strings?

Now, I'm a little confused about mod_setenvif and mod_authzzz. Don't these directives just go in .htaccess? mod_setenvif, at least, is a module that I've already loaded.

aristotle
msg:4682455 - 8:59 pm on Jun 24, 2014 (gmt 0)

Lucy wrote:
In the case of tiny sites like mine, possibly a DDoS attack on some other site using the same or neighboring server.

That's a good point Lucy. But in your case it's kind of amusing to think about, because whoever picked your site for this must have had no idea that they were going up against an expert like you.

lucy24
msg:4682461 - 9:12 pm on Jun 24, 2014 (gmt 0)

keep_out is my name for the environmental variable-- set by default to "1" if you don't specify a value. Other people use other names like "bad_bot". Since mod_setenvif executes before mod_auththingummy, the environmental variable has been set by the time the request reaches the final phase of access control. So it's saying "If this environmental variable exists, deny the request".

Don't these commands just go in .htaccess?

If it's your own server, you'll probably want to put everything in the config file. But even when you do have control over the config file, htaccess can be useful temporarily. If you're introducing a whole bunch of new rules, you can set AllowOverride in certain directories, and run your server with htaccess enabled for a while to make sure everything works as intended. Saves having to restart the server every time you move a semicolon.
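The temporary arrangement is just this in the config file (directory path made up):

# Sketch: honor htaccess in one directory while testing new rules
<Directory "/Library/WebServer/Documents/mysite">
    AllowOverride All
</Directory>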

Some unmodified examples:

BrowserMatch ^-?$ keep_out
BrowserMatch Ahrefs keep_out
BrowserMatchNoCase "America Online Browser" keep_out
BrowserMatch Apache-HttpClient keep_out
...
BrowserMatch Firefox/[12]\b keep_out

et cetera. Mine go in alphabetical order. The very first rule is for "if the UA string is blank or missing". All legitimate visitors send a UA. (Even google's faviconbot, which used to be blank, now sends Firefox 6. This is not a huge improvement, obviously ;)) In mod_setenvif, you can put material in quotation marks as an alternative to escaping literal spaces.

Then in mod_rewrite you get things like

RewriteCond %{HTTP_USER_AGENT} MSIE\ [56]\.\d [OR]
RewriteCond %{HTTP_USER_AGENT} Chrome/[1-8]\.\d [OR]
RewriteCond %{HTTP_USER_AGENT} Firefox/(3\.[0-5]|[567])
RewriteCond %{HTTP_COOKIE} !oldbrowser
RewriteCond %{HTTP_REFERER} !\?
RewriteCond %{REMOTE_ADDR} !^11\.22\.
RewriteRule (^|\.html|/)$ http://example.com/boilerplate/goaway.html [R=301,L]

This is obviously site-specific. The line "11.22" is camouflage for a specific governmental IP that used to use very old browsers, though I probably don't need this particular exemption any more. (I think they've upgraded their MSIE 6 machines to MSIE 8.) And the ? in the referer means "If they're from a search engine, give them the benefit of the doubt". And the "oldbrowser" cookie is exactly what it looks like: I've poked a hole for you in the past, so you're good to go.
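As for how the cookie gets planted, that part is up to you. One hypothetical way-- not necessarily what I actually do-- is mod_rewrite's CO flag, setting the cookie when the "old browsers" page itself is served, so a human who has seen the notice once sails through afterward:

# Hypothetical: plant the "oldbrowser" exemption cookie when serving the
# old-browsers page; "-" passes the request through unchanged
# (domain and one-year lifetime in minutes are made up)
RewriteRule ^boilerplate/goaway\.html$ - [CO=oldbrowser:yes:example.com:525600,L]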

Dan99
msg:4682467 - 9:25 pm on Jun 24, 2014 (gmt 0)

So in my particular case, I guess I'm looking at

BrowserMatch "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.0)" keep_out
Deny from env=keep_out


That will 403 anything from this bot. Right?

Wow. This is pretty awesome.

lucy24
msg:4682494 - 10:54 pm on Jun 24, 2014 (gmt 0)

Except that mod_setenvif uses Regular Expressions, so it has to be
"^Mozilla/5\.0 \(compatible; MSIE 6\.0; Windows NT 5\.0\)$"
The quotation marks only protect the spaces. Escaping \. is good practice though not essential here (sometimes it is essential); escaping parentheses is always essential in a RegEx. The opening and closing anchors are there because there may be quasi-legitimate human UAs that contain, but are not limited to, the offending string.

:: wait, stop, rewind ::

Mozilla/5 with MSIE 6? Can't think how I missed that the first time. MSIE didn't start using Mozilla/5 until

:: shuffling papers ::

MSIE 9. So the combination of "Mozilla/5" with MSIE < 9 already flags the UA as bogus.
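So if you wanted to get ahead of the whole family, something along these lines (untested sketch) would flag every UA that pairs Mozilla/5.0 with a pre-9 MSIE:

# Flag any UA pairing Mozilla/5.0 with MSIE 8 or lower as bogus
BrowserMatch "^Mozilla/5\.0 \(compatible; MSIE [1-8]\." keep_out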

Dan99
msg:4682501 - 11:03 pm on Jun 24, 2014 (gmt 0)

So the combination of "Mozilla/5" with MSIE < 9 already flags the UA as bogus.

Heh. Interesting. But I can still use that script to 403 this thing until, I guess, it decides to morph.

Dan99
msg:4682564 - 3:12 am on Jun 25, 2014 (gmt 0)

Probably worth pointing out about these bot requests that before I implemented the BrowserMatch block, which 403's all of them (thanks, Lucy), about half of them were getting my polite 404 page, and the other half were from blacklisted Chinese IPs that I had already blocked. What I had not appreciated was that ALL of the IPs being used here are Chinese.
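For the record, the combined recipe now looks something like this (the UA string and IPs are from the log excerpts earlier in the thread; the Order/Allow lines assume access is otherwise open):

# Flag the bot's fixed UA string
BrowserMatch "^Mozilla/5\.0 \(compatible; MSIE 6\.0; Windows NT 5\.0\)$" keep_out

# Deny flagged UAs plus the blacklisted Chinese IPs seen so far
Order Allow,Deny
Allow from all
Deny from env=keep_out
Deny from 123.7.57.169
Deny from 218.204.154.212
Deny from 122.141.220.137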

Dan99
msg:4682949 - 2:08 am on Jun 26, 2014 (gmt 0)

Well, FWIW, it seems this bot is nothing new.

[webhostingtalk.com...]

*Precisely* the same thing I'm seeing. The original complaint is from January 2010, with an informative update two months ago.

Note the Wikipedia article about Xunlei. It says that this is its default dummy user agent, but once it realizes it's being blocked, it switches user agents. I have not seen that yet, and am keeping my fingers crossed.

If you Google "block Xunlei", you get loads of hits.

slipkid
msg:4682976 - 5:52 am on Jun 26, 2014 (gmt 0)

I believe you are going to find yourself going nuts trying to block all the bad traffic coming from China using UA or targeted IP blocks.

I tried this for a short time and finally decided to block the whole damn country using class A IP blocks. It litters my logs with 403s, but I feel better knowing the bad guys cannot get to my site.
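That is, something on the order of the following (sketch only; these two ranges are placeholders, not a real China list, which runs to hundreds of CIDR ranges):

# Sketch: country-scale /8 ("class A") blocks; ranges are placeholders
Deny from 58.0.0.0/8
Deny from 59.0.0.0/8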

wilderness
msg:4683005 - 11:19 am on Jun 26, 2014 (gmt 0)

If you've got your own server, you're able to change your log output to hide the 403s.

Dan99
msg:4683011 - 12:50 pm on Jun 26, 2014 (gmt 0)

I agree completely. I'll probably have to block the whole country. And yes, it's the log littering that is the inconvenient part of this. Of course, I could just filter my logs.

Now, wilderness, I do run my own server. How do I change the log output to hide the 403s? Or better yet, put them in a separate file?
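Edit: poking at the Apache docs, it looks like env-conditional CustomLog would do it, since the denied requests already carry the keep_out variable. An untested sketch (log paths made up, and it assumes the stock "combined" LogFormat; 403s from the plain IP denies would still land in the main log):

# Send flagged requests to their own log and keep them out of the main one
CustomLog /var/log/apache2/access_log combined env=!keep_out
CustomLog /var/log/apache2/blocked_log combined env=keep_out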

aristotle
msg:4683013 - 1:16 pm on Jun 26, 2014 (gmt 0)

Dan99 wrote
Well, FWIW, it seems this bot is nothing new.
[webhostingtalk.com...]

From what it says at that link, this doesn't look like a botnet to me, but rather some download software that's popular in China. Although I didn't see an explanation for why it makes 4 successive requests.

aristotle
msg:4683015 - 1:45 pm on Jun 26, 2014 (gmt 0)

slipkid wrote:
I tried this for a short time and finally decided to block the whole damn country using class A IP blocks.

I just did a search for a Chinese IP block list and it's HUGE. Is there some way to do this without adding so much new code to your .htaccess file?
