| This 80 message thread spans 3 pages: < < 80 ( 1 2  ) || |
|welcome to the web?|
OK, this is a "welcome to the web" question, I know.
I run a small scale server that serves about a thousand document copies per week to my colleagues around the world. Apache 2.2 on Mac OS 10.8.
In my logs, I see a "malicious user" who, a few weeks ago, started downloading the 20 MB file, four times in a row, every 20-40 minutes. The requesting IP is always different. Sometimes it's an IP that I've already denied service to, being on a standard China blacklist. But sometimes it's an IP that has no registered complaints. So how do I know it's the same malicious user? Because the request is ALWAYS for the same file, and ALWAYS four times. That is his or her hacker "signature".
So, OK, I just changed the filename slightly. My regular users will figure that out. But the requests keep coming with the old name. So instead of a 200 code, and a lot of megabytes, they're now getting a 404, and a few tens of bytes. Bandwidth-wise, there is no problem anymore.
But the requests are kind of littering my log. Any suggestions for mitigation? Is this a case where someone has infected machines around the world, and has commanded them to bang on me? Is he/she likely to get bored and go away? If the goal is to use up my bandwidth, they don't seem to be paying any attention to whether it's working. It isn't anymore. Is there any way to notify the managers of these various IPs that their machines are being pirated?
I can handle malicious users, by banning their IP. No sweat. But this guy/gal is using LOADS of IPs to do the job. No way I can ban them all. I've been webserving for years, but this is the first time I've seen this.
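For anyone who wants to pull that "signature" out of a log automatically, here is a minimal sketch in Python. It assumes the standard Apache common/combined log format, and it assumes that each burst of four comes from a single IP (my reading of the description above, not something the logs here confirm). The names `LINE_RE`, `parse` and `find_signature` are made up for the illustration.

```python
import re
from collections import defaultdict
from itertools import groupby

# Assumed layout: IP ident user [time] "METHOD URL PROTO" status size ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" \d{3} ')

def parse(line):
    """Return (ip, url) for a log line, or None if the line does not match."""
    m = LINE_RE.match(line)
    return (m.group(1), m.group(2)) if m else None

def find_signature(lines, repeats=4):
    """Map each URL to the set of IPs that requested it exactly
    `repeats` times in a row (consecutive log lines)."""
    hits = defaultdict(set)
    # groupby collapses runs of consecutive identical (ip, url) pairs
    for key, group in groupby(map(parse, lines)):
        if key is not None and sum(1 for _ in group) == repeats:
            ip, url = key
            hits[url].add(ip)
    return dict(hits)
```

A URL that shows up in the result with dozens of different IPs, each asking exactly four times, is a good candidate for the pattern described in this post.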
|slipkid wrote: |
I tried this for a short time and finally decided to block the whole damn country using class A IP blocks.
I just did a search for a Chinese IP block list and it's HUGE. Is there some way to do this without adding so much new code to your .htaccess file?
This thread may be of some help.
Require a login to access...
|This thread may be of some help. |
Thanks for the link to the other WebmasterWorld thread. But with insiders and experts talking back and forth like that, I wasn't able to understand most of it. So I'm going to start searching around on the web again to see if I can find anything.
|Require a login to access... |
Yes, I could do that, but I'm having no trouble blocking these requests. Also, I consider my archive to be a public archive, so I really don't want to apply unnecessary security. In fact, the main problem now is just the raft of 403s that land in my logs. Not bandwidth hogging. It's littering. OK, not gallon bottles and paint cans, but just cigarette butts.
|If you have your own server, you're able to change your log output to hide the 403's |
I would never omit something entirely; I want to know who tried. Or at least have the option of knowing, if I need to check later. My personal log-wrangling routine pulls out all 3xx and 4xx responses. Then when those get processed, it ignores any 301 followed by a 403 for the same URL. (These aren't supposed to happen-- [F] comes before www --but it's insurance.)
A firewall is basically saying "There is absolutely zero possibility that this request was legitimate, so I don't even need to know about it."
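A log-wrangling routine of the kind described above could look roughly like the following Python sketch. This is not the actual routine, just an illustration under assumptions: the log is in Apache common/combined format, and "ignores" is taken to mean dropping a 301 entry when the very next 3xx/4xx entry is a 403 for the same IP and URL. `LINE_RE` and `interesting_entries` are hypothetical names.

```python
import re

# Assumed layout: IP ident user [time] "METHOD URL PROTO" status size ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" (\d{3}) ')

def interesting_entries(lines):
    """Return (ip, url, status) tuples for all 3xx and 4xx responses,
    skipping any 301 that is immediately followed by a 403 for the
    same IP and URL."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a recognizable access-log line
        ip, url, status = m.group(1), m.group(2), int(m.group(3))
        if 300 <= status < 500:
            entries.append((ip, url, status))

    result = []
    for i, (ip, url, status) in enumerate(entries):
        nxt = entries[i + 1] if i + 1 < len(entries) else None
        if status == 301 and nxt == (ip, url, 403):
            continue  # the redirect was blocked anyway; keep only the 403
        result.append((ip, url, status))
    return result
```

Nothing is thrown away from the raw log itself; the filter only decides what gets looked at.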
|I would never omit something entirely; I want to know who tried. Or at least have the option of knowing, if I need to check later. |
Yes, I agree. That's why I'd rather have the 403's get put in another file. Well, I guess my 403s get thrown in my error file, but it appears that the UAs aren't being logged there along with it. Do I have to specify separately that the error file logs UAs? The Apache error file is not, however, a particularly compactly built file. Instead of "403" it logs "client denied by server configuration:".
|Instead of "403" it logs "client denied by server configuration:". |
Yes, that seems to be all it can ever do. "I understand they were blocked. I want to know WHY they were blocked, dammit!"
If it's an ordinary IP-based lockout, or something involving mod_setenvif-plus-mod_authwhatsit, you can generally figure it out. The more complicated rules can be a bother. Now, you do have one more option on your own server, though it may involve a bit of a performance hit. You can also run a separate RewriteLog with a wide range of logging levels. Then, at least, it will show you what pattern was matched. This is useful if you have a lot of complicated lockouts that are issued by mod_rewrite.
The downside is that no single mod_rewrite logging level shows exactly what you want. In order to include all the stuff you want or need to see, you also have to include things you have no interest in.
There's also the option of a third-party add-on such as mod_security. This will show up with greater detail in regular error logs, because it shows what pattern was matched-- just like mod_rewrite, only more so.
Take a closer look at the Error Log settings. You can probably do more.
But there's no way around it: any time you ask for more detail in logs of any kind, it means your server has to do more work.
:: detour to Apache docs to see what's available ::
|Thanks for the link to the other WebmasterWorld thread. But with insiders and experts talking back and forth like that, I wasn't able to understand most of it. So I'm going to start searching around on the web again to see if I can find anything. |
I had the same problem and had to keep searching until my fingers shook. I believe I have something that works well.
For the record, there's someone on the forum named Lucy something who makes my head swim.
It's all a scam, slipkid. Before I started reading these forums, I'd spent several years doing ebooks. This involves heavy work with a text editor, including Regular Expressions. (If it's a choice between using a tool that requires me to go into Terminal, and rolling my own RegEx, I'll pick roll-your-own every time.) It was a bit staggering to find that, even though I didn't and still don't speak a word of Apache, I could answer about 90% of the questions asked in the Apache subforum. They all come down to formulating a RegEx that will work in mod_rewrite. You do have to be more exact, though. In a text editor you can just sit back and twiddle your thumbs while .*blahblah.*blahblah.* does its thing. Most of the time, at least. If it's too vague, SubEthaEdit gets offended and goes into Perpetual Motion mode. On a server, the nanoseconds add up.
Did you ever try setting
? Error logs are always saying "Use LogLevel debug for more information". This is fantastically helpful if you're on shared hosting and you can't set the log level. My host's default logging level does include the referer. In my case that's useful, as the referer is often recognizable as the thing that got them locked out. There's your TLDs in .ru, and filenames in .php, and...
Blocking one IP on the web from scraping is like trying to kill one man to stop an invading army.
If they really want your data, they can use TOR proxies or fast flux IPs, or worse.
To truly block the problem is a HUGE job, way beyond the limited scope of this thread.
Not trying to stop you from trying, just letting you know that this is a fly swatter approach going after a single fly while a swarm of flies awaits outside the screen door when what you need is to tent the place and use a bug bomb to get them all.
Reminds me of a recent conversation regarding whitelisting... :)
Well, but my problem isn't with TOR proxies or fast flux IPs. So let's not make things up. Really. I can identify proxies, and I have no trouble with them.
I see individual IPs that do mischief on my site, and I deny them. Done deal. It works.
There are excellent blacklists that identify troublesome IPs. Easy to deny them. Done deal. It works.
This thread is about one particular problem that was a little hard to identify, with no obvious way to avoid it. I've denied the problem-maker, thanks to the specific help I've gotten here. When the swarm gets through the screen door, I'll let you know.
So yes, THE PROBLEM may be huge, but it's not my problem yet.
I do not have this feature.
Did you ever try setting
I do get referers. Thanks for your response.
Do a Google search for "Log Files - Apache HTTP Server Version 2.2"
FWIW, I'm on shared hosting and have been since 1999. The only reason that I'm aware of this option, is because I recall it within a thread. (good luck finding that thread).
I'll do a search and see what I can learn.
Guess my question would be, do you do business in China?
If not, drop the whole country in the black list and save yourself a lot of headache.
|There are excellent blacklists that identify troublesome IPs. |
As long as you like playing whack-a-mole because blacklists only work if the IP is already known. By the time you find it and black list it, it's too late. Blacklisting in China is a no-win situation as I did it for a couple of years and I found new ones daily.
It never stopped.
I got so mad I pulled my hair out.
I'm bald now.
I was about to recommend a great product to solve your problems but they suddenly jacked the price up to $500.
Never mind, but do something to avoid the baldness.
FYI, we have a couple of tools online that aid in creating IP block lists [freetools.webmasterworld.com] and user agent block lists [freetools.webmasterworld.com].
[edited by: incrediBILL at 5:19 pm (utc) on Jun 27, 2014]
No, I don't have any Chinese colleagues. Now, I don't really have any headaches, in that everyone who is nasty to me I have denied service to. EXCEPT for this bot/whatever, that was hitting on me repeatedly from many different Chinese IPs. I have successfully turned it away, by denying its UA. But yes, if it tries to get past that by morphing its UA, I may just chuck all of China.
(intentionally de-linked so the fragment won't be lost). Replace "2.2" in the URL with "2.4" if appropriate. The term "LogLevel" is slightly misleading because it doesn't apply to regular access logs, just to error logs. There are eight possible levels, identified by name just to confuse you. The RewriteLog levels are numbered instead, so you only have to remember which direction they go.
You can use a free lookup to find all IPs that belong to any one country. I started with the biggest ranges-- /10 /11 /12 like that, and added more later. Ranges below /16 aren't that frequent for China, so I just add them to the lockout list as and when I meet them. If you find sliver-sized China ranges, like smaller than /24, it means you've met a server farm-- probably PegTech-- and you can just block the whole thing.
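If you end up with a long pile of ranges, something like this Python sketch can help keep the lockout list short. It uses only the standard `ipaddress` module to test an address against a list of CIDR ranges, and to merge adjacent or overlapping ranges before writing them out as Apache 2.2 style `Deny from` lines. The function names are invented for the example, and the ranges shown in the usage note are reserved documentation addresses, not real China allocations.

```python
import ipaddress

def build_blocklist(cidrs):
    """Turn CIDR strings such as '203.0.113.0/24' into network objects."""
    return [ipaddress.ip_network(c) for c in cidrs]

def is_blocked(ip, networks):
    """True if the address falls inside any of the listed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

def deny_lines(networks):
    """Merge adjacent/overlapping ranges, then emit one
    'Deny from <range>' line per merged range."""
    merged = ipaddress.collapse_addresses(networks)
    return ["Deny from %s" % net for net in merged]
```

For example, feeding in `198.51.100.0/25`, `198.51.100.128/25` and `203.0.113.0/24` should produce two lines, since the two /25 halves merge into a single /24. Fewer lines in .htaccess means less for the server to check on every request.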
OK, as of yesterday, they're gone. They had been hitting on me for months, every twenty minutes or so, from hundreds of independent Chinese IPs. Always in quadruplicate. Four requests in a row. The last one was July 5 at 1:31pm ET.
In fact, there are no more requests coming in with that faked UA, either. So they're not just not asking for the same thing, but they aren't asking for anything.
For the last few weeks, I'd been blocking them by UA, thanks to the excellent advice here. So once that was done, they weren't hogging any bandwidth, but just littering my log.
They probably just got bored, or maybe are plotting a more serious attack. This was just a bizarre attack coming, evidently, from an organized group of hundreds of independent Chinese IPs. Thanks to all here for excellent insights.