Please help with .htaccess ban

163.179.0.0 to 163.179.255.255 range


nativenewyorker

3:48 am on Feb 27, 2003 (gmt 0)

10+ Year Member



Hello,

I have recently adopted a ban using .htaccess for spiders that crawl my websites and suck up bandwidth without returning any benefits (copyright bots). I recently added the following line into my .htaccess file to block the entire 163.179.0.0 to 163.179.255.255 range.

SetEnvIf Remote_Addr ^163\.179\.([0-9]|[1-9][0-9]|[1-2][0-5][0-9])\.([0-9]|[1-9][0-9]|[1-2][0-5][0-9])$ ban

Looking through my logs today, I see that it is allowing some requests to pass with a 200 status code while returning 403 Forbidden for others. The most recent entry is listed last.

163.179.186.163 returned a 200
163.179.179.30 returned a 200
163.179.191.170 returned a 200
163.179.156.78 returned a 403
163.179.181.187 returned a 200
163.179.162.2 returned a 200
163.179.157.126 returned a 403
163.179.141.62 returned a 403

In addition, my logs do not show any user-agent string for these requests. Is the lack of a user-agent equivalent to an email without sender information? Can we assume that these visitors are undesirable? Is there a way to block access for visitors that do not send a user-agent?

Thanks in advance for any help,
Ted

jdMorgan

4:08 am on Feb 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



NNY,

If you go through the digits of your pattern and read them out loud, you'll see that the ones getting through are not blocked by that pattern.

That is, [0-9]|[1-9][0-9] reads as: 0 through 9, OR 10 through 99, and so on.

You might just want to leave off the last two classes and go with "^163\.179\. ban" if you really want to exclude a large range like that.

You can indeed block a blank user-agent, but it's a very bad idea. Many, many users access your site through caching proxies (such as those used by corporations), and these usually strip off the user-agent. You might find yourself saying "Goodbye" to a large portion of your workday visitors!

Jim

wilderness

4:14 am on Feb 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ted,
This will take care of the no UA.

RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule ^.*$ - [F]
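As a quick sanity check (a Python sketch for illustration only; the live rule is the Apache config above), the `^-?$` pattern in that RewriteCond matches both a truly blank user-agent and a literal minus sign:

```python
import re

# Same pattern as the RewriteCond above: an optional "-", then end of
# string -- so it matches an empty UA or a single literal minus sign.
ua_pattern = re.compile(r'^-?$')

print(bool(ua_pattern.match('')))             # True  (blank UA)
print(bool(ua_pattern.match('-')))            # True  (literal "-")
print(bool(ua_pattern.match('Mozilla/4.0')))  # False (normal browser)
```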

The only legitimate bot that I've seen not use a UA is Lycos. Even that is confusing, because Lycos reads robots.txt without either a referrer or UA (resulting in a 403), then reads the main page with a UA (resulting in a 200) and leaves in confusion.

Although I've mentioned that I agree denying a blank UA is NOT a good idea, I do deny it myself.

Jim may come along and provide a much simpler SetEnvIf.
However, I don't really think it's necessary for the range you want.

deny from 163.179.

Making sure to include the trailing period will do what you desire, denying 163.179.0-255.0-255.
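As a cross-check (a Python sketch using the standard-library ipaddress module, not Apache configuration), the prefix 163.179. corresponds exactly to the CIDR block 163.179.0.0/16:

```python
import ipaddress

# "deny from 163.179." is a prefix match; in CIDR terms it is 163.179.0.0/16.
net = ipaddress.ip_network('163.179.0.0/16')

print(ipaddress.ip_address('163.179.0.0') in net)      # True  (low end)
print(ipaddress.ip_address('163.179.255.255') in net)  # True  (high end)
print(ipaddress.ip_address('163.180.0.0') in net)      # False (just outside)
print(net.num_addresses)                               # 65536
```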

pendanticist

4:15 am on Feb 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey Jim, is this "-" "-" what you call a blank UA?

Pendanticist.

wilderness

4:18 am on Feb 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Affirmative Pendanticist.
That will result in a 403 on my sites from the two lines I provided.

[edited]
I might add that that is both a blank referrer (left) and UA (right).

jdMorgan

4:31 am on Feb 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All,

Let's not confuse the issue of human visitors to your site who are stuck behind a corporate network firewall/proxy that removes their user-agent strings with the issue of bad 'bots. Bad bots often come in with a blank UA and a blank referer, and I'd say ban 'em, EXCEPT for the legitimate visitors stuck behind those corporate proxies who are trying to buy something from you.

If you want to ban them, that's your prerogative, but I'd say it's bad business.

NNY,

To be more specific, there are "holes" in the covered IP classes.

SetEnvIf Remote_Addr ^163\.179\.([0-9]|[1-9][0-9]|[1-2][0-5][0-9])\.([0-9]|[1-9][0-9]|[1-2][0-5][0-9])$ ban

Breaking it down:

SetEnvIf Remote_Addr ^163\.179\.(

[0-9]|[1-9][0-9]|
0 through 9, OR 10 through 99

[1-2][0-5][0-9])
100-159 <hole: 160-199> 200-259

\.([0-9]|[1-9][0-9]|
0 through 9, OR 10 through 99 again, for the last octet

[1-2][0-5][0-9]
100-159 <hole: 160-199> 200-259

)$ ban

If you really want to cover a whole 0-255 valid range, then the regex might be
[0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]

However, as I said (and wilderness observed), in that case, you might as well just ignore the whole class and leave it out.

SetEnvIf Remote_Addr ^163\.179\. ban
would take care of 163.179.0.0 through 163.179.255.255 (and note well that no end anchor "$" is used).
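To see the hole concretely, here is a quick Python check (an illustration, not part of the .htaccess file) that runs both octet sub-patterns over every valid octet value:

```python
import re

# Octet sub-pattern from the original (buggy) SetEnvIf line
buggy = re.compile(r'^([0-9]|[1-9][0-9]|[1-2][0-5][0-9])$')
# Corrected sub-pattern covering the full 0-255 range
fixed = re.compile(r'^([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$')

buggy_misses = [n for n in range(256) if not buggy.match(str(n))]
fixed_misses = [n for n in range(256) if not fixed.match(str(n))]

print(buggy_misses[0], buggy_misses[-1], len(buggy_misses))  # 160 199 40
print(fixed_misses)                                          # []
```

The 40 unmatched values, 160 through 199, are exactly the octets that slipped through with a 200 in the logs above (163.179.186.163, 163.179.179.30, and so on).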

Jim

jdMorgan

4:54 am on Feb 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One more addition:

I corresponded with a webmaster whose site had been raided using a particularly nasty trick:
The 'bot used a referrer of a literal - and a user-agent of a literal -.
That is, the UA and referrer were not blank; they were minus signs. The raw access log looked just like it would with a blank referrer, though, and showed a 200 response code. Tricky. I banned 'em, too.
So heads up on that one.

Jim

wilderness

5:14 am on Feb 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim,
Some time ago I grew tired, very fast, of somebody who kept adding about a dozen X's to his referrer.
Then another who added about twenty zeros.
Those are also blocked via SetEnvIf.

nativenewyorker

5:20 am on Feb 27, 2003 (gmt 0)

10+ Year Member



Hi Jim,

Thanks for clarifying where the hole was in my pattern. Sometimes we overlook the obvious, and it is more apparent from someone else's perspective. I was digging around in the WebmasterWorld archives for more info on the SetEnvIf directive instead of bombarding you with requests for more explanations.

I've noticed that you are quite vigilant in protecting your site from bad IP's. Is there any chance you could share your list of banned IP's?

Thanks again,
Ted

jdMorgan

5:53 am on Feb 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ted,

Others here are far more vigilant than I! :)

I have been trying recently to sort them out and classify them into groups such as "Permanently outta here" and "some dumb script kiddie who will soon get bored" and other such classes. What I want to do is figure out which bans I can and should expire after a given time. Otherwise, I see myself with a 250kB .htaccess file in a few years... :o

Similarly, I was thinking tonight about reviewing my banned user-agent list and comparing that with the ones that have actually hit any of my sites in the past 2-3 years. I'd imagine I could delete half of my banned UA's and not have any real problems.

The thing that would allow me to trim these lists back is the fact that I'm running a variant of the spider-trap PERL script that key_master posted here on WebmasterWorld. If one of the un-banned IPs or user-agents made a come-back, it'd run into that trap pretty quickly anyway. With that script installed, I have been able to walk away from my log files, and no longer have to hover over my sites like a protective mother all the time.

Also, I have observed that many times, bad-bot attacks seem to be market-segment-specific. Someone here will report a terrible onslaught, and I'll put that on my watch-list, but it never shows up.

For that reason and others, I'm not sure I'd be doing anyone a favor by publishing my list. I think the biggest favor I could do is to advise, beg, plead, and cajole everyone to install key_master's bad-bot script or one of the variants that have been posted here. Working with robots.txt, .htaccess, and some minor changes on your pages, it detects intrusions, adds the IP address to a ban list in your .htaccess file, and thereby blocks all subsequent accesses from that IP. Various versions add logging information as well, so you can check today's additions to your .htaccess, grab the time-stamp from there, and then go look at your raw logs by timestamp if you want to see the details. Other than that, it can be completely "hands-free" - giving you time for more productive pursuits (like posting here). :)

Please try the script - do a site search for "bad bot PERL script" and read the first 3 or 4 threads. It isn't perfect, but it is a great help to keep your site safe from e-mail harvesters, mass-downloaders, snoop-bots, and the rest of that riff-raff. It may also save you several hours per week by eliminating the need to manually review logs and add IPs to your ban list.
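For illustration only, the core mechanism of such a trap can be sketched in a few lines of Python (this is NOT key_master's actual PERL script; the function name and file layout here are hypothetical): a trap URL that is disallowed in robots.txt records the offending IP by appending a deny line to the .htaccess ban list.

```python
import datetime

def ban_ip(ip, htaccess_path='.htaccess'):
    """Append a timestamped deny line for a trapped IP.

    A well-behaved bot never reaches the trap URL, because that URL is
    disallowed in robots.txt and linked invisibly on the pages; anything
    that does request it gets banned on the spot.
    """
    stamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(htaccess_path, 'a') as f:
        f.write(f'# trapped {stamp}\n')
        f.write(f'deny from {ip}\n')
```

The timestamp comment is what lets you cross-reference today's additions against the raw logs, as described above.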

Jim