Ban a TLD with .htaccess

If martinibuster can ban India...

         

jimbeetle

9:18 pm on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A couple of days ago martinibuster banned India by IP.

Because of a problem child ( [webmasterworld.com...] ) I'd like to ban all of Vanuatu. (They only have a few thousand telephones and one ISP, so it's not quite on the same scale as banning a billion or so folks from the subcontinent.)

Is there a way to do this in .htaccess by TLD (.vu)?

Jim

amznVibe

9:52 pm on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Am I missing something here? Bans like this only stop the innocent novices. The ones doing evil will just use a proxy on any server anywhere in the world they can get access to, which will bypass any block.

jdMorgan

10:18 pm on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



True, but to answer his question: yes, it's possible, but only if your server has relatively "bullet-proof" reverse DNS. If reverse DNS is slow or unreliable on your host, then you'll need to block by IP. To block by requestor domain, you can test the server variable %{REMOTE_HOST} in a mod_rewrite RewriteCond.
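As a rough sketch of that mod_rewrite approach, assuming mod_rewrite is enabled for .htaccess and that the host actually resolves client hostnames (with hostname lookups off, %{REMOTE_HOST} may hold only the IP address, and the condition would then never match):

```apache
# Sketch only: answer 403 Forbidden to any client whose reverse DNS name ends in .vu
RewriteEngine On
RewriteCond %{REMOTE_HOST} \.vu$ [NC]
RewriteRule .* - [F]
```

Note that this tests the hostname of the requesting client, not the domain of a site that links to or frames yours.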

I have stopped blanket-blocking after installing a derivative of Key_Master's bad_bot Perl script (cf). It catches most of the troublemakers, so I don't have to hover over my logs any more just to keep my bandwidth overages under control. :)

Jim

Key_Master

10:38 pm on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's such a small island, the following code is all that's needed to block visitors from Vanuatu. Insert it in your .htaccess file.

<Files ~ "^.*$">
order allow,deny
allow from all
deny from 202.80.32.0/20
</Files>

amznVibe,

People block visitors for various reasons all their own. It must have an effect, considering the amount of attention given to big sites that engage in this practice (e.g., China vs. Google, Google vs. Comcast, etc.).

nativenewyorker

12:21 am on Feb 8, 2003 (gmt 0)

10+ Year Member



Key_Master,

You appear to be the resident expert here on banning specific users. For several months I have noticed a malicious spider that crawls through my sites and ignores my robots.txt file. It does not even identify itself as a spider and uses a U/A of "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)". I have set up .htaccess in an attempt to block them, but they still appear in my logs.

I have been writing .htaccess using Notepad and uploading it as htaccess.txt with WS_FTP to the root directory in ASCII mode. After it is uploaded, the file is renamed .htaccess and it disappears from view. I know that the redirect commands in my .htaccess file work, so that rules out a problem with the file being uploaded correctly to the server.

I am trying to block the following IP range of: 63.148.99.224 - 63.148.99.255

My .htaccess includes the following:
<limit GET POST>
Order Deny,Allow
Deny from 63.148.99.
Allow from all
</limit>

I have had the deny,allow command at the bottom of the page after the redirects. Does this make a difference in whether the ban works or not? What is the difference using allow,deny and deny,allow? I've seen both used, but figure bad users should be blocked first before giving access to all.

What does the first line, <Files ~ "^.*$">, in your command mean?

Thanks in advance and have a great day,
Ted

jdMorgan

12:39 am on Feb 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ted,

Read about "Order" in the Apache mod_access documentation [httpd.apache.org]. The order can be critical, and getting it wrong catastrophic.
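To illustrate why the order matters, here is a minimal sketch based on Ted's range (63.148.99.224 - 63.148.99.255, which is 63.148.99.224/27 in CIDR form). With Order Deny,Allow the Allow directives are evaluated last and override any matching Deny, so "Allow from all" lets everyone back in, banned range included. With Order Allow,Deny the Deny directives are evaluated last and win:

```apache
# Order Allow,Deny: Allow is evaluated first, Deny last, so a matching Deny wins.
# Clients matching neither directive are denied by default, hence "Allow from all".
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from 63.148.99.224/27
</Limit>
```

The <Limit GET POST> wrapper restricts only those methods; dropping the wrapper applies the rule to every method.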

For quick info on regular expressions (the ^.*$ stuff), try this quick tutorial [etext.lib.virginia.edu].

Jim

Key_Master

12:57 am on Feb 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jdMorgan pointed you to some good resources, so I'll try to fill in the blanks. The bot that's troubling you is from Cyveillance.com. You can replace the code you have with the following.

When Cyveillance (or any banned IP) hits your site it will get a 403 page and the hit will still show up in the logs.

SetEnvIf Remote_Addr ^63\.148\.99\.(22[4-9]|2[3-5][0-9])$ ban

<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
</Files>

<Files ~ "^.*$"> is a blanket wildcard match covering any file name. In the example above, any file the banned IP requests will be denied.

<Files ~ "^a.*$"> would cover any file name that starts with the letter a.

<Files ~ "^a.*\.htm$"> would cover any file name that starts with the letter a and has a .htm extension, e.g. apple.htm

jimbeetle

1:57 am on Feb 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Key_Master,

Did a bit more digging: the site and other .vu troublemakers from the same person aren't based on the island but at different IPs, so I'm going to ban the handful that I've found.

And thanks also for the word on Cyveillance. Have been having the same trouble as nativenewyorker. Your explanatory notes are top notch; I almost think I understand them.

Thanks again,

Jim

nativenewyorker

2:42 am on Feb 8, 2003 (gmt 0)

10+ Year Member



They have a phony claim on their website as to how they observe the robots.txt exclusion standard. However, there are numerous sites out there that say that they have banned them for not observing robots.txt.

The visit I had from them today did not even request a robots.txt file. They just ripped through my site eating away at my bandwidth while offering back nothing in return.

Thanks to jdMorgan and Key_Master for helping me banish them at last.

Ted

c3oc3o

5:36 am on Feb 8, 2003 (gmt 0)

10+ Year Member



I don't believe the solution in msg #4 will help jimbeetle at all. .vu is an open domain, so the chance of the server that is causing the problems actually being on the island, and therefore in the blocked IP range, is virtually zero.

I think you just need to ban the one IP of the server that has cached your pages:
Thru.vu is 66.216.72.244, but secure.thru.vu currently resolves to a fake 10.0.0.0 (and therefore can't be reached). While they cached/redirected to your site, it was obviously something else. You might try banning 66.216.72.240 - 66.216.72.247, the IP range of that company.
(http://ws.arin.net/cgi-bin/whois.pl?queryinput=66.216.72.244)
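For what it's worth, 66.216.72.240 - 66.216.72.247 is eight addresses, i.e. 66.216.72.240/29 in CIDR form. A minimal sketch, reusing the allow,deny pattern from msg #4 and assuming your Apache accepts CIDR notation in Deny:

```apache
# Sketch: block 66.216.72.240 through 66.216.72.247 (a /29), allow everyone else
Order Allow,Deny
Allow from all
Deny from 66.216.72.240/29
```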

Key_Master

5:52 am on Feb 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



[apnic.net...]

Obviously, at some point secure.thru.vu [google.com] did resolve to a real IP, otherwise Googlebot would not have been able to index any of those links. The domain thru.vu appears to be having some technical difficulties at the moment though, so the 10.0.0.0 is probably just a DNS error.