Forum Moderators: phranque
So far, from what I understand, mod_access is easier to learn than doing this with mod_rewrite, so if that's the case I'd prefer that.
Does anyone know of a super basic tutorial? Or maybe a tool where I can type in the ranges and it'll spit out the code? (Heh, I know, not likely, but I can dream of an easy way out of this mess ;) )
[edited by: jdMorgan at 4:08 pm (utc) on Oct. 21, 2006]
[edit reason] De-linked [/edit]
That is the purpose of this forum: To help you learn how to do this for yourself. As such, specific questions are always welcome. In many cases, you'll be referred back to the Apache documentation, so that's a good place to start. Once you've got a few specific questions based on reviewing the docs, please do post them here.
In this case, the directives you're looking for are described in the mod_access documentation [httpd.apache.org]. We also had some discussion of a combined mod_setenvif/mod_access method in this recent thread [webmasterworld.com].
Jim
Here are some other explanations for you to explore:
(The first is likely the simplest answer.)
[webhelpinghand.com...]
[baremetal.com...]
[edginet.org...]
[dimi.uniud.it...]
[webhelpinghand.com...]
And if none of that provides enough depth, you may begin where I did before joining WebmasterWorld:
[google.com...]
Here's a stripped-down snippet of what I'm going to use.
SetEnvIfNoCase User-Agent "Missigua" bad_bot
<Files *>
Order Deny,Allow
Deny from env=bad_bot
Deny from ###.##.##.0/19
Deny from ##.##.##.160/28
Allow from all
</Files>
That basically says "everyone is allowed by default (Order Deny,Allow), except those IP ranges and anyone flagged as bad_bot", right? Are there any obvious errors (wrong order, badly written, or CIDR being less effective than other notations)?
I do have another question: is using <Files *> and then allowing everything a bad idea? Would that give anyone rights that the server normally wouldn't allow (i.e., more than the usual GET, POST, HEAD, etc.)?
*edit*
Another question: I often see the same as above written without the <Files> tags. What's the difference?
[edited by: LunaC at 4:10 pm (utc) on Oct. 22, 2006]
I changed the order to what Wilderness posted yesterday afternoon. Looking at today's log files, the IPs that should be getting 403s are still not being blocked, while bad_bots are getting 403s. (Before that I had the code written as I first posted it, with the same result: blocked IPs getting through, bad_bots 403'd.)
Here are the details:
1) The IPs that should be blocked are getting stuck in a 301, as they were before I tried blocking.
(I'm guessing they are requesting / with the www and being sent to / on the non-www host. That's what first caught my attention: huge blocks of the log file claiming to be gbot, stuck looping the same request hundreds of times. Kind of hard to miss. The IPs are not gbot; they're from hosting companies, which is common behavior for scrapers.) I forgot to mention this part before; it didn't seem important till now.
2) I have the banning code before any redirects in .htaccess (so shouldn't they get sent a 403 before even hitting a redirect?)
3) As I've said, bad_bots are getting a 403. The should-be-banned IPs are not. I've triple-checked, and the CIDR is exactly as found in dnsstuff; a reverse CIDR/netmask test confirms the IP range is right.
So I'm more than a bit lost, no idea what I should be looking for now.
You can also omit the <Files> container if you like.
Jim
I tested using Order Allow,Deny (after reading the explanation a few times, it finally sank in why that might make more sense) and banned my own IP. Finally, a 403.
Thank you both so much for your help. I still haven't found a beginner tutorial explaining how to write regular expressions, but this way is working, so I'm OK for now. :)
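A minimal sketch of the arrangement that ended up working, assuming the same bad_bot variable as in the earlier snippet. The two IP ranges are placeholder documentation addresses, not the ranges actually banned, and the <Files> container is left out as suggested above.

```apache
# Flag requests whose User-Agent contains "Missigua" (case-insensitive).
SetEnvIfNoCase User-Agent "Missigua" bad_bot

# With Allow,Deny the Allow directives are evaluated first, then the
# Deny directives. A request matching both is denied, so Deny wins.
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# Placeholder ranges for illustration only (assumed, not from the thread).
Deny from 192.0.2.0/24
Deny from 198.51.100.160/28
```

With Order Deny,Allow the same "Allow from all" line is evaluated last and overrides every Deny, which is consistent with the banned IPs still getting through in the earlier attempt.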