Apache Web Server Forum

ip ban not working
403 in http-viewer, 200 in logs
joergnw10
msg:3560702 - 11:40 am on Jan 29, 2008 (gmt 0)

I have the following in my htaccess:
RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{HTTP_USER_AGENT} SomeBot [OR]
RewriteCond %{HTTP_USER_AGENT} 000.000.000.
RewriteRule .* - [F]

When I test with an HTTP viewer, it returns a 403 for all combinations of the above IP, e.g. for 000.000.000.01.
BUT when I look at my logs, these IPs get a 200 response and still crawl my site. What can I do now?

 

Brett_Tabke
msg:3560840 - 1:29 pm on Jan 29, 2008 (gmt 0)

There must be something slightly different about the agent name. Try using a bit more of the agent name (if there is any more text), and try matching the beginning of the name, like so:

RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^wells [OR]
RewriteCond %{HTTP_USER_AGENT} ^Missigua [OR]
RewriteRule ^.* - [F]

joergnw10
msg:3560871 - 1:52 pm on Jan 29, 2008 (gmt 0)

Thanks for the quick reply.
The bot I've called 'SomeBot' here is being blocked.
The IP addresses I am also trying to block all begin with the same three octets (i.e. 000.000.000); only the last octet varies. So I used
'RewriteCond %{HTTP_USER_AGENT} 000.000.000. '
in htaccess.
Should I use the name of the user agent instead? The log has Gecko/somenumber at the end of each line. I'll try using this instead (presuming this is the name :-)

jdMorgan
msg:3560888 - 2:10 pm on Jan 29, 2008 (gmt 0)

IP addresses are not %{HTTP_USER_AGENT}s, they are %{REMOTE_ADDR}s. :)

[added]
Also, you must escape the literal periods in your patterns to avoid them being treated as "any single character" regex tokens, and start-anchor the patterns. Not doing this would lead to possibly-dangerous ambiguities, should there be any single-digit octets in the actual address(es).

RewriteCond %{REMOTE_ADDR} ^1\.2\.3\.

For example, without anchoring and escaping the periods, the pattern shown here would match 102.3*.***.***, 112.3*.***.***, ***.102.3*.***, ***.112.3*.***, ***.***.102.3*, ***.***.112.3*, and possibly others (!)
[/added]
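
Applying both corrections to the ruleset at the top of this thread would give something like the following sketch (keeping the OP's 000.000.000. placeholder prefix):

# Let the error documents and robots.txt through, deny everything else
RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
# Match either the bot's user-agent or the escaped, start-anchored IP prefix
RewriteCond %{HTTP_USER_AGENT} SomeBot [OR]
RewriteCond %{REMOTE_ADDR} ^000\.000\.000\.
RewriteRule .* - [F]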

Jim

[edited by: jdMorgan at 2:20 pm (utc) on Jan. 29, 2008]

joergnw10
msg:3562739 - 12:03 pm on Jan 31, 2008 (gmt 0)

Thanks Jim, I guess I messed that one up.
The ban now seems to work ok with the agent name.

For the future: I'm not sure I understand correctly how to ban an IP range. Would it work like this?

to block: aaa.bbb.ccc.0 to aaa.bbb.ccc.999

RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\.
RewriteRule - [F]

Thanks again

wilderness
msg:3562771 - 12:54 pm on Jan 31, 2008 (gmt 0)

For the future: I'm not sure I understand correctly how to ban an IP range. Would it work like this?

to block: aaa.bbb.ccc.0 to aaa.bbb.ccc.999

RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\.
RewriteRule - [F]

Thought I had an old Webmaster World bookmark which provided an explanation of IP ranges and their use in the various Class groups.

Others may explain this better; however, here goes.
To the regex engine, the Class groups are not numbers, but rather character strings.
The range of each octet is 0-255.

Using the example that you've provided would deny access (note the exception below) to the entire 0-255 range in the fourth octet ("Class D") of whatever Class C you've specified as "ccc".

Potentially, each Class (whether A, B, C or D) offers the following ranges:
[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9]


broken down as follows:
0-9, 10-99, 100-199, 200-259
The 2[0-5][0-9] alternative is used to save expression space, even though it includes non-existent values (256-259).

It would be redundant to specify the entire 0-255 range as a Class D expression; however, it would still work.

You may break these numbers down into smaller groups for more specific expressions.
EX:
aaa\.bbb\.ccc\.[0-9] would deny the 0-9 Class D range of the Class C that "ccc" represents.

Another example is aaa\.(bbb|bab)\.ccc\.[0-9]
which provides an OR for two different Class B ranges, represented by "bbb" or "bab", with the Class C range being the same in both instances, and the Class D range of 0-9 also the same in both instances.
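
If you want an exact 0-255 match that excludes the impossible 256-259 values, the last octet can be spelled out in full, at the cost of a longer pattern. A sketch, using the same aaa/bbb/ccc placeholders:

# Deny exactly aaa.bbb.ccc.0 through aaa.bbb.ccc.255 (anchored at both ends)
RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$
RewriteRule .* - [F]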

(exception)
The following is what I use; note the .* pattern, which the RewriteRule in your example above is missing (the logic behind it has long since passed from my recollection).

RewriteRule .* - [F]

joergnw10
msg:3562856 - 2:26 pm on Jan 31, 2008 (gmt 0)

Wow, thanks a lot for explaining it in detail. Bookmarked this one for the future!

joergnw10
msg:3565176 - 10:15 am on Feb 3, 2008 (gmt 0)

Right, I had another go, but still without success.
The named bot is still blocked, but the IP range is not. I tried it with only the block for the IP range, and also with only a single IP blocked, but it always returns a 200.
Here is what I currently have in htaccess regarding this, after having made the changes according to your replies:

RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\. [OR]
RewriteCond %{HTTP_USER_AGENT} SomeBot
RewriteRule ^.* - [F]

Did I overlook something? Or are there other reasons why this might not work?

You can see in the following thread what it is I am trying to block, in case it is relevant: [webmasterworld.com...]
(As this might be a search engine spider, I probably won't leave the block in my htaccess, but would really like to know what I am doing wrong here).

wilderness
msg:3565222 - 2:20 pm on Feb 3, 2008 (gmt 0)

The named bot is still blocked, but the IP range is not. I tried it with only the block for the IP range, and also with only a single IP blocked, but it always returns a 200.
Here is what I currently have in htaccess regarding this, after having made the changes according to your replies:

Are the 200s the result of redirects to either 403 or 404 pages?
Check your visitor logs and compare the file sizes of the actual requested pages with those of the 200s from these denied ranges.

In addition, you don't say whether these NEW rewrites are the only lines in your htaccess.
As a result, I wonder whether you've opened the lines with:

RewriteEngine on


RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\. [OR]
RewriteCond %{HTTP_USER_AGENT} SomeBot
RewriteRule ^.* - [F]

joergnw10,
The lines contained in my htaccess are rather simple compared to many of the inquiries and answers that I see here.
As a result, there are corners of Apache and regex that simply fail to capture my attention, or even my understanding.

Another will need to advise of the effect of the following line:

RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$

In addition, I'm not at all sure that it's possible to combine UAs and IP ranges and achieve denial in the manner you have.

Try the following and see if two rewrites solve your issue:

#Rewrite 1
RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\.
RewriteRule ^.* - [F]

#Rewrite 2
RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{HTTP_USER_AGENT} SomeBot
RewriteRule ^.* - [F]

[edited by: wilderness at 2:40 pm (utc) on Feb. 3, 2008]

wilderness
msg:3565229 - 2:35 pm on Feb 3, 2008 (gmt 0)

You can see in the following thread what it is I am trying to block, in case it is relevant: [webmasterworld.com...]
(As this might be a search engine spider, I probably won't leave the block in my htaccess, but would really like to know what I am doing wrong here).

The person who initiated the example in that thread never actually provided the 208 range; another did!
So how would anybody verify its accuracy?

However, to deny the range that was provided in the answer:

RewriteCond %{REMOTE_ADDR} ^207\.179\.(12[89]|1[3-8][0-9]|19[01])\.
RewriteRule .* - [F]


As an aside: the idea that a webmaster would place emphasis on the contents of Google Analytics reports, which are created to cover the masses (much the same as the stats reports provided by hosting companies), without locating and learning how to interpret their website's raw visitor logs, is absurd from my point of view.

joergnw10
msg:3565329 - 6:29 pm on Feb 3, 2008 (gmt 0)

Thanks for your replies, wilderness! I will have another go tomorrow to see if I can get it to work.
I'm not sure where the 207 range comes from; my stats show visits from the 208 range mentioned in the other thread. As you might have guessed, I'm not exactly an expert on htaccess, blocking IPs and the like :-) I kind of freaked out when my stats got inflated by all these visits from something that, to my unknowing eyes, seemed to pretend to be a real visitor (it also appeared as such in 'statcounter').

wilderness
msg:3565330 - 6:34 pm on Feb 3, 2008 (gmt 0)

"not sure where the 207 range"

My typo.
Sorry.

Two typos.
This should actually read:

RewriteCond %{REMOTE_ADDR} ^208\.111\.(12[89]|1[3-8][0-9]|19[01])\.
RewriteRule .* - [F]
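
As an aside, the pattern above covers 208.111.128.0 through 208.111.191.255, which is the CIDR block 208.111.128.0/18. If the regex isn't otherwise needed, the same range could be denied with mod_access instead of mod_rewrite; a sketch, assuming that module is available:

# mod_access equivalent of the mod_rewrite block above
Order Allow,Deny
Allow from all
# 208.111.128.0/18 = 208.111.128.0 - 208.111.191.255
Deny from 208.111.128.0/18

Note that this form has no equivalent of the robots.txt/error-document exception used earlier in the thread.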

joergnw10
msg:3567499 - 9:34 am on Feb 6, 2008 (gmt 0)

Thanks again, wilderness!
The block of the IP range is now working. For some reason the http-viewer I'm using is still returning a 200, but the logs now show 403s.
It actually works with the combination of user agents and IP addresses.
The following line I received from help in another thread - it names exceptions that the blocked agents are allowed to get, i.e. they should go away after being allowed to get robots.txt, without then trying to get other files (if they obey robots.txt, of course).

RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
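
The reason the 403 page itself needs to be excepted: when the RewriteRule forbids a request, Apache internally redirects to the 403 ErrorDocument, and if that request were also forbidden, the custom error page could never be served. Annotated (a restatement of the line above, not new advice from the thread):

# Let denied clients still fetch the error documents and robots.txt,
# so the custom 403 page can be served and well-behaved bots can read robots.txt
RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$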

jdMorgan
msg:3568060 - 10:36 pm on Feb 6, 2008 (gmt 0)

If your "http viewer" is giving inaccurate results, try the "Live HTTP Headers" add-on for Firefox/Mozilla browsers. It's a free and indispensable tool for troubleshooting redirects, rewrites, caching, keep-alive, MIME-type and all other problems related to HTTP request and response headers.

Jim
