Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
ip ban not working
403 in http-viewer, 200 in logs
joergnw10

Msg#: 3560700 posted 11:40 am on Jan 29, 2008 (gmt 0)

I have the following in my htaccess:
RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{HTTP_USER_AGENT} SomeBot [OR]
RewriteCond %{HTTP_USER_AGENT} 000.000.000.
RewriteRule .* - [F]

When using an HTTP viewer, it returns a 403 for all combinations of the above IP, i.e. for 000.000.000.01.
BUT when I look at my logs, these IPs get a 200 header and still crawl my site. What can I do now?

 

Brett_Tabke

Msg#: 3560700 posted 1:29 pm on Jan 29, 2008 (gmt 0)

There must be something slightly different about the agent name. Try using a bit more of the agent name (if there is any more text), and try matching the beginning of the name like so:

RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^wells [OR]
RewriteCond %{HTTP_USER_AGENT} ^Missigua [OR]
RewriteRule ^.* - [F]
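The difference the ^ anchor makes can be checked outside Apache with any regex engine. A minimal Python sketch (the user-agent strings below are invented for illustration) comparing a bare RewriteCond-style substring match with the anchored form:

```python
import re

# Hypothetical user-agent strings, for illustration only.
agents = [
    "EmailCollector/1.1",            # starts with the token
    "Mozilla/5.0 (EmailCollector)",  # token buried mid-string
]

unanchored = re.compile(r"EmailCollector")   # bare RewriteCond pattern: substring match
anchored   = re.compile(r"^EmailCollector")  # ^-prefixed: must match from the start

for ua in agents:
    print(ua, bool(unanchored.search(ua)), bool(anchored.search(ua)))
```

The unanchored pattern matches both strings; the anchored one matches only the first, which is why adding more of the real agent name plus an anchor narrows the block.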

joergnw10

Msg#: 3560700 posted 1:52 pm on Jan 29, 2008 (gmt 0)

Thanks for the quick reply.
The bot I've here named 'SomeBot' is being blocked.
The IP addresses I am also trying to block all begin with the same first three octets (i.e. 000.000.000); only the last digits vary. So I used
'RewriteCond %{HTTP_USER_AGENT} 000.000.000. '
in htaccess.
Should I use the name of the user agent instead? The log has Gecko/somenumber at the end of each line. I'll try using this instead (presuming this is the name :-)

jdMorgan

Msg#: 3560700 posted 2:10 pm on Jan 29, 2008 (gmt 0)

IP addresses are not %{HTTP_USER_AGENT}s, they are %{REMOTE_ADDR}s. :)

[added]
Also, you must escape the literal periods in your patterns to avoid them being treated as "any single character" regex tokens, and start-anchor the patterns. Not doing this would lead to possibly-dangerous ambiguities, should there be any single-digit octets in the actual address(es).

RewriteCond %{REMOTE_ADDR} ^1\.2\.3\.

For example, without anchoring and escaping the periods, the pattern shown here would match 102.3*.***.***, 112.3*.***.***, ***.102.3*.***, ***.112.3*.***, ***.***.102.3*, ***.***.112.3*, and possibly others (!)
[/added]
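The point about escaping and anchoring is easy to verify with any regex engine. A quick Python sketch (the addresses are invented for illustration) comparing the raw pattern against the escaped, anchored one:

```python
import re

loose  = re.compile(r"1.2.3.")      # unescaped dots match ANY character
strict = re.compile(r"^1\.2\.3\.")  # dots escaped and start-anchored

# Only the first address is in the intended 1.2.3.* range.
for addr in ["1.2.3.4", "102.30.1.1", "112.33.44.55"]:
    print(addr, bool(loose.search(addr)), bool(strict.search(addr)))
```

The loose pattern matches all three addresses; the strict one matches only 1.2.3.4.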

Jim

[edited by: jdMorgan at 2:20 pm (utc) on Jan. 29, 2008]

joergnw10

Msg#: 3560700 posted 12:03 pm on Jan 31, 2008 (gmt 0)

Thanks Jim, I guess I messed that one up.
The ban now seems to work ok with the agent name.

For the future: not sure I understand correctly how to ban an IP range. would it work like this?

to block: aaa.bbb.ccc.0 to aaa.bbb.ccc.999

RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\.
RewriteRule - [F]

Thanks again

wilderness

Msg#: 3560700 posted 12:54 pm on Jan 31, 2008 (gmt 0)

For the future: not sure I understand correctly how to ban an IP range. would it work like this?

to block: aaa.bbb.ccc.0 to aaa.bbb.ccc.999

RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\.
RewriteRule - [F]

Thought I had an old WebmasterWorld bookmark which provided an explanation of IP ranges and their use in the various Class groups.

Others may explain this better, however here goes.
The Class groups are not numerics, rather characters.
The range of characters is 0-255.

Using the example that you've provided would deny access (note exception below) to the entire 0-255 range (in Class D) of whatever Class C you've specified as "ccc".

Potentially, each Class (whether A, B, C or D) offers the following ranges:
[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9]

(Please note the forum breaks these pipe characters, and they require editing before use.)

broken down as follows:
0-9, 10-99, 100-199, 200-259
With the 200-259 expression being used to save expression space, even though it defines non-existent values (256-259).

It would be redundant to specify the entire 0-255 range as a Class D expression, however it would still work.

You may break these numbers down into smaller groups for more specific expressions.
EX:
aaa\.bbb\.ccc\.[0-9] which would deny the 0-9 Class D range of the Class C that "ccc" represents.

Another example is aaa\.(bbb|bab)\.ccc\.[0-9]
which would provide an OR for two different Class B ranges represented by "bbb or bab", with the Class C range being the same in both instances, while the Class D range of 0-9 is also the same in both instances.

(exception)
The following is what I use; the logic has long since passed from my recollection.

RewriteRule .* - [F]
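The octet alternation given above can be checked mechanically. A small Python sketch confirming it matches every value from 0 to 255, plus the 256-259 overshoot already noted:

```python
import re

# The octet alternation from the post, anchored for a whole-string test.
octet = re.compile(r"^([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-5][0-9])$")

matched = [n for n in range(300) if octet.match(str(n))]
print(matched[0], matched[-1], len(matched))  # 0 259 260
```

So nothing in the valid 0-255 range is missed; the only slack is the non-existent 256-259 values, which can never appear in a real address anyway.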

joergnw10

Msg#: 3560700 posted 2:26 pm on Jan 31, 2008 (gmt 0)

wow, thanks a lot for explaining it in detail. Bookmarked this one for the future!

joergnw10

Msg#: 3560700 posted 10:15 am on Feb 3, 2008 (gmt 0)

Right, I had another go, but still without success.
Still, the named bot is blocked, but the IP range is not. I tried it with only the block for the IP range, and also with only a single IP blocked, but it always returns a 200.
Here is what I currently have in htaccess regarding this, after having made the changes according to your replies:

RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\. [OR]
RewriteCond %{HTTP_USER_AGENT} SomeBot
RewriteRule ^.* - [F]

Did I overlook something? Or are there other reasons why this might not work?

You can see in the following thread what it is I am trying to block, in case it is relevant: [webmasterworld.com...]
(As this might be a search engine spider, I probably won't leave the block in my htaccess, but would really like to know what I am doing wrong here).

wilderness

Msg#: 3560700 posted 2:20 pm on Feb 3, 2008 (gmt 0)

Still, the named bot is blocked, but the IP range is not. I tried it with only the block for the IP range, and also with only a single IP blocked, but it always returns a 200.
Here is what I currently have in htaccess regarding this, after having made the changes according to your replies:

Are the 200s a result of redirects to either 403 or 404 pages?
Check your visitor logs and compare file sizes for the actual requested pages to the 200s of these denied ranges.

In addition, you do not say whether these NEW rewrites are the solitary lines in your htaccess.
As a result, I ponder whether you've opened the lines with:

RewriteEngine on


RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\. [OR]
RewriteCond %{HTTP_USER_AGENT} SomeBot
RewriteRule ^.* - [F]

joergnw10,
The lines contained in my htaccess are rather simple compared to many of the inquiries and answers that I see here.
As a result, there are corners of Apache and regex that simply fail to gather my attention or even understanding.

Another member will need to advise on the effect of the following line:

RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$

In addition, I'm not all that sure that it's possible to combine UAs and IP ranges and achieve denial in the manner you have.

Try the following and see if two rewrites solve your issue:

#Rewrite 1
RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{REMOTE_ADDR} ^aaa\.bbb\.ccc\.
RewriteRule ^.* - [F]

#Rewrite 2
RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
RewriteCond %{HTTP_USER_AGENT} SomeBot
RewriteRule ^.* - [F]
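Following the log-checking advice above means pulling the IP, status, and byte count out of each access-log line. A hedged Python sketch, assuming Apache's common "combined" LogFormat (check your own LogFormat directive; the sample line is invented):

```python
import re

# Field layout of Apache's "combined" LogFormat -- an assumption;
# adjust this regex if your LogFormat directive differs.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

sample = ('208.111.129.10 - - [03/Feb/2008:14:20:00 +0000] '
          '"GET /index.htm HTTP/1.1" 200 5120 "-" "SomeBot/1.0"')

hit = LINE.match(sample)
if hit:
    # A denied request should show status 403 and a small byte count
    # (the size of the 403 page), not the full page size seen here.
    print(hit.group("ip"), hit.group("status"), hit.group("bytes"))
```

Running this over the lines for the suspect range makes it obvious whether the 200s are genuine page deliveries or redirects to an error document.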

[edited by: wilderness at 2:40 pm (utc) on Feb. 3, 2008]

wilderness

Msg#: 3560700 posted 2:35 pm on Feb 3, 2008 (gmt 0)

You can see in the following thread what it is I am trying to block, in case it is relevant: [webmasterworld.com...]
(As this might be a search engine spider, I probably won't leave the block in my htaccess, but would really like to know what I am doing wrong here).

The person who initiated the example in this thread never actually provided the 208 range; another did!
Thus how would anybody verify its accuracy?

However, to deny the range that was provided in the answer:

RewriteCond %{REMOTE_ADDR} ^207\.179\.(12[89]|1[3-8][0-9]|19[01])\.
RewriteRule .* - [F]

Correct broken pipe characters.

As an aside: that a webmaster would place emphasis on the primary contents of Google Analytics reports (which are created to cover the masses, much the same as the stats reports used by host providers) without locating, and learning how to interpret, their website visitor logs?
Is an absurd idea, from my point of view.

joergnw10

Msg#: 3560700 posted 6:29 pm on Feb 3, 2008 (gmt 0)

Thanks for your replies, wilderness! I will have another go tomorrow to see if I can get it to work.
Not sure where the 207 range now comes from. My stats show those from the 208 range mentioned in the other thread. As you might have guessed, I'm not exactly an expert on htaccess, blocking IPs and the like :-) I kind of freaked out when my stats got inflated by all these visits by something that, to my unknowing eyes, seemed to pretend to be a real visitor (it also appeared as such in 'statcounter').

wilderness

Msg#: 3560700 posted 6:34 pm on Feb 3, 2008 (gmt 0)

"not sure where the 207 range"

My typo.
Sorry.

Two typos.
This should actually read:

RewriteCond %{REMOTE_ADDR} ^208\.111\.(12[89]|1[3-8][0-9]|19[01])\.
RewriteRule .* - [F]
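The corrected pattern can be sanity-checked the same way as any regex. A Python sketch confirming the alternation covers exactly the third octets 128-191 of 208.111.*:

```python
import re

# The third-octet alternation from the corrected RewriteCond.
block = re.compile(r"^208\.111\.(12[89]|1[3-8][0-9]|19[01])\.")

thirds = [n for n in range(256) if block.match("208.111.%d.1" % n)]
print(thirds[0], thirds[-1], len(thirds))  # 128 191 64
```

That 64-address-wide third-octet span (128-191) is a /18 in CIDR terms, which is presumably the netblock reported in the other thread.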

joergnw10

Msg#: 3560700 posted 9:34 am on Feb 6, 2008 (gmt 0)

Thanks again wilderness!
The block of the IP range is now working. For some reason the http-viewer I'm using is still returning a 200, but the logs now show 403s.
It actually works with the combination of user agents and ip addresses.
The following line I received from the help in another thread - it names exceptions that the blocked agents are allowed to get - i.e. they should go away after being allowed to get robots.txt, without then trying to get other files (if they obey robots.txt, of course).

RewriteCond %{REQUEST_URI} !^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$
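The effect of that exception pattern can be previewed with a quick Python sketch (the URIs below are just examples). A URI that matches the pattern is exempt from the [F] rule, because the RewriteCond negates the match with "!":

```python
import re

# The exception pattern from the RewriteCond above.
exempt = re.compile(
    r"^/(errors/403\.htm|errors/404\.htm|errors/500\.htm|robots\.txt)$")

for uri in ["/robots.txt", "/errors/404.htm", "/index.htm", "/robots.txtx"]:
    # A match here means the negated RewriteCond fails, so the
    # blocked agent is still allowed to fetch this URI.
    print(uri, "allowed" if exempt.match(uri) else "denied")
```

Note that the $ anchor keeps the exemption exact: "/robots.txtx" is still denied, not accidentally whitelisted.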

jdMorgan

Msg#: 3560700 posted 10:36 pm on Feb 6, 2008 (gmt 0)

If your "http viewer" is giving inaccurate results, try the "Live HTTP Headers" add-on for Firefox/Mozilla browsers. It's a free and indispensable tool for troubleshooting redirects, rewrites, caching, keep-alive, MIME-type and all other problems related to HTTP request and response headers.

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved