
WebmasterWorld: Apache Web Server Forum

    
Help needed for rDNSbot plus blocking IPs
whatsdoin




msg:3411257
 8:53 am on Aug 2, 2007 (gmt 0)


Gang, I need help. I recently followed a link from Incredibills.blogspot and found a couple of great threads on a subject that had aroused my suspicions in the hosting back office: site hijacking, fake robots, and scrapers.
Thanks to the guys in those threads I organised my .htaccess files and, surprise, my real traffic and views have been steadily on the rise.

However, guys, after putting the code below (from those particular threads) into .htaccess, it works great, but for some reason I cannot block individual IPs.
The IPs I want to block, and have put in .htaccess, still continue to crawl my site. What can I do to rectify this?

Many thanks.

Here is exactly my .htaccess:
=========================================
AddHandler application/x-httpd-php5 .php
<Files 403.shtml>
order allow,deny
allow from all
</Files>
<FilesMatch "\.(s?html?¦php[45]?)$">
#
BrowserMatchNoCase Googlebot rDNSbot
BrowserMatchNoCase msnbot rDNSbot
BrowserMatchNoCase Slurp rDNSbot
BrowserMatchNoCase Teoma rDNSbot
#
SetEnvIf Request_URI "/path-to-custom-403-page\.shtml$" AllowAll
#
Order Deny,Allow
Deny from env=rDNSbot
Allow from env=AllowAll
Allow from googlebot.com
Allow from search.live.com
Allow from crawl.yahoo.net
Allow from ask.com
Allow from inktomisearch.com

</FilesMatch>
deny from 38.99.44.99
deny from 72.36.94.152
deny from 128.174.254.29
deny from 216.95.221.39
deny from 213.232.196.107
deny from 81.172.95.166
deny from 66.249.17.251
============================================

 

wilderness




msg:3411437
 1:59 pm on Aug 2, 2007 (gmt 0)

Unfortunately, in this instance you cannot have the best of both worlds (options):

deny, allow
[httpd.apache.org...]
or
allow, deny
[askapache.com...]

some more reading
[webmasterworld.com...]
[webreference.com...]

When implementing "white-listing" with the Deny,Allow order, it becomes necessary to deny IP ranges with mod_rewrite instead.

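# Turn the rewrite engine on (skip this line if it is already on)
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^38\.99\.44\.99$ [OR]
RewriteCond %{REMOTE_ADDR} ^72\.36\.94\.152$
# Return 403 Forbidden for requests from either address above
RewriteRule .* - [F]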

Please note: you're going to learn very fast that limiting your denies to a single Class D address (that is, one value of the fourth octet) will come back very quickly to bite you in the backside.
A more effective approach is to deny one of the following:
1) a range in the Class D
2) the entire Class D
3) the entire range of the provider you're denying.
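
As a rough sketch of those three options in mod_rewrite terms: the 100-109 sub-range below is made up purely for illustration, while the other two patterns are the ones suggested elsewhere in this thread. Whether 38.x really is one provider's entire allocation is something to verify before using it.

```apache
RewriteEngine on
# 1) A range in the last octet (hypothetical example: 213.232.196.100-109)
RewriteCond %{REMOTE_ADDR} ^213\.232\.196\.10[0-9]$ [OR]
# 2) The entire last octet (213.232.196.0-255)
RewriteCond %{REMOTE_ADDR} ^213\.232\.196\. [OR]
# 3) The entire range assumed to belong to one provider (everything in 38.x.x.x)
RewriteCond %{REMOTE_ADDR} ^38\.
# Any one of the conditions above matching is enough to return 403 Forbidden
RewriteRule .* - [F]
```

In practice you would normally keep only one of the first two conditions, since option 2 already covers option 1.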

jdMorgan




msg:3411610
 4:15 pm on Aug 2, 2007 (gmt 0)

wilderness is correct, you cannot have two Order directives in an .htaccess file, unless they are within containers ( such as <Files> or <Limit> ) that make them mutually-exclusive. If two Order directives are not mutually-exclusive, then only the last one found will be used.
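
To illustrate the "mutually-exclusive containers" case, here is a minimal sketch. The file names and the 192.0.2.x and 198.51.100.x addresses are placeholders, not taken from anyone's real configuration.

```apache
# Each <Files> container matches a different file, so the two Order
# directives can never apply to the same request.
<Files "members.html">
# Deny,Allow: Deny lines are checked first, and a matching Allow overrides them
Order Deny,Allow
Deny from all
Allow from 192.0.2.0/24
</Files>
<Files "public.html">
# Allow,Deny: Allow lines are checked first, and a matching Deny overrides them
Order Allow,Deny
Allow from all
Deny from 198.51.100.7
</Files>
```

With this layout, members.html is reachable only from 192.0.2.0/24, while public.html is open to everyone except 198.51.100.7.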

However, the problem with your code is likely that the second Order directive is scoped only to the files specified in the <FilesMatch> container, and the server is therefore probably defaulting to Order Deny,Allow with an "Allow from all" inherited from the server configuration file.

This can be fixed by moving the <FilesMatch> opening and closing tags, along with a few unrelated tweaks:

AddHandler application/x-httpd-php5 .php
#
# Remove the following four lines -- They are redundant. Put the
# URL-path of your 403.shtml page into the SetEnvIf line below
# <Files 403.shtml>
# order allow,deny
# allow from all
# </Files>
#
# If my guess shown here is incorrect, put the correct
# local URL-path to your custom 403 page into this line
SetEnvIf Request_URI "/403\.shtml$" AllowAll
#
Order Deny,Allow
Allow from env=AllowAll
#
<FilesMatch "\.(s?html?¦php[45]?)$">
#
BrowserMatchNoCase Googlebot rDNSbot
BrowserMatchNoCase msnbot rDNSbot
BrowserMatchNoCase Slurp rDNSbot
BrowserMatchNoCase Teoma rDNSbot
#
Deny from env=rDNSbot
Allow from googlebot.com
Allow from search.live.com
Allow from crawl.yahoo.net
Allow from ask.com
Allow from inktomisearch.com
#
</FilesMatch>
#
Deny from 38.0.0.0/8
Allow from 38.114.104.0/24
Deny from 72.36.94.152
Deny from 128.174.254.29
Deny from 216.95.221.0/24
Deny from 213.232.196.0/24
Deny from 81.172.95.166
Deny from 66.249.0.0/19

I should note that the <FilesMatch> container is not required. Its purpose is to prevent the rDNS checks from being done on every single HTTP request to your server. These rDNS checks are expensive in terms of server performance, so only the filetypes listed in the <FilesMatch> are checked. This approach is based on the presumption that if they can't fetch your pages, then they can't find your images, CSS, etc.

When I posted the original code, I had no expectation of making a one-size-fits-all, cut-n-paste solution, so it's likely that everyone will need to tweak the code to suit their site.

Replace all broken pipe "¦" characters above with solid pipe "|" characters before use; posting on this forum modifies the pipe characters.

Jim

whatsdoin




msg:3412105
 11:50 pm on Aug 2, 2007 (gmt 0)

Man, you guys are the BOMB. Thanks for the help and for taking the time to answer my questions. I appreciate that, and hopefully I can extend my learnings to others in life, as you have, when the time arises.
My best.
P.S.
Thanks JD, yours was the code that helped me most, which, as you correctly picked, I cut and pasted :)


whatsdoin




msg:3412106
 11:52 pm on Aug 2, 2007 (gmt 0)

Just as a follow-on, guys: the "Live Search" bot seems to be OK with the script, but the script seems to block msnbot itself. When I check whether it is a spoofed bot, it is actually MSN. Is this normal?

jdMorgan




msg:3412271
 2:20 am on Aug 3, 2007 (gmt 0)

No, not normal. But msnbot may be doing something different from when I last looked at its rDNS -- Update the code to add or change the msnbot reverse-DNS hostname to match whatever they're using today. (Please post and let us know, too)

Jim

whatsdoin




msg:3413603
 2:37 pm on Aug 4, 2007 (gmt 0)

Thanks guys. I must say it is all Chinese to me, but I have tried this and it is still no go. I think you guys are right, I cannot have the best of both worlds. In particular, 213.232.196.107 from Russia is giving me the shats: no agent, no bot, nothing.
================
AddHandler application/x-httpd-php5 .php
SetEnvIf Request_URI "/path-to-custom-403-page\.shtml$" AllowAll
#
Order Deny,Allow
Allow from env=AllowAll
#
<FilesMatch "\.(s?html?¦php[45]?)$">
#
BrowserMatchNoCase Googlebot rDNSbot
BrowserMatchNoCase msnbot rDNSbot
BrowserMatchNoCase Slurp rDNSbot
BrowserMatchNoCase Teoma rDNSbot
#
Order Deny,Allow
Deny from env=rDNSbot
Allow from googlebot.com
Allow from search.live.com
Allow from crawl.yahoo.net
Allow from ask.com
Allow from inktomisearch.com
#
</FilesMatch>
#
deny from 38.99.44.99
deny from 128.174.254.29
deny from 216.95.221.39
deny from 213.232.196.107
deny from 81.172.95.166
deny from 66.249.17.251
deny from 72.36.94.152
deny from 209.128.83.201

=============================

wilderness




msg:3413633
 5:00 pm on Aug 4, 2007 (gmt 0)

I use Rewrite with white-listing (as previously provided) and it functions.

I suggest you replace the following two lines:
deny from 38.99.44.99
deny from 213.232.196.107

with the following:

# Turn the rewrite engine on (skip this line if it is already on)
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^38\. [OR]
RewriteCond %{REMOTE_ADDR} ^213\.232\.196\.
RewriteRule .* - [F]

whatsdoin




msg:3414638
 8:19 am on Aug 6, 2007 (gmt 0)

wilderness & jdMorgan, thanks very much for your help.
JD asked me to keep everyone informed, so here goes. I tinkered around a little.

JD, as stated before, Live Search is OK with the setup but msnbot is not. I will give you a copy of the log entries below to look at.

This is my current .htaccess, and thanks to you guys, so far so good. I hope it also helps someone else. It has even blocked 213.232.196.107, yippee!
I will show you in a minute.
==============================================
AddHandler application/x-httpd-php5 .php
#
RewriteEngine on
#
SetEnvIf Request_URI "/path-to-custom-403-page\.shtml$" AllowAll
#
Order Deny,Allow
Allow from env=AllowAll
#
<FilesMatch "\.(s?html?¦php[45]?)$">
#
BrowserMatchNoCase Googlebot rDNSbot
BrowserMatchNoCase msnbot rDNSbot
BrowserMatchNoCase Slurp rDNSbot
BrowserMatchNoCase Teoma rDNSbot
#
Order Deny,Allow
Deny from env=rDNSbot
Allow from googlebot.com
Allow from search.live.com
Allow from crawl.yahoo.net
Allow from ask.com
Allow from inktomisearch.com
#
</FilesMatch>
#
RewriteCond %{REMOTE_ADDR} ^38\. [OR]
RewriteCond %{REMOTE_ADDR} ^213\.232\.196\.
deny from 128.174.254.29
deny from 216.95.221.39
deny from 81.172.95.166
deny from 66.249.17.251
deny from 72.36.94.152
deny from 209.128.83.201
#
RewriteRule .* - [F]
======================================

The script above blocking 213.232.196.107:

Host: 213.232.196.107 /permalink.php?article=**********.txt
Http Code: 403 Date: Aug 05 03:33:07 Http Version: HTTP/1.1 Size in Bytes: -
Referer: -
Agent: -


/permalink.php?article=*****.txt
Http Code: 403 Date: Aug 05 04:44:05 Http Version: HTTP/1.1 Size in Bytes: -
Referer: -
Agent: -


/permalink.php?article=*******.txt
Http Code: 403 Date: Aug 05 10:06:02 Http Version: HTTP/1.1 Size in Bytes: -
Referer: -
Agent: -

------------------------------

Live search =

livebot-65-55-208-184.search.live.com /permalink.php?article=*********.txt
Http Code: 200 Date: Aug 05 15:22:38 Http Version: HTTP/1.0 Size in Bytes: 40367
Referer: -
Agent: msnbot/1.0 (+http://search.msn.com/msnbot.htm)


/permalink.php?article=******.txt
Http Code: 200 Date: Aug 05 15:22:38 Http Version: HTTP/1.0 Size in Bytes: 33187
Referer: -
Agent: msnbot/1.0 (+http://search.msn.com/msnbot.htm)


/permalink.php?article=******.txt
Http Code: 200 Date: Aug 05 15:22:39 Http Version: HTTP/1.0 Size in Bytes: 40304
Referer: -
Agent: msnbot/1.0 (+http://search.msn.com/msnbot.htm)
----------------------------------

And now this. I have checked the IP and it of course belongs to MSN, so Live Search is OK but this is not?

65.54.188.90
Http Code: 403 Date: Aug 05 07:46:50 Http Version: HTTP/1.0 Size in Bytes: -
Referer: -
Agent: msnbot/1.0 (+http://search.msn.com/msnbot.htm)



Http Code: 403 Date: Aug 05 10:37:10 Http Version: HTTP/1.0 Size in Bytes: -
Referer: -
Agent: msnbot/1.0 (+http://search.msn.com/msnbot.htm)



Http Code: 403 Date: Aug 05 14:49:46 Http Version: HTTP/1.0 Size in Bytes: -
Referer: -
Agent: msnbot/1.0 (+http://search.msn.com/msnbot.htm)
----------------------
Once again, thank you both. I will monitor it, and if all is OK after a few days with the .htaccess plus rewrite, I will let you guys know. But with MSN I don't know what's happening.

wilderness




msg:3414875
 2:50 pm on Aug 6, 2007 (gmt 0)

RewriteCond %{REMOTE_ADDR} ^38\. [OR]
RewriteCond %{REMOTE_ADDR} ^213\.232\.196\.
deny from 128.174.254.29
deny from 216.95.221.39
deny from 81.172.95.166
deny from 66.249.17.251
deny from 72.36.94.152
deny from 209.128.83.201
#
RewriteRule .* - [F]

The above is NOT correct.

The lines I suggested go into an entirely different section of your .htaccess:
1) after all "deny from" lines
2) after any closing container lines, for example:
allow from all
deny from env=keep_out
</Limit>

3) then add:
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^38\. [OR]
RewriteCond %{REMOTE_ADDR} ^213\.232\.196\.
RewriteRule .* - [F]
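
A rough sketch of that layout, using only addresses already posted in this thread, might look like the following. The exclusion for /403.shtml is an assumption: it reuses the custom error page path guessed earlier in the thread, and is there because a blanket [F] rule would otherwise also refuse the error page itself.

```apache
# 1) Plain access-control lines first, kept together
Order Deny,Allow
Deny from 128.174.254.29
Deny from 216.95.221.39
Deny from 81.172.95.166
#
# 2) Then the mod_rewrite block, with nothing interleaved between
#    the RewriteCond lines and the RewriteRule they belong to
RewriteEngine on
# Assumed path of the custom 403 page; change it to the real one
RewriteCond %{REQUEST_URI} !^/403\.shtml$
RewriteCond %{REMOTE_ADDR} ^38\. [OR]
RewriteCond %{REMOTE_ADDR} ^213\.232\.196\.
RewriteRule .* - [F]
```

The first condition has no [OR] flag, so the rule fires only when the request is not for the error page and the address matches one of the two patterns.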

wilderness




msg:3414900
 3:10 pm on Aug 6, 2007 (gmt 0)

In addition, each time you modify your .htaccess, it's a good idea to visit your website(s) to confirm that you have not introduced a syntax error (which you have) that will prevent your site(s) from functioning and serve every visitor a 500 error.

wilderness




msg:3414915
 3:25 pm on Aug 6, 2007 (gmt 0)

I use Rewrite with white-listing (as previously provided) and it functions.

I suggest you replace the following two lines:
deny from 38.99.44.99
deny from 213.232.196.107

with the following:

# Turn the rewrite engine on (skip this line if it is already on)
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^38\. [OR]
RewriteCond %{REMOTE_ADDR} ^213\.232\.196\.
RewriteRule .* - [F]

whatsdoin,
My apologies for the confusion.
Your misunderstanding of the addition is due to my use of the word "replace". Although the intent of the replacement is the same, it is done in a different section of the file.

Don

whatsdoin




msg:3415501
 3:45 am on Aug 7, 2007 (gmt 0)

Mate, please don't apologise. It should be me that apologises for the noob questions. Thanks for putting up with me.

I will now try what you said.
You are correct about the 500s; just one slip in the formula seems to trigger it.

Key_Master




msg:3415509
 3:55 am on Aug 7, 2007 (gmt 0)

65.54.188.90 reverse resolves to msnbot.msn.com

:):):):):)

jdMorgan




msg:3415741
 12:58 pm on Aug 7, 2007 (gmt 0)

Thanks Key_Master,

OK, so that makes the Allow list:

Allow from googlebot.com
Allow from search.live.com
Allow from msnbot.msn.com
Allow from crawl.yahoo.net
Allow from ask.com
Allow from inktomisearch.com

Jim
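
For anyone copying from this thread, a sketch of the rDNS section with that Allow list folded in might look like this. It is assembled from the code posted above, with the broken pipe written as a solid pipe. It has not been tested against current crawler hostnames, which the search engines may change at any time.

```apache
<FilesMatch "\.(s?html?|php[45]?)$">
# Flag any request whose User-Agent claims to be one of these crawlers
BrowserMatchNoCase Googlebot rDNSbot
BrowserMatchNoCase msnbot rDNSbot
BrowserMatchNoCase Slurp rDNSbot
BrowserMatchNoCase Teoma rDNSbot
#
# Deny flagged requests unless the client's hostname, as resolved by
# Apache, matches or ends in one of the domains below
Order Deny,Allow
Deny from env=rDNSbot
Allow from googlebot.com
Allow from search.live.com
Allow from msnbot.msn.com
Allow from crawl.yahoo.net
Allow from ask.com
Allow from inktomisearch.com
</FilesMatch>
```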
