Forum Moderators: phranque
Q2.If not, could someone have a quick scan?
Q3.Is there a www. "BAD BOT" database anywhere?
I have added a few more bots to the ban list, compiled from different sources:
PS. Anyone copying & pasting the code: if you have access to httpd.conf, use that instead of .htaccess. Also remove all comments to thin out the file (keep a commented copy for reference).
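If you do move these directives into httpd.conf, the per-directory ones need a container; a minimal sketch, assuming a DocumentRoot of /var/www/html (an example path -- adjust to your own setup):

```apache
# Hypothetical httpd.conf placement -- /var/www/html stands in for your DocumentRoot
<Directory "/var/www/html">
    Options All -Indexes
    RewriteEngine On
    # ... paste the RewriteCond/RewriteRule blocks from the .htaccess file here ...
</Directory>
```

Unlike .htaccess, httpd.conf is read once at startup, so restart or reload Apache after editing.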
# Handlers (yours may be different)
#========================================================
AddHandler application/x-httpd-php5 .php
# Deny All Indexing On Folders (for security)
#========================================================
Options All -Indexes
# Error Document Handlers (yours may be different)
#========================================================
ErrorDocument 400 /error.php?400
ErrorDocument 401 /error.php?401
ErrorDocument 403 /error.php?403
ErrorDocument 404 /error.php?404
ErrorDocument 500 /error.php?500
# Deny Access To The .htaccess File (for security)
#========================================================
<Files .htaccess>
Order Allow,Deny
Deny from all
</Files>
# Allow Access For All To The .403 Handler (yours may be different)
#========================================================
<Files 403.shtml>
Order Allow,Deny
Allow from all
</Files>
RewriteEngine On
# Forbid requests for exploits & annoyances
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Email
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC,OR]
# MSOffice
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
# Nimda
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC,OR]
# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
RewriteCond %{REQUEST_URI} /sensepost\.exe [NC]
RewriteRule .* - [F]
# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
# Banning BOTS below
# High Priority Here
# Cyveillance
RewriteCond %{REMOTE_ADDR} ^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$ [NC,OR]
# NameProtect
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [NC,OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [NC,OR]
# Turnitin
RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [NC,OR]
# spambot (all-caps user-agent; no [NC], or it would match ordinary letters-only UAs)
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$ [OR]
# BAD Spider/Bot
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
# Address harvesters
RewriteCond %{HTTP_USER_AGENT} ^(autoemailspider|ExtractorPro) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^E?Mail.?(Collect|Collector|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf) [NC,OR]
# ...
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
# (A form-mail attacker?)
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC]
# Use either of these ending rules
# RewriteRule .* - [F]
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]
[edited by: jdMorgan at 5:57 pm (utc) on Jan. 3, 2008]
[edit reason] Edited for links and length [/edit]
Q1.I'm just finalizing my htaccess file, and wondered does anyone have a script that can check it to make sure the syntax/code is correct?
For a syntax check, upload the file to a test subdomain or subdirectory on your site, and then request a URL from that subdomain/subdirectory. There are some 'checkers' on the Web, but none that I would trust completely. That makes them just a delaying factor in getting to 'real testing' on the server.
Q2.If not, could someone have a quick scan?
Q3.Is there a www. "BAD BOT" database anywhere?
I don't recommend using (or even attempting to build) a comprehensive list of all "bad bots." This will lead to a huge .htaccess file over time -- slowing down each and every HTTP request to your server. Some of the most troublesome bad bots on the Web now use standard browser user-agents, rendering about half of your list obsolete at this time. Review your own server log files, and block only the bad bots that constitute a real problem for (or threat to) your site.
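For that kind of targeted blocking, a short SetEnvIfNoCase list (mod_setenvif) is often easier to maintain than a long RewriteCond chain. A sketch -- the bot names here are placeholders, not a recommended list:

```apache
# Block only user-agents observed misbehaving in your own logs.
# "BadBotOne" and "BadBotTwo" are placeholder names -- substitute real UA strings.
SetEnvIfNoCase User-Agent "BadBotOne" bad_bot
SetEnvIfNoCase User-Agent "BadBotTwo" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Each blocked bot then adds one line, instead of another condition in a fragile OR chain.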
A few comments on the code snippets follow.
# Deny All Indexing On Folders (for security)
#========================================================
Options All -Indexes
# Error Document Handlers (yours may be different)
#========================================================
ErrorDocument 400 /error.php?400
ErrorDocument 401 /error.php?401
ErrorDocument 403 /error.php?403
ErrorDocument 404 /error.php?404
ErrorDocument 500 /error.php?500
A 400-Bad Request error indicates that the client is broken, and very unlikely to be a human-driven browser. So I wouldn't bother with a custom error page for that one, either.
Intentionally-removed URLs should return 410-Gone. No handler is present for that one...
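For reference, mod_alias can return the 410 directly, and an ErrorDocument line in the same style as the others gives it a handler; the removed path here is only an example:

```apache
# Return 410-Gone for a deliberately removed page (example path)
Redirect gone /old-page.html
# Handler in the same style as the 400-500 handlers above
ErrorDocument 410 /error.php?410
```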
# Deny Access To The .htaccess File (for security)
#========================================================
<Files .htaccess>
Order Allow,Deny
Deny from all
</Files>
# Allow Access For All To The .403 Handler (yours may be different)
#========================================================
<Files 403.shtml>
Order Allow,Deny
Allow from all
</Files>
# Forbid requests for exploits & annoyances
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Banning BOTS below
# High Priority Here
# Cyveillance
RewriteCond %{REMOTE_ADDR} ^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$ [NC,OR]
# NameProtect
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [NC,OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [NC,OR]
# Turnitin
RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [NC,OR]
# ...
# (A form-mail attacker?)
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC,OR]
# Use Either End Rules
# RewriteRule .* - [F]
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]
I suggest adding an exclusion to all rules at the top of your rules to *allow* the 403 error page (along with any objects that it includes, e.g. images, CSS files) and robots.txt file to be fetched unconditionally. Without this, any 403 will lead to an infinite loop, and robots unable to fetch your robots.txt file may legitimately assume that they are allowed to fetch your entire site.
RewriteCond %{QUERY_STRING} ^403$
RewriteRule ^error\.php$ - [L]
RewriteRule ^robots\.txt$ - [L]
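If the 403 page pulls in images or a stylesheet, those need the same unconditional pass-through, placed above all blocking rules; assuming they live in an /errorfiles/ directory (an invented path for illustration):

```apache
# Pass through the 403 page's supporting files (assumed /errorfiles/ location)
RewriteRule ^errorfiles/ - [L]
```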
Jim