Forum Moderators: phranque


.htaccess checker?

.htaccess file checker

         

momo77

4:21 pm on Jan 3, 2008 (gmt 0)

10+ Year Member



Q1.I'm just finalizing my htaccess file, and wondered does anyone have a script that can check it to make sure the syntax/code is correct?

Q2.If not, could someone have a quick scan?

Q3.Is there a www. "BAD BOT" database anywhere?

I have added a few more bots to the ban list, compiled from different sources:

PS: If you're copying & pasting this code and have access to httpd.conf, use that instead of .htaccess. Also remove all comments to thin out the file (keep a commented copy for reference).

# Handlers (yours may be different)
#========================================================
AddHandler application/x-httpd-php5 .php

# Deny All Indexing On Folders (for security)
#========================================================
Options All -Indexes

# Error Document Handlers (yours may be different)
#========================================================
ErrorDocument 400 /error.php?400
ErrorDocument 401 /error.php?401
ErrorDocument 403 /error.php?403
ErrorDocument 404 /error.php?404
ErrorDocument 500 /error.php?500

# Deny Access To The .htaccess File (for security)
#========================================================
<Files .htaccess>
order allow,deny
deny from all
</Files>

# Allow Access For All To The .403 Handler (yours may be different)
#========================================================
<Files 403.shtml>
order allow,deny
allow from all
</Files>

RewriteEngine On

# Forbid requests for exploits & annoyances
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Email
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC,OR]
# MSOffice
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
# Nimda
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC,OR]
# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
RewriteCond %{REQUEST_URI} /sensepost\.exe [NC]
RewriteRule .* - [F]

# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

# Banning BOTS below
# High Priority Here
RewriteCond %{REMOTE_ADDR} g^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$h [NC,OR] # Cyveillance
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [NC,OR] # NameProtect
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [NC,OR] # NameProtect
RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [NC,OR] # Turnitin
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$ [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC] # BAD Spider/Bot
# Address harvesters
RewriteCond %{HTTP_USER_AGENT} ^(autoemailspider|ExtractorPro) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^E?Mail.?(Collect|Collector|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf) [NC,OR]
# ...
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR] # (A Form Mail Attacker?)
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC,OR]

# Use Either End Rules
# RewriteRule .* - [F]

RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

[edited by: jdMorgan at 5:57 pm (utc) on Jan. 3, 2008]
[edit reason] Edited for links and length [/edit]

jdMorgan

6:35 pm on Jan 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Q1.I'm just finalizing my htaccess file, and wondered does anyone have a script that can check it to make sure the syntax/code is correct?

I have edited this post to comply with our charter. We focus on specific questions and problems here, and it's not likely that anyone will have the time to do a line-by-line review of large code dumps. Even the most generous contributors here would likely consider that to be billable 'consulting.' :)

For a syntax check, upload the file to a test subdomain or subdirectory on your site, and then request a URL from that subdomain/subdirectory. There are some 'checkers' on the Web, but none that I would trust completely. That makes them just a delaying factor in getting to 'real testing' on the server.

Q2.If not, could someone have a quick scan?

In its reduced form, yes.

Q3.Is there a www. "BAD BOT" database anywhere?

There are several 'identify this user-agent' sites available. However, it's up to you to decide what is good and what is bad. For example, many Webmasters block certain African countries known for sending copious quantities of "Share X Million Dollar" spam e-mails. That's fine, *unless* you happen to be a Webmaster in that country and would like some traffic to your site... Simplistic example, but I think it makes the point. :)

I don't recommend using (or even attempting to build) a comprehensive list of all "bad-bots." This will lead to a huge .htaccess file over time -- slowing down each and every HTTP request to your server. Some of the most troublesome bad-bots on the Web now use standard browser user-agents, rendering about half of your list obsolete at this time. Review your own server log files, and block only the bad-bots that constitute a real problem for (or threat to) your site.

A few comments on the code snippets follow.

# Deny All Indexing On Folders (for security)
#========================================================
Options All -Indexes

I'd suggest adding -MultiViews if you are not using content-negotiation.
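For example, if content-negotiation is not in use, the Options line above could become (a sketch; adjust to your own Options settings):

```apache
# Disable directory indexes and MultiViews content-negotiation
Options All -Indexes -MultiViews
```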

# Error Document Handlers (yours may be different)
#========================================================
ErrorDocument 400 /error.php?400
ErrorDocument 401 /error.php?401
ErrorDocument 403 /error.php?403
ErrorDocument 404 /error.php?404
ErrorDocument 500 /error.php?500

A failure of PHP or of your /error.php script will lead to an 'infinite' loop of 500-Server Errors -- I strongly suggest using the default handler for 500-Server Error.

A 400-Bad Request error indicates that the client is broken, and very unlikely to be a human-driven browser. So I wouldn't bother with a custom error page for that one, either.

Intentionally-removed URLs should return 410-Gone. No handler is present for that one...
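As a sketch of how a 410 response could be produced with mod_rewrite -- the /old-section/ path below is purely hypothetical; substitute the URL-paths you have actually removed:

```apache
# Return 410-Gone for an intentionally-removed section
# (/old-section/ is a hypothetical example path)
RewriteRule ^old-section/ - [G]
```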

# Deny Access To The .htaccess File (for security)
#========================================================
<Files .htaccess>
order allow,deny
deny from all
</Files>

# Allow Access For All To The .403 Handler (yours may be different)
#========================================================
<Files 403.shtml>
order allow,deny
allow from all
</Files>


You could use a single mod_rewrite rule or the <FilesMatch> container with a local OR to reduce the size of the code above.
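One possible sketch of the <FilesMatch> approach, covering .htaccess and .htpasswd in a single container (assumes the same Order/Deny syntax already used above):

```apache
# Deny access to .htaccess, .htpasswd, etc. in one container
<FilesMatch "^\.ht">
order allow,deny
deny from all
</FilesMatch>
```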

# Forbid requests for exploits & annoyances
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]

RewriteCond %{REQUEST_URI} \.printer$ [NC,OR]
would be equivalent and shorter

# Banning BOTS below
# High Priority Here
# Cyveillance
RewriteCond %{REMOTE_ADDR} g^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$h [NC,OR]

Spurious character "g" at start of pattern?

RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [NC,OR]
# NameProtect
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [NC,OR]
# Turnitin
RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [NC,OR]
# ...
# (A Form Mail Attacker?)
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC,OR]

Missing [OR] on second-to-last RewriteCond and spurious [OR] on last RewriteCond are both 'fatal' coding errors.

# Use Either End Rules
# RewriteRule .* - [F]

RewriteRule ^.* - [F,L]


[L] used with [F] is redundant. The commented-out version above will suffice.

RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

This rule will fail even with "your-site" because RewriteRule cannot 'see' the hostname, only the local URL-path.
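Since the RewriteCond already matches the unwanted Referer, the rule itself only needs to block the request. A sketch of the likely intent (pattern kept from the original, with the dots escaped):

```apache
# Forbid requests referred from the unwanted site
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]
```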

I suggest adding an exclusion to all rules at the top of your rules to *allow* the 403 error page (along with any objects that it includes, e.g. images, CSS files) and robots.txt file to be fetched unconditionally. Without this, any 403 will lead to an infinite loop, and robots unable to fetch your robots.txt file may legitimately assume that they are allowed to fetch your entire site.


RewriteCond %{QUERY_STRING} ^403$
RewriteRule ^error\.php$ - [L]
RewriteRule ^robots\.txt$ - [L]

One more comment -- on comments, actually: Do not put comments in-line with your directives. This is a "warning-level" syntax error, and will result in a warning being logged by the server every single time that line is parsed. (!) Always put comments on a separate line.
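For example, the Cyveillance line from above would become (same pattern, comment moved to its own line):

```apache
# Cyveillance
RewriteCond %{REMOTE_ADDR} ^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$ [NC,OR]
```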

Jim

momo77

1:31 am on Jan 9, 2008 (gmt 0)

10+ Year Member



Thanks very much Jim

that's been a huge help,

I think I understand everything you have mentioned