Forum Moderators: phranque


url and agent blocking in httpd.conf with RewriteCond


bbxrider

3:54 am on Nov 12, 2010 (gmt 0)

10+ Year Member



For a Windows-based server hosting about six sites, I'm trying to block unwanted visitors at the server level so I don't have to duplicate .htaccess files; the idea is to put the rules in an httpd.conf <Directory> section.
Is the basic syntax correct?
Any suggestions for a better RewriteRule?
Is my HTTP_USER_AGENT list up to date?

<Directory C:/APACHE2/HTDOCS>
Options +FollowSymLinks +ExecCGI -Indexes

AllowOverride None
Order allow,deny
Allow from all
Satisfy all

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteCond %{HTTP_REFERER} ^-?$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC]
#BLOCK ALL 109 FIRST OCTET
RewriteCond %{REMOTE_ADDR} ^109\. [OR]
#BLOCK ALL 119 FIRST OCTET
RewriteCond %{REMOTE_ADDR} ^119\. [OR]
#THERE WILL BE MANY MORE BLOCKS IN ADDITION TO THESE HERE SO FAR
RewriteCond %{REMOTE_ADDR} ^216\.169\.111\.

RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule ^.* - [F]
#RewriteRule !^http://[^/.]\.your-site.com.* - [F]
</Directory>

wilderness

6:35 pm on Nov 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Many of these UA's are outdated.

EX:
CherryPicker [google.com]

I've never used httpd.conf, and my experience is limited to shared hosting and htaccess.

You need to grasp some of the simplicity of regex, and that may be done in the forum library [webmasterworld.com].

You should understand the statements that utilize:
Begins with (^)
contains
ends with ($)

In addition, all UA's that you've provided as an example might be condensed into a solitary line.
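
As a rough sketch only (and not a recommendation for which UA's to block), several of the entries from your list could be combined with alternation into a single condition, for example:

#illustrative only: a few of the listed UA's combined into one condition
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf|ExtractorPro|Teleport|Wget|psbot)
RewriteRule .* - [F]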

As far as an extensive list of UA's?
No such thing exists!
Each webmaster must determine what is beneficial or detrimental to their own website(s).

wilderness

6:48 pm on Nov 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Additionally and FWIW, blacklisting or whitelisting is more readily discussed in the SSID Forum [webmasterworld.com] and its archives.

BTW, unless you combine the following line with additional ("multiple") conditions, it will deny many, many innocent visitors you never intended to block.

RewriteCond %{HTTP_REFERER} ^-?$ [NC]

bbxrider

4:12 am on Nov 14, 2010 (gmt 0)

10+ Year Member



Thanks for the reply; a couple of clarifications, if I may...
So any line whose pattern starts with ^ also needs to end with $, yes?
So this line
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
needs to be written like this?
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL$ [OR]
the $ comes before the [OR], yes?

I've seen a fair number of examples and none of them ended with $, so it's somewhat confusing.

You mention the unintended consequences of this line:
RewriteCond %{HTTP_REFERER} ^-?$ [NC]
I'm not sure whether the way I used it invokes those unintended consequences, and I'm not sure what you mean by 'multiple conditions'.

What I just realized is that I'm not sure how the
RewriteCond %{HTTP_REFERER} directives work.
My example has two: one after the UA's and one after the blocked IP's.
My understanding at this point is that they apply to all the preceding conditions, until another 'REFERER' directive is encountered?
So in my example the UA's get handled with
RewriteCond %{HTTP_REFERER} ^-?$ [NC] (which may have unintended blocks?)
and the blocked IP's get handled with
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
yes?

wilderness

5:16 am on Nov 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



First off!
You must not forget that you have to escape all periods (and many other characters) in RewriteCond lines.
EX:
folder.example.com would be listed as
folder\.example\.com
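
Applied to one of the entries already in your list, the escaping would look something like this (sketch only):

#periods escaped so they match literal dots
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com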

On to your confusion... BTW, these explanations have been offered hundreds, if not thousands, of times at WebmasterWorld:

"begins with" [google.com]

"ends with" [google.com]

You would only use the ^ (begins with) and $ (ends with) anchors together if that is your specific intention (generally applied in this manner for very short, even blank, UA's).
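
For instance, a condition aimed at a completely blank (or single-hyphen) user-agent, like the one already in your config, uses both anchors together; paired with its own rule it would look like:

#matches an empty or single-hyphen user-agent string
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]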

Here are two alternatives that would work on the same UA (assuming the example you provided is the complete UA):

#begins with Microsoft
RewriteCond %{HTTP_USER_AGENT} ^Microsoft [OR]

#Ends with URL
RewriteCond %{HTTP_USER_AGENT} URL$ [OR]

Or you may simply use a "contains" that matches either one:

#note absence of both leading ^ and/or trailing $
RewriteCond %{HTTP_USER_AGENT} (Microsoft|URL) [OR]

(Please note: I'm not sure if the forum still breaks the pipe character. If it does, you'll need to correct the broken character.)

---------

RewriteCond %{HTTP_REFERER} ^-?$ [NC]
I'm not sure whether the way I used it invokes those unintended consequences, and I'm not sure what you mean by 'multiple conditions'.


Once again, and using the line you supplied, here is an example of multiple conditions:

# Multiple-condition criteria; Referer, UA and IP all required
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^psbot
RewriteCond %{REMOTE_ADDR} ^109\.
RewriteRule .* - [F]

What I just realized is that I'm not sure how the
<snip>
yes?


No.
Do you have those 28-29 lines listed in consecutive order within your htaccess?

If so, you're asking for multiple conditions/criteria to be met before any action (and the action itself is missing from your lines):
1) you're asking for one of the UA's to be matched,
2) plus both a blank referer and a blank UA (the blank UA conflicts with #1), and
3) finally, you're requiring that 1 & 2 come from one of the IP's you've designated.

If you actually have the statements separated by something such as:

RewriteRule .* - [F]

and just omitted that from your example, then my apologies for the confusion.

Hope this helps.

Don




wilderness

5:32 am on Nov 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As an afterthought, I realized how confusing my last ("If so?") explanation may seem.

To save space I've omitted many of your lines; otherwise these are exactly as you initially supplied, except for the RewriteRule lines added where required and the corrected escapes for the internetseer entry:

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com
RewriteRule .* - [F]
RewriteCond %{HTTP_REFERER} ^-?$
RewriteRule .* - [F]
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
#BLOCK ALL 109 FIRST OCTET
RewriteCond %{REMOTE_ADDR} ^109\. [OR]
#BLOCK ALL 119 FIRST OCTET
RewriteCond %{REMOTE_ADDR} ^119\. [OR]
#THERE WILL BE MANY MORE BLOCKS IN ADDITION TO THESE HERE SO FAR
RewriteCond %{REMOTE_ADDR} ^216\.169\.111\.
RewriteRule .* - [F]

bbxrider

6:16 pm on Nov 16, 2010 (gmt 0)

10+ Year Member



Many, many thanks for your super reply and all the work that went into it; I still have to study up some more here to finish. I'll be researching the rewrite rules and whether it makes sense to have different ones for the URL/IP blocks vs. the UA's. Some posts suggest different approaches to better discourage the spammers, like sending them to blackhole websites, for example this one post:

Or, instead of delivering a friendly error message (i.e., the last line), send these bad boys to the hellish website of your choice by replacing the RewriteRule in the last line with one of the following two examples:
# send em to a hellish website of your choice
RewriteRule ^.*$ [hellish-website.com...] [R,L]
(don't know yet what this would be)

Or, to send em to a virtual blackhole of fake email addresses:

# send em to a virtual blackhole of fake email addresses
RewriteRule ^.*$ [english-61925045732.spampoison.com...] [R,L]

wilderness

6:40 pm on Nov 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW, keep in mind that my references are to .htaccess, while your initial inquiry was about httpd.conf. There is some difference in how the syntax is presented, although I have no background in using httpd.conf.

Passing "pests" on to other websites, and/or presenting them with some sort of challenge (which could provoke more of their activity at your website(s)), is a bad practice, at least from an administrative point of view.

It's more effective to present them with a simple 403.

I recall Jim and others having a practice of presenting a downsized 403 page to these constant pests, where the "downsized 403" weighs zero or only a few KB.
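
One hedged sketch of that idea, assuming you can set an ErrorDocument in your httpd.conf or .htaccess (the exact quoting rules vary a little between Apache versions), is to serve a short literal string rather than a full error page:

#serve a tiny text body for 403's instead of the default error page
ErrorDocument 403 "Forbidden"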

I also recall another member who mentioned presenting these pests with 500 server responses, which leads the bot to believe the site is down or malfunctioning.

jdMorgan

9:37 pm on Nov 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



... and some scraper-bots can be made to 'go away' by giving them a short 200-OK response and no real useful content.

Do not redirect your problems elsewhere -- that is really bad netiquette, and it wastes internet bandwidth. With the sole exception of redirecting an abuser to his own ISP's Terms of Use page (which is rarely, but sometimes, useful), you should attempt to get rid of the pests directly using 403-Forbidden, 404-Not Found, 500-Server Error, or 200-OK responses - whatever works (in getting that pest to leave) and costs you the least total bandwidth per abuser.
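
A rough, hedged sketch of issuing those different responses with mod_rewrite (the user-agent and the /empty.html file below are hypothetical placeholders; [F] gives 403, and on Apache 2.x a non-3xx code on the R flag, such as R=404 or R=500, should return that status directly, but test this on your own build):

#hypothetical pest name, used only for illustration
RewriteCond %{HTTP_USER_AGENT} SomePestBot [NC]
#403 Forbidden:
RewriteRule .* - [F]
#or 404 Not Found:
#RewriteRule .* - [R=404]
#or 500 Server Error:
#RewriteRule .* - [R=500]
#or a 200-OK with no useful content, assuming a tiny /empty.html exists:
#RewriteRule .* /empty.html [L]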

See the mod_rewrite and regular expressions resources cited in our Apache Forum Charter for useful information on this project.

Jim