Forum Moderators: phranque
in our logs it shows up as
101057-web2.gold.funnelback.com
and it goes over every link on every page. On one specific page it gets caught in a cluster of keyword links, each of which triggers a query to our database. Several times a day it takes our site down completely with 503 errors.
I've tried adding the code below to httpd.conf (we're running Apache 2 on Windows with Tomcat), but when I restart Apache, it refuses to start.
Is there a problem with this code?
RewriteEngine on (this code is already there)
#RewriteCond %{HTTP_USER_AGENT} ^*.funnelback.*$ (new code, which Apache doesn't like; it won't start with this enabled)
#RewriteRule .* - [F,L] (new code, which Apache doesn't like; it won't start with this enabled)
RewriteCond %{REQUEST_METHOD} ^TRACE (this code is already there)
RewriteRule .* - [F] (this code is already there)
Please help!
Megan
Welcome to WebmasterWorld!
There are several problems here. The first is that you cannot "guess" mod_rewrite code. To do so is dangerous to the health of your server, as you've discovered with the failed restarts.
The specific problem is that the patterns used by mod_rewrite directives are written using regular expressions - a standard text-pattern-matching "language" used in PHP, PERL, and many other scripting languages. It in no way resembles DOS command line patterns such as "*.*" meaning "all files" in DOS. A concise regular expressions tutorial is cited in our forum charter.
In regular expressions, ".*" (Note: Not "*.") means "match any number (including zero) of any characters."
The next problem is that "funnelback" does not appear to be a user-agent. Instead, it looks like a hostname which your server has looked up using reverse DNS (RDNS). So the first step is to properly identify the user-agent (if it provides one when it accesses your site), and then figure out the correct regular expression to match that user-agent and, optionally, all of its potential versions and agent-name variants.
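For illustration only, once the real user-agent string is known, the block would look something like the sketch below. Note that "funnelback" here is an assumption inferred from the hostname, not a confirmed user-agent string — verify against your logs first:

```
# Assumed agent-name token; confirm from your access logs before use.
RewriteCond %{HTTP_USER_AGENT} funnelback [NC]
RewriteRule .* - [F]
```

The [NC] flag makes the match case-insensitive, which covers capitalization variants of the agent name.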
A minor point is that [L] used with [F] is redundant.
Note that the use of "^.*" and ".*$" in regular-expressions patterns is almost always unnecessary and wasteful of bytes on disk and CPU time. In order to understand this, you'll need to become familiar with the concept of "anchoring" in regular-expressions patterns. Patterns may be ^start-anchored, requiring the input string to start with a certain pattern. Or they may be end-anchored$, requiring the input string to end with a certain pattern. Building on that, you can use both a ^start and an end anchor$, which requires the input string to exactly match the pattern, or you may omit both anchors, which requires only that the input string contain text matching the pattern.
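To make anchoring concrete, here is a small Python sketch — Python's re module uses the same anchor syntax as mod_rewrite's patterns. The user-agent string shown is invented for the example:

```python
import re

# Hypothetical user-agent string, for demonstration only.
s = "Mozilla/5.0 (compatible; FunnelBack/1.0)"

# Unanchored: matches if the string CONTAINS the pattern anywhere.
print(bool(re.search(r"FunnelBack", s)))    # True

# Start-anchored: the string must BEGIN with the pattern.
print(bool(re.search(r"^FunnelBack", s)))   # False

# Fully anchored: the string must EQUAL the pattern exactly.
print(bool(re.search(r"^FunnelBack$", s)))  # False
print(bool(re.search(r"^Mozilla/5\.0 \(compatible; FunnelBack/1\.0\)$", s)))  # True
```

As the first case shows, an unanchored pattern already matches anywhere in the input, which is why wrapping it in "^.*" and ".*$" buys nothing.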
So, job #1 here is to review your raw server access logs, using the time of the accesses by this agent to find the relevant entries, and identify the user-agent string if there is one. Then we can proceed to implementing a solution. If this intruder does not provide a user-agent string, then other options are available, such as blocking it by IP address or by IP address range.
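As an illustration of that first step, a short Python sketch that pulls the user-agent field out of a combined-format access log line. This assumes your server logs in Apache's "combined" LogFormat (the common format shown earlier in this thread omits the referrer and user-agent fields); the sample line and the agent name "ExampleBot/1.0" are invented:

```python
import re

# Apache "combined" log format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Invented sample line for demonstration.
sample = ('101057-web2.gold.funnelback.com - - [20/Dec/2006:10:08:06 -0600] '
          '"GET /advsearch.jsp HTTP/1.1" 200 53213 "-" "ExampleBot/1.0"')

m = LINE_RE.match(sample)
if m:
    print(m.group("host"), "->", m.group("agent"))
```

A "-" in the referer or agent field means the client sent no such header at all.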
For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].
Jim
101057-web2.gold.funnelback.com - - [20/Dec/2006:10:08:06 -0600] "GET /url on my site" 200 53213
The referrer entries are blank for each of its requests. I'm not sure exactly what that means. Are they purposely hiding this information so we can't block them?
Just to let you know too, that I am testing the rewrites on a development machine before putting them on the server. This was my last attempt after reading around on these forums, but they're still getting through.
ReWriteCond %{HTTP_REFERER} ^101057-web2.gold.funnelback.com*$
RewriteRule .* - [L]
This doesn't work either. After reading your message - and discovering that there is no referrer listed in the logs - I understand why.
What is the correct code to block by IP? If the IP isn't in the referrer logs either, will that not work? I used a host name to IP lookup website and this is the IP that it finds - 64.72.112.53
Hmm.
You'll need several changes to that code -- again, I recommend a review of the regular-expressions tutorial and the other material cited in our forum charter.
RewriteCond %{REMOTE_ADDR} ^64\.72\.112\.53$
RewriteRule .* - [F]
RewriteCond %{REMOTE_ADDR} ^64\.72\.1(1[2-9]|2[0-7])\.
RewriteRule .* - [F]
Hopefully, that will take care of the immediate problem.
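If you want to sanity-check what that range pattern covers before deploying it, a quick Python sketch (the regex syntax is the same as mod_rewrite's):

```python
import re

# Intended to match 64.72.112.x through 64.72.127.x
block = re.compile(r"^64\.72\.1(1[2-9]|2[0-7])\.")

for ip in ["64.72.112.53", "64.72.127.1"]:
    print(ip, bool(block.match(ip)))   # both True: inside the blocked range

for ip in ["64.72.111.9", "64.72.128.1"]:
    print(ip, bool(block.match(ip)))   # both False: outside the range
```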
Jim
If you use a custom 403 error document, you will need to exclude it from the rule. Otherwise, your server will generate a second 403-Forbidden response when it tries to serve the error document itself. Example:
RewriteCond %{REQUEST_URI} !^/local_path_to_custom_403_error_document\.html$
RewriteCond %{REMOTE_ADDR} ^64\.72\.112\.53$
RewriteRule .* - [F]
ErrorDocument 403 /local_path_to_custom_403_error_document.html
Jim
I tried to block by the IP with your suggestion (with the range of addresses) and it won't stop them.
I imagine this could be because this is what they leave in the referrer logs?
- -> /advsearch.jsp
- -> /advsearch.jsp
- -> /advsearch.jsp
without any information where there is usually information about where they come from.
I'm at a loss as to what to try, aside from holding myself back from contacting them directly and begging, pleading, and screaming.
Megan
This is the access file
101057-web2.gold.funnelback.com - - [20/Dec/2006:11:47:30 -0600] "GET /advsearch.jsp?search=captive%20deer%20Chronic%20wasting%20disease%20Prion%20protein%20Rocky%20Mountain%20elk
&filter=&sortBy=1&sortDir=1&pagemode=advsearch HTTP/1.1" 200 74833
101057-web2.gold.funnelback.com - - [20/Dec/2006:11:47:43 -0600] "GET /advsearch.jsp?search=captive%20deer%20Chronic%20wasting%20disease%20Prion%20protein%20Bison%20buffalo
&filter=&sortBy=1&sortDir=1&pagemode=advsearch HTTP/1.1" 200 50712
101057-web2.gold.funnelback.com - - [20/Dec/2006:11:47:56 -0600] "GET /advsearch.jsp?search=captive%20deer%20Chronic%20wasting%20disease%20Prion%20protein%20Parasitic%20diseases
&filter=&sortBy=1&sortDir=1&pagemode=advsearch HTTP/1.1" 200 53477
101057-web2.gold.funnelback.com - - [20/Dec/2006:11:48:15 -0600] "GET /advsearch.jsp?search=captive%20deer%20Chronic%20wasting%20disease%20Prion%20protein%20Parasitology
&filter=&sortBy=1&sortDir=1&pagemode=advsearch HTTP/1.1" 200 54083
101057-web2.gold.funnelback.com - - [20/Dec/2006:11:48:30 -0600] "GET /advsearch.jsp?search=captive%20deer%20Chronic%20wasting%20disease%20Prion%20protein%20Viral%20diseases
&filter=&sortBy=1&sortDir=1&pagemode=advsearch HTTP/1.1" 200 54577
101057-web2.gold.funnelback.com - - [20/Dec/2006:11:48:45 -0600] "GET /advsearch.jsp?search=captive%20deer%20Chronic%20wasting%20disease%20Colorado%20Brain
&filter=&sortBy=1&sortDir=1&pagemode=advsearch HTTP/1.1" 200 62517
All of these show 200 (success), so they're not being blocked.
Here is the rewrite section in the httpd.conf
RewriteCond %{REMOTE_ADDR} ^64\.72\.1(1[2-9]|2[0-7])\.
RewriteRule .* - [F]
And here is a referrer file snip
- -> /advsearch.jsp
- -> /advsearch.jsp
- -> /advsearch.jsp
- -> /advsearch.jsp
and we are logging "normal" referrers as well. But nothing that points to this funnelback
[edited by: jdMorgan at 6:50 pm (utc) on Dec. 20, 2006]
[edit reason] Stop side-scroll [/edit]
I just tried this after talking to a colleague and it didn't work either.
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteCond %{REQUEST_URI} ^/$
RewriteRule .* - [F]
Any blank referrer, or user agent, requesting anything under the root should be blocked, correct?
And tried this to try and specify the page itself - no dice.
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{REQUEST_URI} ^/advsearch.*$
RewriteRule .* - [F]
Note the disconnect between the description of the code you tried and the code itself:
> Any blank referrer, or user agent, requesting anything under the root should be blocked, correct?
This isn't a good idea, since it would block many innocent and legitimate users coming to your site from behind ISP and corporate proxies. But to work as you described it, the code needs to be written like this:
# Any blank referrer
RewriteCond %{HTTP_REFERER} ^$ [OR]
# OR any blank user-agent
RewriteCond %{HTTP_USER_AGENT} ^$
# requesting any URL-path
RewriteRule .* - [F]
The last RewriteCond was redundant, since it was already implicit in the RewriteRule pattern, and also, that RewriteCond's pattern was fully-anchored, so the rule would have only affected requests for "example.com/" and only "example.com/" -- nothing below that. So I removed it.
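The way mod_rewrite combines conditions can be sketched in a few lines of Python: consecutive RewriteCond lines are ANDed by default, and an [OR] flag joins a condition with the next into a group where either may match. This is a simplified model for illustration, not Apache's actual implementation:

```python
def rule_fires(conds):
    """conds: list of (matched, has_or_flag) pairs, one per RewriteCond.

    Consecutive conditions linked by [OR] form a group that passes if
    ANY member matched; ALL groups must pass for the rule to fire.
    """
    groups, current = [], []
    for matched, has_or in conds:
        current.append(matched)
        if not has_or:          # no [OR]: this group ends here
            groups.append(current)
            current = []
    if current:                 # trailing [OR] on the last condition
        groups.append(current)
    return all(any(g) for g in groups)

# Blank referrer but a real user-agent:
# with [OR] the rule fires; with the default AND it does not.
print(rule_fires([(True, True), (False, False)]))   # True  ([OR] between them)
print(rule_fires([(True, False), (False, False)]))  # False (default AND)
```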
Glad you got it working!
If you have access to the firewall, let your code run for a few days while observing this abuser, and then once you're sure you've got it covered, you can move the blocking function from mod_rewrite in httpd.conf to the ACL in your firewall. It will keep those requests from even connecting to your server -- and save you some space in your log files. Firewall stuff is beyond the scope of this forum, but it's something to consider/research.
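When you move the block to the firewall, you'll likely need the range expressed as a CIDR block rather than a regex. Python's standard ipaddress module can compute it — a sketch, assuming the range you want is 64.72.112.0 through 64.72.127.255, the span covered by the regex used earlier in this thread:

```python
import ipaddress

# The regex ^64\.72\.1(1[2-9]|2[0-7])\. covers 64.72.112.0 - 64.72.127.255.
nets = list(ipaddress.summarize_address_range(
    ipaddress.IPv4Address("64.72.112.0"),
    ipaddress.IPv4Address("64.72.127.255"),
))
print(nets)   # one clean /20 block
```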
Jim