The intent is to redirect the log spammer back to the site that they are spamming.
Is this correct, and what are the pros and cons?
Thanks in advance.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteCond %{HTTP_REFERER} ^(.*)$ [NC]
RewriteRule ^(.*)$ %1 [R=301,L]
You might want to change the pattern to (http://[^/]+) to leave off the 'tail' following the domain name, if any.
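For illustration only, here is a minimal sketch of the originally posted rule with that pattern swapped in. The trailing slash in the substitution is an addition, not part of the suggestion above. Note also that a request with an empty referrer no longer matches the second condition at all.

```apache
RewriteEngine On
# Host check as in the original post (example.com is the poster's placeholder)
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
# Capture only the scheme and host of the referrer, dropping any path or query 'tail'
RewriteCond %{HTTP_REFERER} ^(http://[^/]+) [NC]
# Send the request back to the referring host; the trailing slash is an assumption
RewriteRule ^(.*)$ %1/ [R=301,L]
```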
Jim
[edited by: jdMorgan at 12:48 am (utc) on Aug. 5, 2005]
Thanks for all your contributions in the Spider Identification World! It's been a big help.
I think you would be more effective with:
RewriteEngine On
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} (badsite\.com|otherbadsite\.com) [NC]
RewriteRule . http://%1/ [R=301,L]
Rule: Check every requested page whose path contains at least one character.
Cond1: Check that there is a referrer. If not, there is nothing we can do, so the rule should fail.
Cond2: Check whether the referrer is from a spammer site, and if it is, redirect them. Because the pattern has no start anchor ^ or end anchor $, it matches anywhere in the referrer, so we don't need to spell out "anything before" and "anything after". That should save some time.
There is no real point in checking for your own site as a referrer, because we have to check all external referrals against the bad list anyway, so I can't see much time/processor savings from it. If most of your referrals are internal, though, you should use:
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yoursite\.com [NC]
Put it between the two conditions. But remember, every off-site referral, including those from search engines, will have to be checked against yoursite.com before we even check whether it is a bad referral, so if most of your visitors come from an external URL, this will actually add to the processing time/resources needed for the checks.
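Put together, the three conditions might look like the sketch below. The domain names are the placeholders used in this post, and the https? prefix on the internal check is an assumption about how referrers to your site look.

```apache
RewriteEngine On
# 1. Skip requests that carry no referrer at all
RewriteCond %{HTTP_REFERER} .
# 2. Skip internal referrals early (only worthwhile if most referrals are internal)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yoursite\.com [NC]
# 3. Match the bad list; %1 holds whichever bad domain matched
RewriteCond %{HTTP_REFERER} (badsite\.com|otherbadsite\.com) [NC]
RewriteRule . http://%1/ [R=301,L]
```

%1 refers to the group in the last condition, since a negated condition contributes no backreferences.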
The Cons: You will put a load on your server, because every request has to be checked for this to be effective. The biggest problem I see is that you have to check every referrer against a 'bad-referrer' list; otherwise you would lose inbound referrals from other sites, and I am sure this is not your goal. You could probably reasonably do this for a few 'bad' sites, but that will depend on the number of visitors and the number of 'bad' referrers you would like to block.
It will also depend to some degree on user behavior: if visitors tend to hang around your site, then the check for yoursite.com as a referrer would be useful and could significantly help your server load.
You will also lose any actual visitors that make their way to your site from there. Of course, if they are just log spamming, then there is nothing to worry about in terms of visitors.
So, I guess in short, the biggest con I see is that you will have to check every referrer for an exact match against a bad list, and this will require some processing.
Pros: Log stats should be better, but I do not know whether the logs will still register this as a referral.
Sure would be fun!
If you decide to do it, there are some things that could be used to shorten the domain checks, e.g. you could use a condition that checks the first letter of the domains and breaks if they don't match:
Bad list:
baddomain otherbaddomain somebaddomain theotherbaddomain
RewriteCond %{HTTP_REFERER} (b|o|s|t)(t?her?|ome)(other)?(addomain) [NC]
This will fail if the first letters of the referrer do not match the bad domains, but continue if they do, obviously, the variables in the redirect need to be adjusted, but it should give some ideas. (I wrote this kind of quick, so I am not sure about the optionals.)
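On the optionals: as posted, the second group is mandatory, so a plain baddomain referrer would not appear to match. A minimal sketch of one alternative follows, which makes the whole prefix optional and anchors at the start of the host name. It assumes referrers begin with http:// or https:// and an optional www., and that each bad name is followed by a dot and a TLD. The redirect back to the referring host is just one possible action.

```apache
RewriteEngine On
# Bad list from the post above: baddomain, otherbaddomain, somebaddomain, theotherbaddomain
# Group 1 captures scheme + host; the optional prefix group covers the three longer names
RewriteCond %{HTTP_REFERER} ^(https?://(www\.)?((the)?other|some)?baddomain\.[^/]+) [NC]
# Redirect back to the referring host
RewriteRule . %1/ [R=301,L]
```

Because the pattern is anchored with ^, the regex engine gives up after the first few characters for most legitimate referrers, which is the saving the first-letter idea was after.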
Hope this helps.
Justin
Added: Just another thought, maybe if jdMorgan makes it all the way through this one he can comment.
Could we turn the referrer log off in the first line of the .htaccess and then turn it back on after the checks on the referrer? I don't have time to test it right now, spent too much time here already today!
I usually don't bother with the log spammers, however this one is different.
In the past week or so they have been appearing at least hourly.
Today I denied the IP they had been using, and the response was approx. 35 more hits within five minutes, from 35 different IPs.
I denied on the referrer; however, that doesn't seem to have slowed down the hourly visits.
BTW, the page they were coming in on is no longer on my site, and I had a redirect in place to the new location of that page.
I'm not thrilled about passing this kind of traffic to a domain that's not mine.
Don
As jd01 says, if you do the check for the log spammer referrer before redirecting the page to another site, that should prevent any problems for the other site.
But I'd recommend that you redirect the now-gone page to a page on your own site, and then redirect from that URL to the off-site URL. Then check your logs to see if the log-spammer follows the first redirect and comes right back for the second URL. It's very rare that they do. As I said, their entire goal is to put an entry in your log file. No matter what you do in .htaccess, or what your server response is, that goal will be accomplished. They can do a GET, PUT, POST, HEAD, whatever, follow redirects or not -- As long as they create a log entry, they have accomplished their mission.
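A rough sketch of that two-step redirect, assuming an .htaccess file in the document root. The file names old-page.html and moved.html and the target URL are hypothetical placeholders, not taken from this thread.

```apache
RewriteEngine On
# Step 1: the now-gone page redirects to an intermediate URL on your own site
RewriteRule ^old-page\.html$ /moved.html [R=301,L]
# Step 2: the intermediate URL redirects to the off-site location
RewriteRule ^moved\.html$ http://www.example.org/new-page.html [R=301,L]
```

A client that never requests moved.html after hitting old-page.html is not following redirects, which is what the log check is meant to reveal.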
Since I'm familiar with several of your security measures, I doubt that your log files or stats are public. But these log spammers have you on their list anyway, and they don't care. If they get a public logs/stats listing in one out of 10,000 sites, they're happy.
Nothing says you can't report their IP addresses to the several useful IP address block list sites, though. That actually seems to be the most useful thing to do, unless you can block them at the server router.
With httpd.conf access and mod_log_config, you could skip logging these accesses based on referer, but that still does not stop them wasting your CPU or internet bandwidth. That's why most counter-measures are futile, in my opinion. In your shoes, I'd just report them and go on.
Jim
I've been going through my file since early this AM looking for syntax errors :( After a while all the numbers and characters look alike.
These past few hours (after having found five syntax errors on the three previous run-throughs) have been spent in a dazzled state of mind, wondering why the corrections created more entries contained in the rewrites.
Two more run-throughs with no apparent errors.
Alas! In my haste to handle the pest this thread refers to,
I failed to escape a dot and a colon in the SetEnv section, which invalidated my entire file.
Who the hell said this stuff was easy ;)
Thanks again
Don.
This uses mod_setenvif [httpd.apache.org].
The SetEnvIf lines set an environment variable based on the request URI. If the URI matches any of the regular expressions, the attacks variable is set. In the CustomLog entries, if the environment variable attacks is set, the record is added to my attacks_log file; otherwise, it goes to my normal access_log.
#
# CodeRed and Nimda to separate logfiles
#
SetEnvIf Request_URI "^/default\.ida(.*)$" attacks
SetEnvIf Request_URI "^/wpad\.dat(.*)$" attacks
SetEnvIf Request_URI "root\.exe(.*)$" attacks
SetEnvIf Request_URI "cmd\.exe(.*)$" attacks
SetEnvIf Request_URI "\.dll(.*)$" attacks
CustomLog logs/attacks_log combined env=attacks
CustomLog logs/access_log combined env=!attacks
I have not tested the following entries, but based on RFC2616 [rfc-editor.org] and the above-mentioned Apache [httpd.apache.org] docs, they are assumed to work.
#
# CodeRed and Nimda to separate logfiles
#
SetEnvIf Request_URI "^/default\.ida(.*)$" attacks nal
SetEnvIf Request_URI "^/wpad\.dat(.*)$" attacks nal
SetEnvIf Request_URI "root\.exe(.*)$" attacks nal
SetEnvIf Request_URI "cmd\.exe(.*)$" attacks nal
SetEnvIf Request_URI "\.dll(.*)$" attacks nal
#
# Filter "Bad" Referer from access_logs
#
SetEnvIf Referer "badreferer\.dom" badreferer nal
CustomLog logs/badreferer_log combined env=badreferer
CustomLog logs/attacks_log combined env=attacks
CustomLog logs/access_log combined env=!nal
I created a new environment variable nal, short for "no access log", and assigned it in every SetEnvIf entry that I don't want logged in my access_log file. Then I modified the CustomLog entry so that it skips any record that has nal set.
See the documentation for more information on using mod_log_config [httpd.apache.org].
YMMV.
-teh