Forum Moderators: phranque


mod rewrite check.

         

kahuna

2:43 pm on Dec 3, 2005 (gmt 0)

10+ Year Member



Hi.. I should have this correct, but I am trying to figure out why a particular IP or domain is slamming my site.
For example, last night there were 40 MB of log entries from the specific IP that is getting redirected below.

This is what I have..

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} MS.*FrontPage [NC]
RewriteRule !^403.*\.html$ - [F]

RewriteCond %{REMOTE_HOST} \.example\.com$
RewriteRule .* http://www.mydomain.com/banned.shtml [R=301,L]

<Limit GET>
order allow,deny
deny from #*$!.xxxx.xxx.xxx
deny from yyy.yyy.yyy.yyy
allow from all
</Limit>

thanks again group

kahuna

3:23 pm on Dec 3, 2005 (gmt 0)

10+ Year Member



In the above example, the extra characters

deny from #*$!

somehow got tossed in during cut and paste.

jdMorgan

1:12 am on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"xxx" is a stop word at WebmasterWorld, for obvious reasons. Use "***" instead. :)

I wouldn't bother redirecting \.example.com to "banned.shtml". Bad bots don't follow redirects, as a rule. Just 403 them like your first code block does.

Also, be aware that testing REMOTE_HOST and using a URL-path pattern of .* will have potentially-serious performance impact on your server. This is because you are asking your server to do a reverse-DNS lookup on every single resource requested from your server. If possible, look up the IP address range associated with \.example.com, and block/redirect them by IP address %{REMOTE_ADDR} instead of by hostname.
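A minimal sketch of blocking by address instead (the 192.0.2.x range here is just a placeholder; substitute whatever range you find for example.com's servers):

# Block by IP address rather than hostname -- no reverse-DNS lookup needed
# 192.0.2.x is an illustrative range, not example.com's real addresses
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule .* - [F]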

Are you saying that you are still getting slammed by one of the IP addresses in your "deny from" code at the bottom? If so, are you sure they are doing "GET" requests? You have set up your code so that only GET requests are restricted.
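If you want the denial to cover every method, simply drop the &lt;Limit GET&gt; wrapper; directives placed outside a &lt;Limit&gt; section apply to all request methods. A sketch using your placeholder addresses:

# Applies to GET, POST, HEAD, etc. -- no <Limit> section
order allow,deny
deny from 111.111.111.111
deny from yyy.yyy.yyy.yyy
allow from all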

Jim

kahuna

12:22 pm on Dec 4, 2005 (gmt 0)

10+ Year Member



I put this in my example
-------------------------
<Limit GET>
order allow,deny
deny from 111.111.111.111
deny from yyy.yyy.yyy.yyy
allow from all
</Limit>
-------------------------

Just to show that it is in my htaccess file.

Maybe I could ban by IP address... but I think it would be exhausting, like trying to ban, say, AOL users from Dallas, Texas.

The situation is occurring through my mod_rewrite "banning."
The hosts/persons I am banning look something like this:
\.the\.group\.i\.ban\.example\.com -- thus it's less global than \.example\.com alone.

My concern was that maybe I didn't include a closing flag in my mod_rewrite and the process was running some sort of endless loop. Since there are no suggestions pointing to that from the example I posted, I am guessing (hopefully) that is not the problem.

I had a similar problem
[webmasterworld.com...] a year ago
that ended up totaling 1 million hits.
In that case the individual was using the MSN search engine, and their host was a big-name cable company that uses Google for its site search (i.e., you go to the company's website, and the search box at the top of the page returns Google results). So I don't believe it was a bad bot from Google, but maybe somebody trying to obfuscate the search engine, or a script kiddie trying to break into my hosting company's servers and my website.

This time it is a similar situation, but they are using Google... that is, in my log the referer looks like this:
yahdaha yahdah yahdah "http://www.google.com/search?hl=en&q=bozotheclown+big-shoes" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; AT&T WNS5.0; YComp 5.0.0.0; .NET CLR 1.1.4322)"

and that results in a big chunk of my log.

So obviously that's not a bot.
It "feels" like a script, or somebody taping a fishing weight down on the F5 refresh key.

So... if my mod_rewrite is good, then it is not something I have generated by mistake in the .htaccess file.

Thanks so much again for your comments and time on this issue, as I expect I will not get any quantitative explanation from the offending company, Google, or my host. I didn't last year with the similar MSN / big-name-cable-company incident.

Thanks again.

jdMorgan

4:54 pm on Dec 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just trying to nail down which section of the code you're having trouble with, and what problem you're having...

The following code will loop, because there is no provision to allow 'banned.shtml' to be served, so after you do the redirect and the client returns asking for 'banned.shtml', it will get redirected again.

There are several ways to fix this, and also several ways to improve it. I'll show them cumulatively, but you can mix and match the techniques if desired:


# Prevent 'infinite' redirect loop
RewriteCond %{REMOTE_HOST} \.example\.com
RewriteRule !^banned\.shtml$ http://www.mydomain.com/banned.shtml [R=301,L]


# Hide the banning function from the client by using internal rewrite instead of redirect
RewriteCond %{REMOTE_HOST} \.example\.com
RewriteRule !^banned\.shtml$ /banned.shtml [L]


# Reduce horrible DNS lookup load on server by limiting check to only shtml pages
RewriteCond %{REQUEST_URI} !^/banned\.shtml$
RewriteCond %{REMOTE_HOST} \.example\.com
RewriteRule \.shtml$ /banned.shtml [L]


# 403 unwelcome hosts instead of serving banned.shtml
RewriteCond %{REMOTE_HOST} \.example\.com
RewriteRule \.shtml$ - [F]

In the last two cases, I show limiting the scope of the reverse-DNS check, so that it only happens on .shtml page requests. You could adapt this so that it only checks for .php page requests, or request for any type of files but only in a certain subdirectory -- whatever makes sense for your site. Certainly if these 'bot-scripts never request images, you'd want to exclude image files from the RDNS check.
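As a sketch of that image exclusion (the extension list is illustrative, not exhaustive), note that the cheap REQUEST_URI test is listed first: RewriteConds are evaluated top to bottom and short-circuit on the first failure, so for image requests the server never has to expand %{REMOTE_HOST} and no DNS lookup occurs.

# Skip the RDNS check entirely for image requests (illustrative extension list)
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|ico)$ [NC]
RewriteCond %{REMOTE_HOST} \.example\.com
RewriteRule !^banned\.shtml$ /banned.shtml [L]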

In addition to increasing the server load by a factor of at least two, checking reverse DNS also incurs an additional dependency: If the DNS server you're querying is slow or broken, then your site will be slow or broken. As it is now, every incoming request to your server for every page, image, stylesheet, etc. results in an outgoing RDNS lookup request, and the incoming request cannot be served until the RDNS response comes back. You may indeed have to live with this for a while, but at least be aware of the severity of the load increase, so you can balance it with your access control needs.

Also, be aware that if you use a custom 403 error page, then it will also need to be excluded from access control in a similar way to that shown for excluding banned.shtml, in order to avoid a looping situation.
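A sketch of that exclusion, assuming the custom error page lives at /403.shtml (the filename is illustrative):

ErrorDocument 403 /403.shtml

# Exclude the error page itself from the ban, or the 403 will loop
RewriteCond %{REQUEST_URI} !^/403\.shtml$
RewriteCond %{REMOTE_HOST} \.example\.com
RewriteRule \.shtml$ - [F]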

HTH,
Jim

kahuna

7:55 pm on Dec 4, 2005 (gmt 0)

10+ Year Member



Thanks as always... but what a bummer. I don't know why I hadn't seen the infinite-loop issue before with other REMOTE_HOST redirects, or whether it occurred only intermittently, which wouldn't make sense to me.

I would often check the logs to see if my redirects were working... so I don't understand how I would have missed other endless loops.

And yes... HTH.
thanks again.

kahuna

10:59 am on Dec 5, 2005 (gmt 0)

10+ Year Member



The reason this dummy (me) "missed" it, or why the endless loops didn't occur in many instances, is that most of my redirects started in subdirectories and pointed to upper-level directories.
When I used similar code from a lower-level directory in my root html directory... duhhhh, it causes the loops.
When JD writes his book "Htaccess Files for Dummies" I guess I'll be getting one. Thanks again.