.htaccess contents:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?myspace\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?blogspot\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?livejournal\.com/ [NC]
RewriteRule .*\.(jpe?g¦gif¦bmp¦png)$ - [F]
RewriteCond %{HTTP_USER_AGENT} ^.*Backweb.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*gotit.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Bandit.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Ants.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Buddy.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebZIP.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Crawler.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Wget.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Grabber.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Sucker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Downloader.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Siphon.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Collector.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Snagger.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Widow.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Snake.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Vacuum.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Pump.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Teleport.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Reaper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mag-Net.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Memo.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*pcBrowser.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SuperBot.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*leech.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Stripper.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Offline.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Copier.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mirror.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HMView.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JOC.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*likse.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Recorder.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GrabNet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Likse.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Navroad.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*attach.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Magnet.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Surfbot.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Whacker.*$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*FileHound.*$
RewriteRule /* [mydomain.com...] [L,R]
If so, is there anything in your server error log?
The reason I ask is that the first part of your code should work, as long as you have changed the broken pipe "¦" characters to solid pipes "|" -- posting on this forum modifies those characters.
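For reference, here is a sketch of how the first part would look with solid pipes restored. It is the same rule as posted, except that the leading ".*" in the RewriteRule pattern is dropped, since an unanchored pattern already matches anywhere in the path:

```apache
RewriteEngine On
# Refuse image requests referred from the listed sites (and any subdomain)
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?myspace\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?blogspot\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?livejournal\.com/ [NC]
# Solid pipes separate the alternatives; "\." matches a literal dot
RewriteRule \.(jpe?g|gif|bmp|png)$ - [F]
```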
The second part won't work because it creates an infinite loop: The URL you are redirecting to will match the rule's pattern and get rewritten again and again, until either the server or the client reaches its maximum redirection limit.
Assuming that you are not using a custom 403 error page, I suggest you replace that rule with
RewriteRule !^robots\.txt$ - [F]
If you are using a custom 403 error page, exclude it as well so that blocked clients can still fetch it:
RewriteCond $1 !^robots\.txt$
RewriteCond $1 !^path_to_custom_403_error_page\.html$
RewriteRule (.*) - [F]
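If you would rather keep a redirect than return a 403, the loop can be avoided by excluding the redirect target from the rule. This is only a sketch: the landing page /blocked.html is a hypothetical name, not something from your post, and just two of the user-agent conditions are shown for brevity.

```apache
# Never redirect requests for the landing page itself or for robots.txt;
# without the first exclusion the redirected request would match again and loop
RewriteCond %{REQUEST_URI} !^/blocked\.html$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Unwelcome user-agents (add the rest of the list here, each with [OR] except the last)
RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP
# /blocked.html is a hypothetical page; substitute your own
RewriteRule .* /blocked.html [R=302,L]
```

The two exclusion conditions are ANDed with the OR-group of user-agent conditions that follows, so a matching client is redirected once and is then allowed to fetch the landing page.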
RewriteEngine on
RewriteRule ^foo\.html$ http://www.WebmasterWorld.com/ [R=301,L]
Also, it appears that you've edited most of the RewriteCond patterns and added unnecessary ".*" subpatterns to the beginning and end of each one. This slows down your server, and the changes to the anchoring either have no practical effect or, worst of all, block legitimate visitors.
For a quick regular-expressions review:
^match-must-start-with-this-string
match-must-end-with-this-string$
^match-exactly-this-string$
match-must-contain-this-string
Since ".*" matches anything at all --including a blank string-- the following two patterns are functionally identical:
^.*foo.*$
foo
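Applied to a few of the conditions from your list, that equivalence suggests something like the following sketch. The four names are taken from your post; grouping them into one alternation is my own suggestion, and it matches the same requests as four separate "^.*name.*$" conditions joined with [OR]:

```apache
# One "contains" test replaces four ^.*name.*$ conditions.
# Matching stays case-sensitive, as in the original (no [NC] flag).
RewriteCond %{HTTP_USER_AGENT} (HTTrack|WebZIP|GrabNet|FileHound)
RewriteRule !^robots\.txt$ - [F]
```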
By adding those extraneous ".*" subpatterns and altering the pattern anchors, you've done two things: slowed down the server and/or made the patterns less specific. In some cases, these less-specific patterns can be dangerous, in that they'll block access by legitimate visitors as well as the unwelcome ones. I suggest reviewing the lists of blocked user-agents in the other threads here, and paying careful attention to the "starts-with" versus "ends-with" versus "exactly-matches" versus "contains" notations. In *most* cases, the authors were careful to properly anchor each of the patterns for the specific user-agent to be matched.
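As a rough illustration of why the anchoring matters, the sketch below mixes "starts-with" and "contains" patterns. Which anchoring is right for each agent is an assumption on my part (I am assuming Wget and Teleport put their name at the very start of the user-agent string, and that HTTrack embeds it mid-string), so check your own logs or the lists in the other threads before relying on it:

```apache
# Starts-with: assumed to appear at the beginning of the user-agent string
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
# Contains: assumed to appear somewhere inside a longer user-agent string
RewriteCond %{HTTP_USER_AGENT} HTTrack
RewriteRule !^robots\.txt$ - [F]
```

Short, common words such as "Memo" or "Ants" are the risky ones: as unanchored "contains" patterns they will match any user-agent string that happens to include those letters.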
For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].
Jim
During this I noticed that the default behavior of Teleport Pro now seems to be to impersonate IE 5.0 instead of itself, so it probably won't do a lot of good in the future. For now the ones that are causing the most problems are using a correctly-identified agent, so that's a plus.
Thanks again for the help sorting that out. Those resources you mentioned are great and hopefully I should be able to sort out future problems on my own.
Jim