We keep getting visited by the webreaper spider, and I'm looking for a way to keep it out. I found the following possible solution on a web page that suggests an .htaccess file in the default directory:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Webreaper
RewriteRule ^.*$ /lists/ [F,L]
Does anyone know if this is the best way to accomplish this, or is there a better solution? I just want to make sure that I don't inadvertently keep out the search engine spiders I do want to visit, such as Googlebot, Yahoo, etc.
Also, what if my .htaccess file already contains a rule:
RewriteEngine on
RewriteRule ^pagelink/(.+)/ /cgi-local/runner.cgi?p=linker&ID=$1 [L]
Would I add the new rules for keeping out Webreaper below it in their entirety, starting again with the line
RewriteEngine On
or should I not repeat this line because it was turned on with the first condition?
Thanks in advance for any help/suggestions.
Try something like this:
RewriteCond %{HTTP_USER_AGENT} ^Webreaper [NC]
RewriteRule .* - [F]
Or, if you use a custom 403 error document, exclude it from the rule:
RewriteCond %{HTTP_USER_AGENT} ^Webreaper [NC]
RewriteRule !^path_to_custom_error_document$ - [F]
You do not need to repeat the RewriteEngine on directive within any given .htaccess file.
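As a rough sketch of how the merged file might look, assuming the existing pagelink rule from the question and the simpler form of the block (no custom 403 document): RewriteEngine appears once, and the blocking rule comes first so that unwelcome requests are refused before any other rewriting happens. The ordering is an assumption of this example, not something stated above.

```apache
# One RewriteEngine directive is enough for the whole file.
RewriteEngine on

# Refuse any request whose User-Agent starts with "Webreaper".
# [NC] makes the match case-insensitive, "-" means "no substitution",
# and [F] returns 403 Forbidden.
RewriteCond %{HTTP_USER_AGENT} ^Webreaper [NC]
RewriteRule .* - [F]

# Existing rule from the original file, unchanged.
RewriteRule ^pagelink/(.+)/ /cgi-local/runner.cgi?p=linker&ID=$1 [L]
```

Because the pattern is anchored with ^, it only matches user-agent strings that begin with Webreaper, so Googlebot and other spiders whose user-agent strings start differently should not be affected.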
Refs:
Apache mod_rewrite documentation [httpd.apache.org]
Apache URL Rewriting Guide [httpd.apache.org]
Regular Expressions Tutorial [etext.lib.virginia.edu]
A Close to perfect .htaccess ban list [webmasterworld.com] (In three parts)
Jim
RewriteRule !^path_to_custom_error_document$ - [F]
If the .htaccess file is in the root directory, then when I write the path to the custom error document I shouldn't start it with a slash, is that right? i.e.
RewriteRule !^directory_name/error_document.html$ - [F]
rather than:
RewriteRule !^/directory_name/error_document.html$ - [F]
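Yes, that's right: in an .htaccess file the pattern is matched without the leading slash, whereas in httpd.conf the slash is included. You can also use the RewriteBase directive to avoid this inconsistency if you like.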
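For comparison, here is a hedged sketch of the same rule in both contexts. It assumes the .htaccess file sits in the document root and reuses the directory_name/error_document.html path from the question; the httpd.conf variant is commented out because only one of the two would be used at a time.

```apache
# In .htaccess in the document root (per-directory context):
# the leading slash is stripped before the pattern is tested.
RewriteCond %{HTTP_USER_AGENT} ^Webreaper [NC]
RewriteRule !^directory_name/error_document\.html$ - [F]

# In httpd.conf (server or virtual-host context):
# the pattern is tested against the full URL-path, slash included.
# RewriteCond %{HTTP_USER_AGENT} ^Webreaper [NC]
# RewriteRule !^/directory_name/error_document\.html$ - [F]
```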
Jim
I have several rewrites in place such as:
RewriteCond %{HTTP_REFERER} ^http://www.example.com/* [OR]
RewriteCond %{REQUEST_URI} FormMail.*
RewriteRule ^.* - [F,L]
I have a long list of them. I also have a custom error document in place, with these lines in the .htaccess:
ErrorDocument 400 /errors/phpErrorDoc.php?400
ErrorDocument 401 /errors/phpErrorDoc.php?401
ErrorDocument 403 /errors/phpErrorDoc.php?403
ErrorDocument 404 /errors/phpErrorDoc.php?404
ErrorDocument 500 /errors/phpErrorDoc.php?500
After reading your post, I'm wondering if I need to add something to prevent loops. I don't even know if I have loops, although I have had to add a script that checks the server load and restarts the server when it goes above 5, which can happen once or twice a day.
Anyone?
[edited by: jdMorgan at 2:08 pm (utc) on July 23, 2004]
[edit reason] examplified URL per TOS [/edit]
Now, just so I understand this (I'm a newbie at this):
What is actually happening now?
Could you map that out?
RewriteCond %{HTTP_REFERER} ^http://www.example.com/* [OR]
RewriteCond %{REQUEST_URI} FormMail.*
RewriteRule !^errors/phpErrorDoc\.php$ - [F]
[edited by: jdMorgan at 2:10 pm (utc) on July 23, 2004]
[edit reason] examplified URL per TOS [/edit]
RewriteCond %{HTTP_REFERER} ^http://www\.example\.com [OR]
RewriteCond %{REQUEST_URI} FormMail
RewriteRule !^errors/phpErrorDoc\.php$ - [F]
(Also, corrected the special character escaping in the referrer and removed a bit of unnecessary fluff from the regex code).
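To map out what these lines do, here is the same rule set again with step-by-step comments. This is only a sketch: it assumes the .htaccess file is in the document root and that the ErrorDocument 403 line quoted earlier in the thread is in the same file.

```apache
# Custom 403 page, as quoted earlier in the thread.
ErrorDocument 403 /errors/phpErrorDoc.php?403

# 1. A request arrives with a Referer starting with http://www.example.com,
#    or for a URL containing "FormMail".
RewriteCond %{HTTP_REFERER} ^http://www\.example\.com [OR]
RewriteCond %{REQUEST_URI} FormMail
# 2. Every path except the error document itself is refused with 403.
#    Apache then internally requests /errors/phpErrorDoc.php?403, and that
#    request passes through these rules again with the same Referer header.
# 3. The "!" negates the pattern, so the internal request is not matched
#    and the custom page can be served instead of being forbidden in turn.
RewriteRule !^errors/phpErrorDoc\.php$ - [F]
```

With the earlier form of the rule (^.* as the pattern), step 3 would not apply: the request for the error document would be refused as well, and the custom 403 page could not be delivered.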
Apache mod_rewrite documentation [httpd.apache.org]
Apache URL Rewriting Guide [httpd.apache.org]
Regular Expressions Tutorial [etext.lib.virginia.edu]
Jim