Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage.
Feel free to use this on your own site and start blocking bots too.
(the top part is left out)<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]
Wouldn't it be easier to just write the .htaccess on what to allow, instead of what to deny?
As with the distiction of rights and liberties in basic human rights and with the question of your default policy in firewalls this is a question of what you want to prevail in new cases and in cases of doubt.
Monarchies, even constitutional ones, often are liberties based. Everything is forbidden (not only in the sense of criminal law) until allowed. E.g this was the case in the UK. This changed slightly with the passing of the Human Rights Act 1998. In countries with a rights based law systems everything is allowed until explicitly forbidden. Article 2 of the German contitutes that everybody can do as he pleases. The 10th Amendment to the US constitution states that all powers not delegated to the US or the states remain with the people.
If a new situation arises in a liberties based rights system, it is forbidden while it is allowed in a rights based one. The former is way more restrictive than the latter and not in accordance with the image of an autonomous, intelligent individual or prospective purposive agent.
The same goes for the question of whether you want to deny everybody and allow only certain user agents. It is always quite helpful to think about the consequences of an error in your system (Fehlerfolge).
Deny all: If you miss an UA (you don´t allow access) that UA will not be able to access your site.
Allow all: If you miss an UA (you don´t block it) that UA will be allowed.
Which consequence is the worst depends on your default policy. Would you rather have people see your content or not. In case of the former it is better to allow access to a UA that you don´t want to allow. In the latter case it is better to block more UAs than less.
If you care about freedom be permissive, if you are paranoid be restrictive.
Andreas
testing my htaccess I get the following error (excerpt):
You don't have permission to access /
on this server.<P>
<P>Additionally, a 403 Forbidden
error was encountered while trying to use an ErrorDocument to handle the request
although I specified
ErrorDocument 404 /err404.htm
ErrorDocument 403 /err403.htm
and this text above is not my err403. So UA xy also gets a 403 while getting the err page I think.
my last lines in htaccess:
RewriteCond %{HTTP_USER_AGENT} ^PortalBSpider
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.mydomain.net.* - [F]
any ideas? please help.
Yes, you have (with your rewrite) also blocked access to your custom 403 page, resulting in a cascade of 403 errors. It sounds like your server intervened to prevent an infinite loop.
Note also that the "http://(.*\.)?mydomain.net part of your URL is not "visible" to RewriteRules - they only "see" the path name part of the URI - the part of the URL after "mydomain.net/". Use RewriteConds if you need to test the domain.
I assume that you were testing the PortalBSpider rule... If so, here's a way to correct the problem:
RewriteCond %{HTTP_USER_AGENT} ^PortalBSpider
RewriteRule !^err403\.htm$ - [F,L]
This says, "redirect PortalBSpider requests for all files that are not named err403.htm"
-
You could also use:
RewriteRule !^err40[34]\.htm$ - [F,L]
to include allowing your err404 file, if you so desired.
I'm not absolutely sure about the path to your custom 403 page. If the above doesn't work, try
RewriteRule !^/err403\.htm$ - [F,L] or
RewriteRule !err403\.htm$ - [F,L]
You could also add a RewriteCond %{REQUEST_URI} !err403.htm$ instead of doing the err403.htm file exception in the RewriteRule itself, but since there are likely many RewriteCond's [OR]'ed together above the one you've shown, the and/or logic of the conditions is easy to confuse if you're not careful.
Clicking "Submit" and hoping there are no typos again!... Hope this helps,
Jim
RewriteRule /robots.txt$ - [NC,L]
Make sure that is the TOP of your http.conf container.
Anyway, I assume you could also add something like:
RewriteRule /path/to/403.html$ - [NC,L]
to precent a loop...
dave
So now we have three methods for getting around the problem shown here. :)
Jim
I don't really understand jow each different rule works and so want to be sure that my rules don't conflict with each other in any way
First Part
----------
ErrorDocument 404 /404.htm
ErrorDocument 400 /404.htm
ErrorDocument 403 /404.htm
ErrorDocument 501 /404.htm
ErrorDocument 502 /404.htm
ErrorDocument 503 /404.htm
2nd Part
--------
<FilesMatch "htm([l])*$">
ForceType application/x-httpd-php
</FilesMatch>
3rd Part
--------
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]
I've never used the rewrite part (ie part 3 above) before and am worried that if I implement it it could interact badly with parts 1 and 2?
Thanks
Anni
As you have this set up now, your custom 403 page cannot be fetched by any banned user-agent, because 404.htm is also blocked. I would recommend creating a separate custom 403 page called 404.html, changing your ErrorDocument directive to
ErrorDocument 403 /403.htm
and changing the last 5 lines of your rewrite ruleset to:
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org
RewriteRule !^403.htm$ - [F,L]
This blocks the user-agents and referers you have listed from accessing you site, except that it allows them to fetch 403.htm when redirected there by the RewriteRule and your ErrorDocument directive.
Also, I don't really see any point in having a custom 400 error document, since malformed requests are unlikely to come from anything or anyone that you want to help (robots won't follow a 400-series error redirect anyway).
I think you can write the FilesMatch pattern like this:
<FilesMatch "html?$">
which will work better, unless you want to accept requests for filetypes of .htmll, htmlll, htmllll, etc. The question mark just means "match 0 or 1 of the preceding character" - in your case, match "htm" or "html".
Other than the above, your .htaccess looks fine to me.
Jim
ErrorDocument 401 /error.php?eid=401
ErrorDocument 403 /error.php?eid=403
ErrorDocument 404 /error.php?eid=404
ErrorDocument 500 /error.php?eid=500
< i have a whole bunch of RedirectPermanent lines here >
RewriteEngine on
RewriteCond %{HTTP_REFERER}!^$
RewriteCond %{HTTP_REFERER}!^http://(www\.)?macombsheriff.com.*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://(dev\.)?macombsheriff.com.*$ [NC]
RewriteRule \.(gif¦jpg¦zip¦pdf)$ http://www.runningwolf.com/dev/apology.gif [R,L]
# RewriteCond %{HTTP_USER_AGENT} ^Mozilla* [OR]
RewriteCond %{HTTP_USER_AGENT} almaden [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CICC [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} copier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} download [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/2.99 [OR]
RewriteCond %{HTTP_USER_AGENT} email [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} grab [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy*Library [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetLinkagent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JBH*agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link*Sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla*MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} offline [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^PingALink [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /robots.php [L]
I think it would be beneficial to this forum to eliminate the long lists of User-agents in posts - just include the first two and last two, for example, when discussing structural issues. If the subject returns to a comprehensive listing we can include them all, but for structural issues, it just wastes space and makes the thread hard to follow...
Including these long list of User-agents is beginning to approach a code review, and we're not supposed to do those here. Also, we're supposed to refrain from including valid URLs for our sites.
Please don't get me wrong - I just don't want to see this thread - which I have found to be quite useful - closed due to violations of the WebmasterWorld TOS.
Thanks,
Jim