Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Anybody got a bot to add before I stick this in every site I manage?
Feel free to use this on your own site and start blocking bots too.
(the top part is left out)
<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]
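A side note on the last two lines above, offered as a sketch rather than a definitive fix. In a per-directory .htaccess file, the RewriteRule pattern is matched against the URL-path of the request, never against a full "http://..." URL. A negated pattern such as !^http://... is therefore true for every request, so the rule ends up forbidding anything that arrives with the listed referrer. If that is the intent, it can be written more plainly. The dots in the hostname are escaped here so they match literally.

```apache
# Forbid every request whose Referer header is exactly this URL.
# (iaea.org is the referrer named in the post above.)
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
RewriteRule .* - [F]
```

Because of the $ anchor, a referrer with a trailing slash or a path after the hostname would not match; dropping the $ would widen the match.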
I'm new to both this forum and Perl and CGI scripts. I've been doing a lot of late-night studying to learn it all quickly, but it's just not possible to learn what I need to know in the short amount of time I have.
To make a long story short, I'm setting up my first web site, and while searching for information about keeping nasty bots away from my site, I found this forum.
The information and knowledge I've come across in this thread is spectacular. However, when I added the rewrite script I found here to my current .htaccess file, I ran into a problem. The script does a great job of keeping nasty bots away, and it gives me an easy way to ban things not only by browser and name but also by IP address. But for some unknown reason, every time I click the submit button on any of my online submission forms, I get a 403 error message.
This is the only problem that adding rewrite rules appears to be causing. I went through a process of elimination by removing each rewrite rule one at a time until I only had the
RewriteEngine On
RewriteRule ^.* - [F,L]
portion left and I still kept getting a 403 error message every time I clicked on the submission button on any of my forms. Once I removed the remaining section of the rewrite script, my forms began functioning again.
Can anyone help me out with this?
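One thing that may explain the last step of that elimination (a sketch, assuming the file contained nothing else by that point): RewriteCond lines only restrict the single RewriteRule that follows them. With every condition removed, the remaining rule matches every URL and forbids it unconditionally, so a 403 at that stage says nothing about the forms themselves.

```apache
RewriteEngine On

# With no RewriteCond in front of it, this rule applies to every request
# and returns 403 Forbidden for all of them:
#   RewriteRule ^.* - [F,L]

# With a condition, only requests whose User-Agent starts with "Wget"
# are forbidden. Everything else falls through untouched.
RewriteCond %{HTTP_USER_AGENT} ^Wget
RewriteRule ^.* - [F,L]
```

To narrow down the original problem, it may help to keep at least one condition in place and check the server error log for the request that was refused.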
For your information, when I first went into my .htaccess file I found the following content already in it. I'm aware this was already brought up in this thread, but I couldn't find any response to the previous similar inquiry from veenerz...
# -FrontPage-
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName www.mydomainname.com
AuthUserFile /the/path/to/a/file.here
AuthGroupFile /and/the/path/to/another/file.here
The rewrite script was easy for me to configure and use, but this "mod_access" stuff with order deny,allow etc. I just can't understand or figure out.
Any and all assistance will be greatly appreciated.
Wolf
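For what it's worth, here is how the FrontPage block quoted above reads under standard mod_access behaviour. Treat it as an annotated sketch, not a diagnosis of the 403 problem. With "order deny,allow", the Deny lines are evaluated first and the Allow lines second, and an Allow match overrides a Deny match. Anything that matches neither is allowed.

```apache
# GET and POST: "deny from all" is evaluated first, then "allow from all"
# overrides it, so everyone may use these methods.
<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>

# PUT and DELETE: there is no Allow line to override the Deny,
# so these methods are refused for everyone.
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
```

Note that the two keywords after "order" are separated by a comma only. Writing "order deny, allow" with a space after the comma is a syntax error.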
[webmasterworld.com...]
It monitors page requests and if a user requests too many within a set timeframe, they are given a custom 503 message.
Initially I used a long .htaccess file to block these programs. However, it didn't always work, and I always had to add user agents to the file when new programs were released. It also doesn't protect against these programs when people change their USER_AGENT to IE or Netscape.
Once I placed this script on my site, I caught 8 different (unique) people over a 24-hour period trying to leech my site. They all had normal browser USER_AGENT settings, so an .htaccess file wouldn't have helped. Since my site is all PHP and MySQL generated, this copying really hit my server hard. Some were requesting up to 17 pages a second!
Now that they are caught in real time, my server is performing much better and my regular visitors are very happy.
If you visit the thread, notice a few changes I added to ensure Googlebot is exempted from the limits and can request as many pages as it wishes.
I can no longer find that post. Can anyone send me the URL of the post on this forum? I'd like to check the code again.
I spent a whole day trying to find it with site search, but I can't find it ;-(
I saw it a few days back...
It is PHP though; I am not sure which one you mean.
and Welcome to WebmasterWorld WolfHawk. :)
;-) I had read your message, but I guess my mind was somewhere else at the moment I wrote the reply and I forgot about you, sorry.
thx
I'm trying to write code that blocks a bot by IP for making multiple connections to a site, but I don't want to use a separate data file that works as a counter... I want to keep it in the code...
You could use a single flat text file and load it into an array, but I haven't tried that. That would also probably use more CPU time than writing separate files per IP. Another possibility would be a MySQL database, but you would have even more overhead with reads/writes to the database under heavy load.
I have been hit really hard by these programs since my site hosts over 20,000 images and movies. I changed the line in the code to use 4,096 IP MD5 hashes instead of the 256 the script has by default and have had no performance problems.
I even customized the 503 page that is displayed. The page explains why they are viewing the message (to prevent leeching, slow performance for regular visitors, etc.) and even has a JavaScript countdown timer that starts at 60 and, when it reaches 0, forwards them to the page/image/movie they originally requested.
As someone else stated in a previous post if I recall....This has been one hell of a read. I read every single post in one sitting tonight. To hell with books!
I just wanted to note, half humorously and half seriously, that the bad people (bad bot operators, spammers, etc.) may all end up reading this thread after they search Google to find out why their "system" isn't working any more, LOL. All the effort you've all put into this thread could end up helping the spammers. However, the harder you make things, the better, if only to deter beginners and robots that wouldn't read these forums anyway (unless robots get so smart that they can read forums).
Maybe this forum should be encrypted in some Arabic language that no one can read, and only "good" people get the decryption software to read it. But to figure out who is bad and who is good, we are back to square one again... lol.
I find this forum possibly the most interesting and mind-exercising forum I've ever visited!
So you only need to put the .htaccess in your root directory? What if the robots enter from another area?
If it's entering, say, directly at www.example.com/deep/path/to/some/file.html, the server will look for an .htaccess file in each directory along that path, starting at the root, and process any directives it finds before sending the page.
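As an illustration of that lookup (the directory names are just the ones from the example URL, and the paths are shown relative to the site's document root, which is an assumption):

```apache
# Request: www.example.com/deep/path/to/some/file.html
# Apache checks for .htaccess files in this order and merges what it finds:
#   /.htaccess
#   /deep/.htaccess
#   /deep/path/.htaccess
#   /deep/path/to/.htaccess
#   /deep/path/to/some/.htaccess
#
# One caveat for mod_rewrite: if a subdirectory .htaccess has its own
# rewrite directives, the rules from the parent directories are not applied
# there unless the subdirectory asks for them:
RewriteEngine On
RewriteOptions Inherit
```

So a single root .htaccess covers deep entry points as long as no subdirectory overrides the rewrite configuration.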
XBitHack on
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} iaea\.org [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "DTS Agent" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "Fetch API Request" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "Indy Library" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "LINKS ARoMATIZED" [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "^DA \d\.\d+" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "^Download" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "^Internet Explore" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4.0$" [OR] # dumb bot
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/\?\?$" [OR] # formmail attacker
RewriteCond %{HTTP_USER_AGENT} "compatible ; MSIE 6.0" [OR] # spambot (note extra space before semicolon)
RewriteCond %{HTTP_USER_AGENT} "efp@gmx\.net" [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} "mister pix" [NC,OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^Atomz [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/\d\.\d+ [OR] # OD
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlickBot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^FrontPage [OR] # stupid user trying to edit my site
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^IE\ \d\.\d\ Compatible.*Browser$ [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR] # IE’s "make available offline" mode
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^NG [OR] # unknown bot
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR] # NameProtect spybot
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^PersonaPilot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sqworm [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SurveyBot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$ [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} anarchie [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} crescent [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector¦magnet¦reaper¦siphon¦sweeper¦harvest¦collect¦wolf) [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} express [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} extractor [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} flashget [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} getright [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} go.?zilla [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} grabber [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} imagefetch [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} net.?(ants¦mechanic¦spider¦vampire¦zip) [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} nicerspro [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ninja [NC,OR] # Download Ninja OD
RewriteCond %{HTTP_USER_AGENT} offline [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} snagger [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} tele(port¦soft) [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} vayala [OR] # dumb bot, doesn’t know how to follow links, generates lots of 404s
RewriteCond %{HTTP_USER_AGENT} web.?(auto¦bandit¦collector¦copier¦devil¦downloader¦fetch¦hook¦mole¦miner¦mirror¦reaper¦sauger¦sucker¦site¦snake¦stripper¦weasel¦zip) [NC,OR] # ODs
RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]¦[3-4][0-9]¦5[0-5])$" [OR] # Cyveillance spybot
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]¦1[3-9][0-9]¦2[0-4][0-9]¦25[0-5])$ [OR] # NameProtect spybot
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]¦2[0-4][0-9]¦25[0-5])$ [OR] # NameProtect spybot
RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [OR] # Turnitin spybot
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule !err_¦robots\.txt - [F,L]
[edited by: jatar_k at 4:36 pm (utc) on April 1, 2003]
[edit reason] sidescroll, had to shrink a line [/edit]
The meaning of the "F" and "L" flags was discussed (much) earlier in this thread. The "!err_¦robots\.txt" means that the rule applies to ALL files EXCEPT files matching "err_" (in my case those are my error documents, like err_403.html) and robots.txt (in order to give a bot a chance to see where it is not wanted).
If there are more things that should be added please post them,
Thanks
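A sketch of why the error documents are exempted, assuming the 403 page is served from /err_403.html as in the description above (the ErrorDocument line is an assumption; it was not part of the posted file). If the blocking rule also matched the error page, a blocked client would be refused the 403 page itself.

```apache
# Hypothetical: serve the custom page for 403 responses.
ErrorDocument 403 /err_403.html

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Wget
# The leading "!" negates the whole pattern: the rule fires only when the
# requested path does NOT match err_ or robots.txt. The alternation uses a
# solid pipe "|"; the forum software displays it as a broken one.
RewriteRule !(err_|robots\.txt) - [F,L]
```

The pattern is unanchored, so any path containing "err_" is exempt, not only files whose names begin with it.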
The .htaccess file idea seems like a much better stopgap measure: much more versatile and easier to implement. I'll stop in now and then and see if there is anything more to add to it. Kudos to WebmasterWorld for having forums and contributors that actually can teach you something and not waste your time.
RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
(just the referer ones with guestbook, and see if that makes a difference?), also cut out the first line source comment, just to be on the safe side, then see if you get the same errors.
I've been running it for a few days without any errors, but that's just one server on one web host, so I can't tell you there's nothing wrong with it. Maybe some of the other people who have contributed can take a look at it here and let us know: tech.ratmachines.com/downloads/sample_wbmw.txt
Here are the first and last lines of the script, in case someone can spot an error (the dots represent the cut-out part):
===============================
RewriteEngine On
RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
.....................
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg
RewriteRule ^.* - [F,L]
===============================
You might want to post exactly what errors you got; then somebody might be able to help you. I'm not very good at this stuff, but some of the people on this forum are.
Troubleshooting
Here are some of the most common problems I've seen people have (or have had myself) with .htaccess files. One thing I should stress first, though: the server error log is your friend. You should always consult the error log when things don't seem to be functioning correctly. If it doesn't say anything about your problem, try boosting the message detail by changing your LogLevel directive to debug. (Or add a LogLevel debug line if you don't have a LogLevel already.)
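A minimal sketch of what that might look like. Two things to flag: LogLevel belongs in the main server configuration (httpd.conf) or a virtual host, not in .htaccess, and the RewriteLog directives shown exist only in Apache versions before 2.4. The log path is a placeholder.

```apache
# In httpd.conf (not allowed in .htaccess):
LogLevel debug

# Apache 1.3 / 2.0 / 2.2 only: a separate trace of mod_rewrite decisions.
# Higher levels are more verbose; turn it off again when finished.
RewriteLog /var/log/apache/rewrite.log
RewriteLogLevel 3
```

On shared hosting these settings are usually out of reach, in which case the host's standard error log is the place to look.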
'Internal Server Error' page is displayed when a document is requested
This indicates a problem with your configuration. Check the Apache error log file for a more detailed explanation of what went wrong. You probably have used a directive that isn't allowed in .htaccess files, or have a directive with incorrect syntax.
.htaccess file doesn't seem to change anything
It's possible that the directory is within the scope of an AllowOverride None directive. Try putting a line of gibberish in the .htaccess file and force a reload of the page. If you still get the same page instead of an 'Internal Server Error' display, then this is probably the cause of the problem. Another slight possibility is that the document you're requesting isn't actually controlled by the .htaccess file you're editing; this can sometimes happen if you're accessing a document with a common name, such as index.html. If there's any chance of this, try changing the actual document and requesting it again to make sure this isn't happening.
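For reference, a sketch of the server-side setting involved. This goes in httpd.conf, which on shared hosting only the host can edit; the directory path is a placeholder. As far as I know, mod_rewrite directives in .htaccess need the FileInfo override, and the rewrite engine also needs FollowSymLinks (or SymLinksIfOwnerMatch) enabled for the directory.

```apache
<Directory /home/example/public_html>
    # "None" makes Apache ignore .htaccess files entirely.
    # FileInfo covers mod_rewrite; AuthConfig and Limit cover the
    # authentication and allow/deny directives.
    AllowOverride FileInfo AuthConfig Limit
    Options FollowSymLinks
</Directory>
```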
I've added some security directives to my .htaccess file, but I'm not getting challenged for a username and password
The most common cause of this is having the .htaccess directives within the scope of a Satisfy Any directive. Explicitly disable this by adding a Satisfy All to the .htaccess file, and try again.
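A sketch of an .htaccess that forces the password prompt; the realm name and the password file path are placeholders.

```apache
AuthType Basic
AuthName "Members Only"
AuthUserFile /home/example/.htpasswd
Require valid-user

# Require both the host-based checks and a valid login to pass.
# With "Satisfy Any", passing the host check alone would be enough,
# and no password prompt would appear.
Satisfy All
```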
Don't use quotes for mod_rewrite patterns. That's for RedirectMatch syntax.
Comments should be on their own line - Otherwise, you will get warnings if you have that log-level set.
So,
RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR] # spambot
should be
# spambot
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [OR]
RewriteRule !err_¦robots\.txt - [F,L]
The underscore in "err_" needs to be escaped; precede it with a "\":
RewriteRule !(err\_¦robots\.txt) - [F,L]
Also, the broken vertical pipe "¦" character above must be changed to a solid vertical pipe before it can be used in .htaccess.
HTH,
Jim
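Putting those points together, a cleaned-up fragment might look like the sketch below. It uses only entries from the list posted earlier. One hedge: as far as I know the underscore is not a regex metacharacter, so a backslash in front of it is harmless but optional, and it is left out here.

```apache
RewriteEngine On

# spambot
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [OR]
# rude bot
RewriteCond %{HTTP_USER_AGENT} mister\ pix [NC,OR]
# offline downloaders (solid pipes, not broken ones)
RewriteCond %{HTTP_USER_AGENT} tele(port|soft) [NC]
# The last condition carries no [OR]; it ends the chain.
RewriteRule !(err_|robots\.txt) - [F,L]
```

Spaces inside a pattern are escaped with a backslash instead of wrapping the pattern in quotes, and every comment sits on its own line.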
Bad bot script: [webmasterworld.com...]
(See the links at the top of that thread for even more "historical" information on the subject.)
Jim
Did I write this wrong?
Thanks
ladymindy
No problem blocking Teleport Pro. Except then I discovered it can be set to disguise itself as
(compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Hotbar 4.0) as well as a few other things.
Now I'm really confused. I can't very well block that, right?!
Suggestions?
Ty