
Linux, Unix, and *nix like Operating Systems Forum

    
Blocking user agents
Problems with trying to ban user agents from accessing my site
kidd




msg:910188
 5:34 pm on Nov 27, 2002 (gmt 0)

Hello:

Over the past few weeks I noticed that the hits on my site increased by almost 100%, but the number of visits stayed almost the same. So I looked into the logs and found out that I've been getting visits from harvesters and web crawlers (not robots).

So I came here to look for a way to block them from the site and I found this post: SetEnvIf User-Agent [webmasterworld.com].

There I saw this piece of code for the .htaccess file that does exactly what I'm trying to do:


setenvif HTTP_REFERER ^http://www.iaea.org getout
setenvif User-Agent ^attach getout
setenvif User-Agent ^Au2Email getout
setenvif User-Agent "^Advanced Email Extractor" getout
setenvif User-Agent ^BackWeb getout
setenvif User-Agent ^Bandit getout
setenvif User-Agent ^BatchFTP getout
setenvif User-Agent ^BaySpider getout
setenvif User-Agent ^BlackWidow getout
setenvIf User-Agent ^Bot.mailto:craftbot@yahoo\.com getout
setenvif User-Agent ^Buddy getout
setenvif User-Agent ^ChinaClaw getout
setenvif User-Agent ^Collector getout
setenvif User-Agent ^Copier getout
setenvif User-Agent ^Crescent getout
setenvif User-Agent ^curl getout
setenvif User-Agent ^DA getout
setenvif User-Agent "^DISCo Pump" getout
setenvif User-Agent "^Download Demon" getout
setenvif User-Agent "^Download Wonder" getout
setenvif User-Agent ^Downloader getout
setenvif User-Agent ^Drip getout
setenvif User-Agent ^eCatch getout
setenvif User-Agent ^e-collector getout
setenvif User-Agent ^EirGrabber getout
setenvif User-Agent ^EmailCollect getout
setenvif User-Agent ^EmailHarvest getout
setenvif User-Agent ^EmailMagnet getout
setenvif User-Agent ^EmailReaper getout
setenvif User-Agent ^EmailSiphon getout
setenvif User-Agent "^Email Spider" getout
setenvif User-Agent "^EmailWolf" getout
setenvif User-Agent "^Express WebPictures" getout
setenvif User-Agent ^ExtractorPro getout
setenvif User-Agent ^EyeNetIE getout
setenvif User-Agent ^FileHound getout
setenvif User-Agent ^Floodgate getout
setenvif User-Agent ^FlashGet getout
setenvif User-Agent ^****ybot getout
setenvif User-Agent ^GetRight getout
setenvif User-Agent ^GetSmart getout
setenvif User-Agent ^Go!Zilla getout
setenvif User-Agent ^Go-Ahead-Got-It getout
setenvif User-Agent ^gotit getout
setenvif User-Agent ^Grabber getout
setenvif User-Agent ^GrabNet getout
setenvif User-Agent ^Grafula getout
setenvif User-Agent ^HMView getout
setenvif User-Agent ^HTTrack getout
setenvif User-Agent ^InterGET getout
setenvif User-Agent "^Internet Ninja" getout
setenvif User-Agent ^Iria getout
setenvif User-Agent ^JetCar getout
setenvif User-Agent ^JOC getout
setenvif User-Agent ^JustView getout
setenvif User-Agent "^Kontiki Client" getout
setenvif User-Agent ^larbin getout
setenvif User-Agent ^Linkidator getout
setenvif User-Agent ^LeechFTP getout
setenvif User-Agent ^lftp getout
setenvif User-Agent ^likse getout
setenvif User-Agent ^Magnet getout
setenvif User-Agent ^Mag-Net getout
setenvif User-Agent "^Mail Harvester" getout
setenvif User-Agent "^Mass Downloader" getout
setenvif User-Agent ^Memo getout
setenvif User-Agent "^MIDown tool" getout
setenvif User-Agent "^Microsoft URL Control" getout
setenvif User-Agent ^Mirror getout
setenvif User-Agent "^Mister PiX" getout
setenvif User-Agent ^Navroad getout
setenvif User-Agent ^NearSite getout
setenvif User-Agent ^NetAnts getout
setenvif User-Agent ^NetSpider getout
setenvif User-Agent "^Net Vampire" getout
setenvif User-Agent ^NetZip getout
setenvif User-Agent ^Ninja getout
setenvif User-Agent ^Octopus getout
setenvif User-Agent "^Offline Explorer" getout
setenvif User-Agent ^PageGrabber getout
setenvif User-Agent "^Papa Foto" getout
setenvif User-Agent ^pcBrowser getout
setenvif User-Agent "^Pictures Grabber" getout
setenvif User-Agent ^Pockey getout
setenvif User-Agent ^psbot getout
setenvif User-Agent ^Pump getout
setenvif User-Agent ^RealDownload getout
setenvif User-Agent ^Reaper getout
setenvif User-Agent ^Recorder getout
setenvif User-Agent ^ReGet getout
setenvif User-Agent "^Road Runner: ImageScape Robot" getout
setenvif User-Agent ^Siphon getout
setenvif User-Agent ^SiteSnagger getout
setenvif User-Agent ^SlySearch getout
setenvif User-Agent ^SmartDownload getout
setenvif User-Agent ^Snake getout
setenvif User-Agent ^Stripper getout
setenvif User-Agent ^Sucker getout
setenvif User-Agent ^SuperBot getout
setenvif User-Agent ^SuperHTTP getout
setenvif User-Agent ^Surfbot getout
setenvif User-Agent "^Sqworm/2.9.85-BETA" getout
setenvif User-Agent ^tAkeOut getout
setenvif User-Agent ^Tcl_http_client_package getout
setenvif User-Agent "^Teleport Pro" getout
setenvif User-Agent ^Telesoft getout
setenvif User-Agent ^TurnitinBot getout
setenvif User-Agent ^URLBlaze getout
setenvif User-Agent ^Vacuum getout
setenvif User-Agent ^VobSub getout
setenvif User-Agent ^VoidEYE getout
setenvif User-Agent "^Web Image Collector" getout
setenvif User-Agent "^Web Sucker" getout
setenvif User-Agent ^WebAuto getout
setenvif User-Agent ^WebBandit getout
setenvif User-Agent ^WebCopier getout
setenvif User-Agent "^Web Downloader" getout
setenvif User-Agent ^WebEMailExtrac getout
setenvif User-Agent ^WebFetch getout
setenvif User-Agent ^WebMole getout
setenvif User-Agent ^WebMiner getout
setenvif User-Agent ^WebReaper getout
setenvif User-Agent ^WebSauger getout
setenvif User-Agent ^WebSnake getout
setenvif User-Agent ^Website getout
setenvif User-Agent ^Webster getout
setenvif User-Agent ^WebStripper getout
setenvif User-Agent ^WebWeasel getout
setenvif User-Agent ^WebWhacker getout
setenvif User-Agent ^WebZIP getout
setenvif User-Agent ^Wget getout
setenvif User-Agent ^Whacker getout
setenvif User-Agent ^Widow getout
setenvif User-Agent ^wysiwyg getout
setenvif User-Agent ^Xaldon getout
setenvif User-Agent ^Zeus getout
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=getout
</Limit>

The post says that this version is for mod_rewrite-impaired servers.

I didn't notice that at first, and I don't know whether my server is one of those. I rewrote my .htaccess with this code but received a 500 error.

Could someone tell me how to get this to work on my server? Right now I know that I'm running:

Operating system: Linux
Apache version: 1.3.24 (Unix)

 

seindal




msg:910189
 6:31 pm on Nov 27, 2002 (gmt 0)

You probably get some kind of error message from apache.

What does it say if you run "apache -t" from a command line, as root?

jdMorgan




msg:910190
 7:33 pm on Nov 27, 2002 (gmt 0)

kidd,

Your Apache server is up-to-date, and should support everything you need. However, for mod_rewrite to work, you need to have that module loaded on the server. In addition, you need the privilege to use it. Ask your hosting service or sysadmin to check on that. You can look up AllowOverride in the on-line Apache Server documentation [httpd.apache.org] for more background.
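
As a rough illustration of the AllowOverride point, here is a minimal sketch of what the host or sysadmin might have in the main server configuration (httpd.conf, not .htaccess). The directory path is a made-up example, and the actual settings on any given host may differ. In Apache 1.3, SetEnvIf and the mod_rewrite directives fall under the FileInfo override, while Order, Allow and Deny fall under Limit.

```apache
# httpd.conf (server config). The path below is a hypothetical example.
<Directory "/home/example/public_html">
    # FileInfo: permits SetEnvIf and mod_rewrite directives in .htaccess
    # Limit:    permits Order, Allow and Deny in .htaccess
    AllowOverride FileInfo Limit
</Directory>
```

If a directive appears in .htaccess without the matching override being granted, the server typically answers with a 500 error and logs a "not allowed here" message in the error log.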

Your site's raw error log should show an error entry when you get that 500-Server Error - What does it say?

Just looking at what you've got, I'd recommend surrounding all user-agent strings which contain spaces or any other special characters, such as colon, slash, period, etc. with double quotes. A space or any other character that has special meaning to SetEnvIf will cause a 500 error if the string's not delimited by quotes. Also, you might lowercase the environment variable "HTTP_REFERER" in the first line, though I doubt that's the problem.
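
A short sketch of the quoting advice, using one pattern from the list above. The Referer line is an assumption on my part: it swaps in the request header name "Referer" in place of HTTP_REFERER and escapes the dots, which may or may not be what the original list intended.

```apache
# Unquoted: the pattern ends at the space, so "Pro" is read as a
# separate argument instead of being part of the regular expression.
# SetEnvIf User-Agent ^Teleport Pro getout

# Quoted: the whole string, space included, is one regular expression.
SetEnvIf User-Agent "^Teleport Pro" getout

# Assumed variant of the first line: header name "Referer", dots escaped,
# and quotes around the pattern because it contains ':' and '/'.
SetEnvIf Referer "^http://www\.iaea\.org" getout
```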

You could also go through all the SetEnvIf lines and comment them out with a "#", then uncomment them one-at-a-time or in groups to find the ones causing problems. Start small and build up, in other words.
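
The start-small approach could look something like this: a minimal sketch with one rule active and the rest commented out, keeping the access block from the original post unchanged. Which rule to begin with is arbitrary.

```apache
# Step 1: only one rule active, everything else commented out.
SetEnvIf User-Agent ^Wget getout
# SetEnvIf User-Agent "^Teleport Pro" getout
# SetEnvIf User-Agent ^WebZIP getout

<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=getout
</Limit>
```

If the site loads normally with this version, uncomment a few more lines and reload. The first batch that brings back the 500 error contains the problem line.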

If this doesn't help, please check your raw error log and re-post with more information about what works and what doesn't work in your .htaccess file. There are many, many variations in what resources and privileges hosting companies grant their customers.

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved