Forum Moderators: phranque

500 Internal Error with SetEnvIf User-Agent


aodonline

3:37 pm on Dec 28, 2003 (gmt 0)

10+ Year Member



I have been reading various threads on blocking known bad bots. I thought I had a good list going here, but I must have done something wrong, since I get a 500 error when accessing my website.

I created a .htaccess file with several lines of SetEnvIf User-Agent ^botname bad_bot
and then this at the end of file.
<Limit GET POST PUT HEAD>
order allow,deny
allow from all
deny from env=bad_bot
</Limit>

The above results in the following error message being generated to my error log.

[Sun Dec 28 09:23:03 2003] [alert] [client xx.xx.xx.xx] /home/virtual/site****/fst/var/www/html/.htaccess: SetEnvIf regex could not be compiled.

Any suggestions?

jdMorgan

7:25 pm on Dec 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



aodonline,

> I created a .htaccess file with several lines of SetEnvIf User-Agent ^botname bad_bot

As stated in the error log, the problem is a regular-expressions syntax error in one of your SetEnvIf directives, which you did not post.
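The pattern argument of SetEnvIf is a regular expression, so a single unescaped metacharacter (an unmatched "(", a stray "?", and so on) will break compilation of the whole file. You can pre-check candidate patterns outside Apache with any regex library; this Python sketch (the example patterns are only illustrative, and Python's regex engine is close enough to Apache's for this purpose) flags the ones that won't compile:

```python
import re

def check_patterns(patterns):
    """Try to compile each pattern; return (pattern, error message or None)."""
    results = []
    for p in patterns:
        try:
            re.compile(p)
            results.append((p, None))   # compiled cleanly
        except re.error as e:
            results.append((p, str(e))) # this one would break the .htaccess
    return results

# Illustrative only: an escaped space is fine, an unmatched "(" is not.
results = check_patterns([r"^Teleport\ Pro", r"^Mozilla("])
```

Any pattern that comes back with an error message is a likely culprit for the "SetEnvIf regex could not be compiled" alert.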

I'd suggest reviewing the regular-expression syntax of each line. If you don't spot anything, comment out half of the lines by adding a "#" to the beginning of each, and see if the problem goes away. If it does, un-comment half of the lines that were previously commented out and test again. Using this "divide and conquer" approach, you can zero in on the problem fairly efficiently; since it's a binary search, you could narrow a list of 32,768 lines down to one problem line in 15 trials. Just keep careful track of the 'one-half' groups you're working with as you go.
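That divide-and-conquer process is just a binary search over the directive lines. A sketch in Python, where `fails` stands in for the manual step of uploading a half-commented file and checking whether the 500 error is still there (and assuming exactly one bad line):

```python
def find_bad_line(lines, fails):
    """Binary-search for the single line that makes fails(subset) return True.

    fails(subset) represents the manual test: keep only `subset` active,
    reload the page, and report whether the 500 error still appears.
    """
    lo, hi = 0, len(lines)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lines[lo:mid]):  # the bad line is in the first half
            hi = mid
        else:                     # otherwise it must be in the second half
            lo = mid
    return lines[lo]
```

With 32,768 lines this halves the candidates each round, so it takes exactly 15 trials to isolate the one bad directive.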

Ref: Regular-expressions tutorial [etext.lib.virginia.edu]

Jim

aodonline

2:36 am on Dec 29, 2003 (gmt 0)

10+ Year Member



Thanks, I actually thought about that, but wasn't sure of the actual process for commenting out lines.

I'm actually wondering, though, if my host has set some sort of line or file-size limit for the .htaccess.

Is that possible? They use the Ensim CP.

aodonline

3:46 am on Dec 29, 2003 (gmt 0)

10+ Year Member



OK, I'm no longer getting 500 errors, thanks to jd's suggestions.

But i have a question.

I have been testing my .htaccess file with [wannabrowser.com...]
(By the way, thanks to whoever created it.)

It seems that some entries, like Wget, are still able to get past my .htaccess file.

The corresponding line in my .htaccess file is:
RewriteCond %{HTTP_USER_AGENT} Wget [NC,OR]

I have tried it with and without the ^ before Wget.

On wannabrowser's page, from 2 different machines, I still get a status code of 200 and the HTML page.

[edit]Oops, I forgot that I added block.php (from somewhere on this forum) until I got my .htaccess working, since the HTML output is the error page generated by it. Hmmm, I would have thought .htaccess would override a PHP script?[/edit][edit2]But even after removing it, I'm now getting 200 responses for everything in my .htaccess. Man, what is going on?[/edit2]

aodonline

4:20 am on Dec 29, 2003 (gmt 0)

10+ Year Member



Here is my htaccess file contents.

Options FollowSymLinks

order allow,deny
allow from all

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Exalead\ NG\/MimeLive\ Client [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Fetch\ API\ Request [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
# first block
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^http://www.almaden.ibm.com/cs/crawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
# Second Block
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NutchOrg [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
# Third Block
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Pompos [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^QPCreep\ Test\ Rig [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [NC,OR]
# Fourth Block
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
# Fifth Block
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu\ Link\ Sleuth[NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

aodonline

11:09 pm on Dec 29, 2003 (gmt 0)

10+ Year Member



Sorry, I forgot to mention that I switched directives after failing to figure out what my problem was with the initial .htaccess file.

I have now figured out why I was not getting 403 errors like I should have.

I placed

order allow,deny
allow from all

in the wrong spot. I had to place it after the mod_rewrite directives, or it seems to be ignored.

aodonline

6:49 am on Jan 1, 2004 (gmt 0)

10+ Year Member



Ok, my .htaccess was working perfectly.

Now I get a 403 access denied for my whole site. I don't understand what I'm doing wrong.

My temporary php solution works for any user agents I have set in it.

The php solution is from:
//By: Christopher Lover - webmaster at icehousedesigns dot com
//http://www.icehousedesigns.com
I found it on one of the script sites.

But my .htaccess file refuses to work. Either I can totally bypass it (unless I'm calling an old page that I have a redirect for), or I get a 403 access denied from an allowed user agent.

For the time being I have my mod_rewrite directives commented out.

I am totally miffed. I have tried it on two different domains in my reseller account. Both have the same result.

Can another pair of eyes tell me what I'm missing or doing wrong?

Here is my .htaccess file


Options +FollowSymLinks

# I'm sure the Limit directive goes here, but not sure.
# The allow from / deny from statements. Do I need them?

#RewriteEngine On
#RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Exalead\ NG\/MimeLive\ Client [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} Fetch\ API\ Request [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^http://www.almaden.ibm.com/cs/crawler [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} Indy\.Library [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^NutchOrg [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Pompos [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^QPCreep\ Test\ Rig [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Siphon [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Szukacz [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Xenu\ Link\ Sleuth[NC,OR]
#RewriteCond %{HTTP_USER_AGENT} ^Zeus
#RewriteRule ^.* - [F,L]

Redirect 301 /index.html [mydomain.com...]
Redirect 301 /html/hosting.html [mydomain.com...]
Redirect 301 /html/design.html [mydomain.com...]
Redirect 301 /html/contactform.html [mydomain.com...]
Redirect 301 /html/aup.html [helpdesk.mydomain.com...]
Redirect 301 /html/disputepolicy.html [helpdesk.mydomain.com...]
Redirect 301 /html/dmca.html [helpdesk.mydomain.com...]
Redirect 301 /html/gurantee.html [helpdesk.mydomain.com...]
Redirect 301 /html/infoform.html [mydomain.com...]
Redirect 301 /html/ip_policy.html [helpdesk.mydomain.com...]
Redirect 301 /html/nof_resources.html [mydomain.com...]
Redirect 301 /html/order2.html [mydomain.com...]
Redirect 301 /html/order.html [mydomain.com...]
Redirect 301 /html/orderb.html [mydomain.com...]
Redirect 301 /html/privacy.html [helpdesk.mydomain.com...]
Redirect 301 /html/registrationagreement.html [helpdesk.mydomain.com...]
Redirect 301 /html/resources.html [mydomain.com...]
Redirect 301 /html/scripts.html [mydomain.com...]
Redirect 301 /html/search.html [helpdesk.mydomain.com...]
Redirect 301 /html/support.html [mydomain.com...]
Redirect 301 /html/thankyou1.html [mydomain.com...]
Redirect 301 /html/thankyou.html [mydomain.com...]
Redirect 301 /html/tos.html [helpdesk.mydomain.com...]
Redirect 301 /html/upgrade.html [mydomain.com...]

jdMorgan

9:29 am on Jan 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Missing space ahead of flags:

#RewriteCond %{HTTP_USER_AGENT} ^Xenu\ Link\ Sleuth[NC,OR]

Maybe that's not it, but generally, no typos allowed in mod_rewrite!
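For what it's worth, that missing space doesn't cause a regex compile error, which is why it's so easy to miss: with no space, "[NC,OR]" is swallowed into the pattern as a character class matching one of the letters N, C, O, R or a comma, and the flags (including the OR that chains the conditions) are silently lost. A quick check with Python's regex library (which behaves the same way as Apache's here) shows the effect:

```python
import re

# With the space missing, "[NC,OR]" is parsed as part of the regex
# (a character class), not as mod_rewrite flags.
glued = re.compile(r"^Xenu\ Link\ Sleuth[NC,OR]")

# It now matches an unlikely string ending in one of N, C, comma, O, R...
assert glued.match("Xenu Link SleuthN")
# ...but no longer matches a realistic user-agent string.
assert not glued.match("Xenu Link Sleuth/1.1")
```

So the condition both stops matching the real user-agent and drops out of the OR chain, without any error in the log.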

Jim

aodonline

4:38 pm on Jan 1, 2004 (gmt 0)

10+ Year Member



You know how many times I've been over that file looking for a missing "\" to escape a space, a missing space between } and ^, or a line missing the flags.

Man, I wonder how hard it would be to write a simple checker for .htaccess files. Something simple that you dump your file contents into, and it looks for unescaped spaces and missing spaces, and that's it.
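A rough sketch of such a checker is below. It only looks for the two mistakes that came up in this thread (an unescaped space inside the pattern, and flags glued onto the pattern with no separating space), so treat it as a starting point rather than a real parser:

```python
import re

def lint_rewritecond(line):
    """Naive checks for the two RewriteCond mistakes discussed in this thread."""
    warnings = []
    text = line.strip().lstrip("#").strip()
    # Split on whitespace that is NOT escaped with a backslash,
    # so "Teleport\ Pro" stays one token but "Teleport Pro" splits.
    tokens = [t for t in re.split(r"(?<!\\)\s+", text) if t]
    if not tokens or tokens[0] != "RewriteCond":
        return warnings
    args = tokens[1:]
    # RewriteCond takes a test-string, a pattern, and optional [flags];
    # extra tokens usually mean an unescaped space inside the pattern.
    if len(args) > 3 or (len(args) == 3 and not args[2].startswith("[")):
        warnings.append("unescaped space in pattern?")
    # Flags glued onto the pattern with no separating space.
    if len(args) == 2 and "[" in args[1] and args[1].endswith("]"):
        warnings.append("missing space before flags?")
    return warnings
```

Running it over the lines in this thread would flag both the Xenu line (missing space before the flags) and any pattern with a forgotten backslash.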

Any way thank you again jd.

This noob is going to learn one way or another how to use .htaccess files.