Forum Moderators: phranque


Rewrite rule in htaccess makes every request to my cgi-bin = 403


Joe Belmaati

9:27 am on Oct 7, 2004 (gmt 0)

10+ Year Member



I have employed many of the rewrite tricks to deal with email harvesters and other menaces. However, as soon as I apply rewrite rules in my root-dir .htaccess file, I get a 403 whenever any script is called in my cgi-bin. Any ideas?
Sincerely,
Joe Belmaati
Copenhagen Denmark

jdMorgan

7:20 pm on Oct 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Joe,

It's impossible to determine what might be the problem without further details. You might consider posting a sample of your RewriteRules, and pointing out any that might be involved with cgi-bin requests.

Also, you might want to try a trivial rewrite, after removing all other rewrites, and see if that affects cgi-bin requests.

Something like


Options +FollowSymLinks
RewriteEngine on
RewriteRule ^myfile\.html$ /index.html [L]

Where index.html is a file that exists, and myfile.html is a file that does not exist. If you request myfile.html with your browser, you should receive the contents of index.html. Very often, a test using a simple example can reveal where the problem might be.

Jim

Joe Belmaati

7:40 pm on Oct 7, 2004 (gmt 0)




Hi Jim,
I have tried very simple rewrite rules, and as soon as I turn the rewrite engine on, the cgi-bin returns 403. Interestingly, I've got the rewrite engine on in a sub-directory with no problems.

Anyhow, here's my htaccess file

SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome

<Files *>
Order deny,allow
Deny from env=getout
Allow from env=allowsome
</Files>

<Files ~ "^\.ht">
Order allow,deny
Deny from all
Satisfy All
</Files>

RewriteEngine on
RewriteBase /
# Various bots
RewriteCond %{HTTP_USER_AGENT} ^WinHttp\.WinHttpRequest\.\d+ [NC,OR]
# Address harvesters
RewriteCond %{HTTP_USER_AGENT} ^(autoemailspider|ExtractorPro) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^E?Mail.?(Collect|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DTS.?Agent|Email.?Extrac) [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC,OR]
# Download managers
RewriteCond %{HTTP_USER_AGENT} ^(Alligator|DA.?[0-9]|DC\-Sakura|Download.?(Demon|Express|Master|Wonder)|FileHound) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Flash|Leech)Get [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Fresh|Lightning|Mass|Real|Smart|Speed|Star).?Download(er)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Gamespy|Go!Zilla|iGetter|JetCar|Net(Ants|Pumper)|SiteSnagger|Teleport.?Pro|WebReaper) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(My)?GetRight [NC,OR]
# Image-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(AcoiRobot|FlickBot|webcollage) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Express|Mister|Web).?(Web|Pix|Image).?(Pictures|Collector)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image.?(fetch|Stripper|Sucker) [NC,OR]
# "Gray-hats"
RewriteCond %{HTTP_USER_AGENT} ^(Atomz|BlackWidow|BlogBot|EasyDL|Marketwave|Sqworm|SurveyBot|Webclipping\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (girafa\.com|gossamer\-threads\.com|grub\-client|Netcraft|Nutch) [NC,OR]
# Site-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(eCatch|(Get|Super)Bot|Kapere|HTTrack|JOC|Offline|UtilMind|Xaldon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(Auto|Cop|dup|Fetch|Filter|Gather|Go|Leach|Mine|Mirror|Pix|QL|RACE|Sauger) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(site.?(eXtractor|Quester)|Snake|ster|Strip|Suck|vac|walk|Whacker|ZIP) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCapture [NC,OR]
# Tools
RewriteCond %{HTTP_USER_AGENT} ^(curl|Dart.?Communications|Enfish|htdig|Java|larbin) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FrontPage|Indy.?Library|RPT\-HTTPClient) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww|lwp|PHP|Python|www\.thatrobotsite\.com|webbandit|Wget|Zeus) [NC,OR]
# Unknown
RewriteCond %{HTTP_USER_AGENT} ^(Crawl_Application|Lachesis|Nutscrape) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[CDEFPRS](Browse|Eval|Surf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Demo|Full.?Web|Lite|Production|Franklin|Missauga|Missigua).?(Bot|Locat) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (efp@gmx\.net|hhjhj@yahoo\.com|lerly\.net|mapfeatures\.net|metacarta\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Industry|Internet|IUFW|Lincoln|Missouri|Program).?(Program|Explore|Web|State|College|Shareware) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Mac|Ram|Educate|WEP).?(Finder|Search) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC,OR]
# Frontpage Office etc
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
#RewriteRule .* - [F]
# Email
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC,OR]
# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
# Forbid if UA is a single word - case-insensitive, A-Z only
RewriteCond %{HTTP_USER_AGENT} ^[a-z]+$ [NC,OR]
# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_USER_AGENT} ^-?$

RewriteRule /*$ /getout.php [L,R]
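(An editorial aside: long pattern lists like these can be sanity-checked offline before going live. A rough sketch using `grep -Ei` to approximate Apache's case-insensitive `[NC]` matching; the pattern here is a condensed sample of a few of the conditions above, not the full blocklist, and the function name is hypothetical.)

```shell
# Offline sanity check for a few of the user-agent patterns above.
# grep -Ei approximates Apache's case-insensitive [NC] regex matching;
# the pattern is a condensed sample, not the full blocklist.
matches_blocklist() {
  printf '%s\n' "$1" | grep -Eiq '^(autoemailspider|ExtractorPro)|^(libwww|lwp|PHP|Python|webbandit|Wget|Zeus)|^[a-z]+$'
}

matches_blocklist "Wget/1.21" && echo "blocked: Wget"
matches_blocklist "curl" && echo "blocked: single-word UA"
matches_blocklist "Mozilla/5.0 (X11; Linux x86_64)" || echo "allowed: normal browser UA"
```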

Joe Belmaati

7:41 pm on Oct 7, 2004 (gmt 0)




BTW - commenting out this guy:

RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]

doesn't remedy the problem....

jdMorgan

9:16 pm on Oct 7, 2004 (gmt 0)




This last bit is malformed, and does not do what it says it does. Among other things, you will be recording the IP address of each new request from AOL users, and blocking them.

RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
# Forbid if UA is a single word - case-insensitive, A-Z only
RewriteCond %{HTTP_USER_AGENT} ^[a-z]+$ [NC,OR]
# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_USER_AGENT} ^-?$
#
RewriteRule /*$ /getout.php [L,R]

Instead, try replacing the above lines with this:

RewriteCond %{THE_REQUEST} ^GET\ /?http [NC,OR]
# Forbid if UA is a single word - case-insensitive, A-Z only
RewriteCond %{HTTP_USER_AGENT} ^[a-z]+$ [NC]
RewriteCond %{REQUEST_URI} !/getout\.php$
RewriteRule .* /getout.php [L]
#
# Forbid if blank (or "-") Referer *and* UA, except for HEAD requests from caching proxies (such as AOL)
RewriteCond %{REQUEST_METHOD} !^HEAD$
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteCond %{REQUEST_URI} !/getout\.php$
RewriteRule .* /getout.php [L]
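(A minimal local sketch of the logic the second corrected block implements. The `should_block` function and its arguments are hypothetical stand-ins for Apache's `%{REQUEST_METHOD}`, `%{HTTP_REFERER}`, and `%{HTTP_USER_AGENT}`; it is not Apache itself, just the decision the conditions encode.)

```shell
# Forbid only when the method is not HEAD *and* both Referer and
# User-Agent are blank or "-". HEAD requests are exempt so caching
# proxies (such as AOL's) can still revalidate their copies.
should_block() {
  method=$1 referer=$2 ua=$3
  [ "$method" != "HEAD" ] || return 1            # exempt HEAD requests
  case $referer in ""|-) ;; *) return 1 ;; esac  # Referer must be blank or "-"
  case $ua in ""|-) ;; *) return 1 ;; esac       # UA must be blank or "-"
  return 0
}

should_block GET "" ""  && echo "GET, blank headers: forbidden"
should_block HEAD "" "" || echo "HEAD, blank headers: passed through"
should_block GET "http://example.com/" "" || echo "GET with a Referer: passed through"
```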

If you still need help debugging the cgi-bin problem, make a back-up copy of your file, then replace all of these rules with the simple rule I provided above, and try accessing your site. Then post the contents of your server error log so we can look at it.

Jim

Joe Belmaati

10:03 pm on Oct 7, 2004 (gmt 0)




Thank you very much for your help. All of my htaccess is copied and pasted from the long thread on this board, and most of it is from Balam's post. I have had several AOL users browsing with no problem.

Anyhow, I tried just doing the simple rewrite using your code, and the rewrite works, in that I am given the right page.

But my cgi-bin still doesn't work. Here is the excerpt from my error log:

[Thu Oct 7 23:51:50 2004] [error] [client ##.###.##.##] Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden: /var/www/www.mysite.com/www/cgi-bin/contact.pl

But FollowSymLinks must clearly be on, and my host allows "AllowOverride All".

Another strange thing is that the re-write rules in my root-dir htaccess don't echo down to a subdirectory that holds a bulletin board with some rewrite rules that make static links out of dynamic links.

jdMorgan

10:26 pm on Oct 7, 2004 (gmt 0)




> But FollowSymLinks must clearly be on, and my host allows "AllowOverride All".

I don't see that in your code, though.

Try adding

 Options +FollowSymLinks 

before your RewriteEngine on directive.

You may have to add

 RewriteOptions inherit 

to .htaccess in your lower-level directories. Your server configuration may currently be set with RewriteOptions none. With this set to none, lower-level directories will not inherit the configuration of their parent directories.
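Taken together, the two suggestions would look something like this (the directory names are hypothetical; note that with `RewriteOptions inherit` in a per-directory context, the inherited parent rules are applied after the local ones):

```apache
# /.htaccess (document root): enable the option mod_rewrite requires
Options +FollowSymLinks
RewriteEngine on
# ... rewrite rules ...

# /forum/.htaccess (subdirectory with its own rules): opt back in to
# the parent's rewrite configuration
RewriteOptions inherit
RewriteEngine on
# ... this directory's rules, then the inherited ones ...
```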

> I have had several AOL users browsing with no problem.

Yes, but the AOL cache won't work for them. Therefore, they will think your site is slow. Unlike others, which do GET requests, AOL caching proxies (and a few others) use HEAD requests to check the Last-Modified time on your files. If the file is current, they will use their own cached copy, saving you bandwidth. If you return a 403 in response to their cache-checking HEAD request, then they will unconditionally fetch your pages and images every time. I would advise that you take advantage of my past mistakes and use the code modification I posted.

Jim

Joe Belmaati

10:40 pm on Oct 7, 2004 (gmt 0)




Thank you very much Jim!

Yes, I implemented your suggestions for the end bit as soon as I read it. That was very kind of you.

The RewriteOptions inherit directive works perfectly, but I am still getting the same error with the FollowSymLinks option (which I was also using when I tried your first simple code).

Could the problem be related to the SymLinksIfOwnerMatch command?

jdMorgan

11:01 pm on Oct 7, 2004 (gmt 0)




You just need one or the other. Either of those Options serves as an 'enable' for mod_rewrite -- I don't know why, but you must have one of them set in order to use mod_rewrite.

Check for an .htaccess file inside your cgi-bin directory. If one is present, you may need to add the RewriteOptions inherit there as well.

If you have used Alias or ScriptAlias to protect your cgi-bin, then this could also be a source of trouble. It is also possible that the file permissions are set incorrectly on .htaccess or on the script.

It often seems that every Apache server is configured differently, and that most are wrong... :(

Anyway, think outside the box. Put an .html or .jpg file in cgi-bin and try to access it. Put the script in a different directory and try to rewrite to it there. The more information you can gather about the problem, the better.

Jim

Joe Belmaati

11:16 pm on Oct 7, 2004 (gmt 0)




Thank you very much, Jim.

1. There is no htaccess file in the cgi-bin
2. Both jpg and html in cgi-bin = 403
3. I don't know how my host has ScriptAlias configured
4. File permissions: script = 755, htaccess = 666 (can't get it to work with 644).
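(An editorial aside on item 4: 666 makes .htaccess world-writable, which is what lets the getout script append to it, but many hosts refuse to honor world-writable configuration files, and any local user on a shared server could then edit the access rules. A scratch-directory sketch of the conventional settings, GNU coreutils `stat` assumed:)

```shell
# Demonstrate the conventional permission layout in a throwaway
# directory: 755 on the CGI script, 644 on .htaccess.
tmp=$(mktemp -d)
touch "$tmp/contact.pl" "$tmp/.htaccess"
chmod 755 "$tmp/contact.pl"   # owner rwx, group/other r-x
chmod 644 "$tmp/.htaccess"    # owner rw-, group/other r--
stat -c '%a %n' "$tmp/contact.pl" "$tmp/.htaccess"
rm -r "$tmp"
```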

Joe Belmaati

11:17 pm on Oct 7, 2004 (gmt 0)




...although setting it to 644 doesn't solve my cgi-bin problem; it just makes it impossible for the getout script to write to the .htaccess file.

jdMorgan

12:15 am on Oct 8, 2004 (gmt 0)




> 1. There is no htaccess file in the cgi-bin

Try adding one with RewriteOptions inherit?

I don't know that it will help, but I sure can't give you a magic answer, and a little wild experimentation can't hurt too much. At worst, you'll have more information.

Failing any solution here, you should contact your host about any special restrictions or handling of cgi-bin directories. They may have configured some special treatment for cgi-bin directories in httpd.conf.

Jim