Forum Moderators: phranque


htaccess error 500 for Rewrites Order deny allow

         

essen

1:37 pm on Mar 6, 2015 (gmt 0)

10+ Year Member



I have this in htaccess and get error 500:

RewriteCond %{HTTP_USER_AGENT} ^.*(Java|URLspion|Baiduspider|Yandexbot).*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*(sogou\sspider|NaverBot|ichiro).*$ [NC]
RewriteRule ^.*$ ^robots.txt$ [E=bad_bot:1,L]

There are many more lines like these above it.
After this comes an Order Deny,Allow block,
like this:

Order Deny, Allow
Deny from 46.8.34.0/24
Deny from 46.8.80.0/20
Deny from 46.17.240.0/21
Deny from 46.18.0.0/21
Deny from 46.28.64.0/21
Deny from 46.28.192.0/21
Deny from 46.29.128.0/21
Allow from env=bad_bot

I use env because some of the blocked bots are identified in both the RewriteCond lines and the Deny from lines.
What is wrong with this code? Is it a syntax error, or something else? Please help.

wilderness

1:53 pm on Mar 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Try removing the following from each RewriteCond line:

^.*
.*$

In addition, do you have a line preceding the mod_rewrite section that reads:

RewriteEngine on
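
Applied to the conditions in question, the suggestion would leave something like this (a sketch only; regex matching is unanchored by default, so these still match the bot names anywhere in the User-Agent):

RewriteCond %{HTTP_USER_AGENT} (Java|URLspion|Baiduspider|Yandexbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (sogou\sspider|NaverBot|ichiro) [NC]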

essen

2:48 pm on Mar 6, 2015 (gmt 0)

10+ Year Member



Yes, I have that line. I forgot to mention: the 500 error only shows up in the access log when one of the blocked bots makes a request. Apparently they get a 403. The website functions fine otherwise.
I'll try removing what you suggested, thanks.
/essen

wilderness

3:02 pm on Mar 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the 500 error only shows up in the access log when one of the blocked bots makes a request


Have you 'defined' error docs?

ErrorDocument 403 /path-to-custom-file.html
ErrorDocument 404 /path-to-custom-file.html
ErrorDocument 410 /path-to-custom-file.html

essen

3:13 pm on Mar 6, 2015 (gmt 0)

10+ Year Member



Yes, all 3 of these

not2easy

3:34 pm on Mar 6, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



And have you made sure they are not blocked from your 403 page?

essen

4:14 pm on Mar 6, 2015 (gmt 0)

10+ Year Member




That's exactly what I'm trying to avoid: a 403. That's why I use the env-based Allow in the Order Deny,Allow block. I was afraid that otherwise the Deny would stop the bots from reaching robots.txt via the RewriteRule. I don't know enough about Apache's internals to say whether it accepts such a construction. That's what I hope someone here can tell me.
/essen

essen

6:38 pm on Mar 6, 2015 (gmt 0)

10+ Year Member



Removing the .* as suggested just changes the 403 to a 404. It's neither a 403 nor a 404 that I want; it's a 200, a redirection to robots.txt. I only have the
Allow from env=bad_bot so that the Deny doesn't block the bots' access to robots.txt.
I realize my code construction must be very unusual; otherwise someone here would
have had a solution to it.

/essen

essen

7:22 pm on Mar 6, 2015 (gmt 0)

10+ Year Member



lucy24 wrote a few lines down in another thread:
#2 A response header isn't sent out until all modules have had their crack at a given request. If any module generates a 403 (F, lockout) this will overrule any other response generated by any other module, whether earlier or later.

This is probably related to my problem. Apache is a strange cousin.

lucy24

8:33 pm on Mar 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's exactly what I'm trying to avoid: a 403. That's why I use the env-based Allow in the Order Deny,Allow block.

Each module that can issue a 403 needs its own exemption. So an "Allow from all" directive only works for those requests that were locked out via a "Deny from..." line. If mod_rewrite is issuing lockouts of its own, you need a line saying

RewriteRule ^403\.html - [L]


replacing "403.html" with the exact path-and-name of your custom error page.

But here you've got a different problem:

RewriteRule ^.*$ ^robots.txt$ [E=bad_bot:1,L]


This rule does not have a RewriteCond exempting requests for robots.txt. So you get an infinite loop, which will show up in error logs as a 500, independent of the response sent out to the user.
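
A minimal sketch of such an exemption, assuming robots.txt sits in the document root (and using / rather than ^ in the target, as noted below):

# Skip the rewrite when robots.txt itself is requested;
# without this, the rule rewrites its own target forever (500).
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^ /robots.txt [E=bad_bot:1,L]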

I hope the ^ in the target was a typo; you meant / for the root. A ^ caret in this location will be interpreted as a literal ^ character. (I tested.)

I use env because I have some id's of the blocked bots in both RewriteCond and Deny from.

I don't understand what you're trying to do here. And you can set environment variables in mod_setenvif; you don't need to bring out the mod_rewrite heavy artillery.
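
A sketch of the mod_setenvif alternative: BrowserMatchNoCase sets a variable straight from the User-Agent header, with no rewrite involved.

# Set bad_bot directly from the User-Agent (case-insensitive)
BrowserMatchNoCase "Java|URLspion|Baiduspider|Yandexbot" bad_bot
BrowserMatchNoCase "sogou spider|NaverBot|ichiro" bad_bot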

One thing you need-- and I think you haven't got-- is an envelope that looks something like this:
<Files "robots.txt">
Order Deny,Allow
Allow from all
</Files>

A "Files" envelope overrides anything that came earlier.

<tangent>
afaik Yandex obeys robots.txt and has done so for many years. So if you don't want it around, just ban it in robots.txt
</tangent>
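
For reference, the robots.txt ban would be something like:

User-agent: Yandex
Disallow: /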

essen

8:56 pm on Mar 6, 2015 (gmt 0)

10+ Year Member



Thanks lucy24
I'm convinced that will save my *ss