Can I have two RewriteRules in one RewriteCond group?

I want to redirect G-Bot spoofers to both a notifier script and a 403 page

     
8:37 am on Feb 13, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 5, 2003
posts:319
votes: 0


I am trying to write a complicated .htaccess rule-set that detects requests whose user agent claims to be GoogleBot but whose IP address falls outside the CIDRs known to me. I want a two-pronged RewriteRule that first sends the "visitor" to a Perl script that mails me the details (I have the script already), then serves a 403 page and stops execution.

Here is what I have tried, which gives a 500 error:

RewriteCond %{REMOTE_ADDR} !(known IP ranges for GoogleBot go here)
RewriteCond %{HTTP_USER_AGENT} Googlebot/2.1
RewriteRule (.*) cgi-bin/script-name.pl
RewriteRule !^robots\.txt$ - [F]

Since that fails, I tried a single rule instead:

RewriteRule !^robots\.txt$ cgi-bin/script-name.pl [F]

This does not cause a 500 error, and it does display the 403 page, but it does not also rewrite to the Perl script to send me a report. I tried changing the flags in the rule to [R=403,L], but that caused a 500 error. How can I have both the rewrite to the Perl script, to send me a report, and the display of an HTTP 403 response (I use a custom 403 page) to the "visitor"?

3:23 pm on Feb 13, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> two-pronged...

No, there's no way to 'spawn' additional threads from the application layer.

There's also no way to make a set of RewriteConds apply to more than one rule; as designed, RewriteConds apply only to the first RewriteRule that follows them. (You can work around this problem by negating the rule's logic and using mod_rewrite's [S] flag to skip over a second rule that invokes the 403 handler, or by stopping rule processing using [L] if the first rule's conditions are met.)

For the case at hand, I'd suggest:

Change your custom 403 page to do a virtual include of the Perl script using SSI, or the equivalent include in PHP.

Then, make sure you're not creating a loop -- the likely cause of the initial 500-Server Error.


RewriteCond %{HTTP_USER_AGENT} Googlebot/
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule !^(robots\.txt|path_to_custom_403_page\.shtml)$ - [F]

For the generic case described above, here's an example:

This won't work, because the RewriteConds are only applied to the first rule:


# Apply two rules if both RewriteConds are true
# (This won't work, because only the first rule is subject to the RewriteConds)
RewriteCond A
RewriteCond B
RewriteRule 1
RewriteRule 2

Solution: Invert the logic and use [S]:

# Skip two rules if either RewriteCond is NOT true
RewriteCond !A [OR]
RewriteCond !B
RewriteRule .* - [S=2]
RewriteRule a b
RewriteRule x y
# If either A or B is not true, the two rules above are skipped and execution resumes following this line

Note that in addition to negating the patterns using "!", the default AND must be changed to [OR] and vice versa. This is because (A+B)=!(!A*!B) and (A*B)=!(!A+!B) where "*" is AND, "+" is OR, and "!" is the NOT operator. For more information on this subject (Boolean logic), search for "DeMorgan's Theorem". Note the English expression of this concept by the use of the words "both" (AND) and "either" (OR) in the code comments above.
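The equivalence can be checked outside Apache with a small truth table. This is only a sketch in Python, not mod_rewrite itself; the function names `both` and `skip` are made up for illustration and stand for "run the rules when both conditions hold" and "skip the rules when either negated condition holds".

```python
from itertools import product

def both(a, b):
    # Direct form: run the rules if both conditions are true (default AND)
    return a and b

def skip(a, b):
    # Inverted form: skip the rules if either condition is NOT true ([OR])
    return (not a) or (not b)

# The rules run exactly when they are not skipped, for every A/B combination.
table = [(a, b, both(a, b), not skip(a, b))
         for a, b in product([False, True], repeat=2)]

for a, b, direct, via_skip in table:
    print(a, b, direct, via_skip)
```

The last two columns agree in all four rows, which is the (A*B)=!(!A+!B) identity quoted above.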

Jim

8:36 pm on Feb 13, 2007 (gmt 0)


Jim;
Thanks for that explanation. I will take my time digesting this new information, and will try to include the script in my custom 403. Fascinating! I'll let you know how it works out since a lot of people are bothered by GoogleBot spoofers.

Wiz

8:59 pm on Feb 13, 2007 (gmt 0)


Jim, may I ask one more follow up question about the conditions?

I have three distinct CIDRs for GoogleBot in my RewriteCond and I'm not sure if I have listed them correctly. The Boolean logic is confusing me.

Here is what I have now, which captured a real GoogleBot visit today:


RewriteCond %{REMOTE_ADDR} !(64\.68\.8[0-7]\.)|(64\.233\.(16[0-9]|17[0-9]|18[0-9]|19[0-1])\.)|(66\.249\.(6[4-9]|7[0-9]|8[0-9]|9[0-5])\.)

Should I break that up into three lines of conditions, and if so, do I append [OR] to the first two lines, or leave them as AND?
E.g.:

RewriteCond %{REMOTE_ADDR} !^64\.68\.8[0-7]\. [OR]
RewriteCond %{REMOTE_ADDR} !^64\.233\.(16[0-9]|17[0-9]|18[0-9]|19[0-1])\. [OR]
RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|7[0-9]|8[0-9]|9[0-5])\.
RewriteCond %{HTTP_USER_AGENT} Googlebot/2.1
RewriteRule !^(robots\.txt|path-to-403.shtml) - [F]

Wiz

[edited by: Wizcrafts at 9:01 pm (utc) on Feb. 13, 2007]

10:10 pm on Feb 13, 2007 (gmt 0)


Looks to me like the parentheses are messed up, and some of the regex is non-optimal.

How about:


RewriteCond %{REMOTE_ADDR} !^(64\.68\.8[0-7]\.|64\.233\.(1[678][0-9]|19[01])\.|66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.)

If you broke them up, you'd want ANDed RewriteConds.
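To see why the split-up negative conditions need AND, the patterns can be tried against a few addresses before they go into .htaccess. The following is a rough sketch using Python's `re` module, on the assumption that it treats these simple patterns the same way Apache's regex engine does. The sample addresses and the function names are invented for illustration and do not come from real logs.

```python
import re

# Ranges from this thread: 64.68.80-87.x, 64.233.160-191.x, 66.249.64-95.x
COMBINED = re.compile(
    r"^(64\.68\.8[0-7]\.|64\.233\.(1[678][0-9]|19[01])\.|"
    r"66\.249\.(6[4-9]|[78][0-9]|9[0-5])\.)"
)
SPLIT = [
    re.compile(r"^64\.68\.8[0-7]\."),
    re.compile(r"^64\.233\.(1[678][0-9]|19[01])\."),
    re.compile(r"^66\.249\.(6[4-9]|[78][0-9]|9[0-5])\."),
]

def blocked_combined(ip):
    # One negated condition: true when the address is outside all ranges
    return COMBINED.search(ip) is None

def blocked_and(ip):
    # Three negated conditions joined by the default AND
    return all(p.search(ip) is None for p in SPLIT)

def blocked_or(ip):
    # Three negated conditions joined by [OR]
    return any(p.search(ip) is None for p in SPLIT)

# Hypothetical sample addresses: three inside the ranges, two outside
samples = ["64.68.82.10", "64.233.175.1", "66.249.70.5",
           "66.249.96.1", "203.0.113.9"]

for ip in samples:
    print(ip, blocked_combined(ip), blocked_and(ip), blocked_or(ip))
```

The ANDed version agrees with the single combined pattern. The ORed version reports every address as blocked, because no address can sit in all three ranges at once, so at least one negated pattern is always true.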

Jim

10:22 pm on Feb 13, 2007 (gmt 0)


Thanks again Jim. I suspected that I had messed up the RegExp and parentheses for the IP addresses. And thanks for confirming that negative conditions should be ANDed, not ORed.

Wiz

10:46 pm on Feb 13, 2007 (gmt 0)


> And thanks for confirming that negative conditions should be ANDed, not ORed.

Those negative conditions should be ANDed, but of course, it depends on what you're trying to accomplish.

Part of the problem is the disconnect between natural language and logic. If asked the question, "Are you going to the movie tonight or tomorrow afternoon?" a human would likely respond with one of the two alternatives. A robot, seeing the OR in the question, would simply respond "Yes" or "No." :)

Jim