Here is what I have tried, which gives a 500 error:
RewriteCond %{REMOTE_ADDR} !(known IP ranges for GoogleBot go here)
RewriteCond %{HTTP_USER_AGENT} Googlebot/2.1
RewriteRule (.*) cgi-bin/script-name.pl
RewriteRule !^robots\.txt$ - [F]
This fails with a 500 error, so I tried this:
RewriteRule !^robots\.txt$ cgi-bin/script-name.pl [F]
This does not cause a 500 error, and does display the 403 page, but does not also rewrite to the Perl script to send me a report. I tried changing the flags in the Rule to [R=403,L] but that caused a 500 error. How can I have both the redirect to the Perl script, to send me a report, and the display of an http 403 response (I use a custom 403 page) to the "visitor?"
No, there's no way to 'spawn' additional threads from the application layer.
There's also no way to make a set of RewriteConds apply to more than one rule. As designed, RewriteConds apply only to the first RewriteRule that follows them. (You can work around this by negating the rule's logic and using mod_rewrite's [S] flag to skip over a second rule that invokes the 403 handler, or by stopping rule processing with [L] if the first rule's conditions are met.)
For the case at hand, I'd suggest:
Change your custom 403 page to do a virtual include of the Perl script using SSI, or the equivalent include in PHP.
Then, make sure you're not creating a loop -- the likely cause of the initial 500-Server Error.
RewriteCond %{HTTP_USER_AGENT} Googlebot/
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule !^(robots\.txt|path_to_custom_403_page\.shtml)$ - [F]
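As a rough sketch of the custom-403 suggestion above: the page has to be registered as the 403 handler and parsed for SSI before it can pull in the script. The file names are the placeholders already used in this thread, and the sketch assumes mod_include is loaded and that the server's AllowOverride settings permit these directives in .htaccess.

```apache
# Serve the custom page whenever a 403 is returned
ErrorDocument 403 /path_to_custom_403_page.shtml

# Let the server parse .shtml files for SSI directives
Options +Includes
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml

# Inside path_to_custom_403_page.shtml, the reporting script would be
# pulled in with an SSI directive along these lines:
#   <!--#include virtual="/cgi-bin/script-name.pl" -->
```

With this in place the rewrite rule only has to return the 403; the report is sent as a side effect of serving the error page.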
For the generic case described above, here's an example:
This won't work, because the RewriteConds are only applied to the first rule:
# Apply two rules if both RewriteConds are true
# (This won't work, because only the first rule is subject to the RewriteConds)
RewriteCond A
RewriteCond B
RewriteRule 1
RewriteRule 2
This version works, because the first rule skips the next two unless both conditions are true:
# Skip two rules if either RewriteCond is NOT true
RewriteCond !A [OR]
RewriteCond !B
RewriteRule .* - [S=2]
RewriteRule a b
RewriteRule x y
# If above RewriteConds are not true, execution resumes following this line
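The [L] alternative mentioned earlier might look like the sketch below. It reuses the Googlebot conditions from this thread; the FAKE_BOT environment variable name is hypothetical and is there only to give the first of the two rules something to do.

```apache
# Stop processing unless BOTH conditions hold (negated logic, ORed together)
RewriteCond %{HTTP_USER_AGENT} !Googlebot/ [OR]
RewriteCond %{REMOTE_ADDR} ^66\.249\.
RewriteRule .* - [L]

# Reached only when the user-agent claims Googlebot AND the IP is outside 66.249.*
# Rule 1: tag the request (FAKE_BOT is a hypothetical variable name)
RewriteRule .* - [E=FAKE_BOT:1]
# Rule 2: refuse everything except robots.txt and the custom 403 page
RewriteRule !^(robots\.txt|path_to_custom_403_page\.shtml)$ - [F]
```

Unlike [S=2], [L] ends rule processing for the current pass, so any unrelated rules placed further down are also skipped for requests that match the first rule.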
Jim
I have three distinct CIDRs for GoogleBot in my RewriteCond and I'm not sure if I have listed them correctly. The Boolean logic is confusing me.
Here is what I have now, which captured a real GoogleBot visit today:
RewriteCond %{REMOTE_ADDR} !(64\.68\.8[0-7]\.)|(64\.233\.(16[0-9]|17[0-9]|18[0-9]|19[0-1])\.)|(66\.249\.(6[4-9]|7[0-9]|8[0-9]|9[0-5])\.)
RewriteCond %{REMOTE_ADDR} !^64\.68\.8[0-7]\. [OR]
RewriteCond %{REMOTE_ADDR} !^64\.233\.(16[0-9]|17[0-9]|18[0-9]|19[0-1])\. [OR]
RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|7[0-9]|8[0-9]|9[0-5])\.
RewriteCond %{HTTP_USER_AGENT} Googlebot/2.1
RewriteRule !^(robots\.txt|path-to-403.shtml) - [F]
[edited by: Wizcrafts at 9:01 pm (utc) on Feb. 13, 2007]
Those negative conditions should be ANDed, but of course, it depends on what you're trying to accomplish.
Part of the problem is the disconnect between natural language and logic. If asked the question, "Are you going to the movie tonight or tomorrow afternoon?" a human would likely respond with one of the two alternatives. A robot, seeing the OR in the question, would simply respond "Yes" or "No." :)
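To make the ANDed form concrete, here is a sketch that assumes the goal is to refuse any request that claims to be Googlebot but comes from outside all three ranges. With [OR], the negated conditions are always true as a group, since no address can sit inside all three ranges at once, so even the real Googlebot gets the 403. Dropping the [OR] flags gives the default AND:

```apache
# User-agent claims to be Googlebot ...
RewriteCond %{HTTP_USER_AGENT} Googlebot/2\.1
# ... AND the address is in none of the three ranges (implicit AND between conditions)
RewriteCond %{REMOTE_ADDR} !^64\.68\.8[0-7]\.
RewriteCond %{REMOTE_ADDR} !^64\.233\.(16[0-9]|17[0-9]|18[0-9]|19[0-1])\.
RewriteCond %{REMOTE_ADDR} !^66\.249\.(6[4-9]|7[0-9]|8[0-9]|9[0-5])\.
# Refuse everything except robots.txt and the custom 403 page
RewriteRule !^(robots\.txt|path-to-403\.shtml)$ - [F]
```

The IP ranges are the ones posted above and are not verified here; they would need checking against Google's current crawler addresses.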
Jim