Forum Moderators: phranque
RewriteCond %{HTTP_REFERER} !.
RewriteCond %{HTTP_USER_AGENT} Chrome/108\.0\.0\.0
RewriteRule ^ebooks/(\w+/(\w+\.html)?)$ https://example.com/boilerplate/redirect.php?newpage=/ebooks/$1 [R=302,L]
The page “redirect.php” says, in effect, “I'm awfully sorry, but you have inadvertently replicated the behavior of an unwelcome robot” (because it is theoretically possible for a human to meet these conditions) ... and then it's got a link to the originally requested page. In this particular case--I've used it for a few others, including one category of actual humans--it's a robot that always uses the same user-agent, always coming in with a null* referer.
RewriteCond %{REQUEST_URI} ^/foo/bar/
RewriteCond %{HTTP_REFERER} !.
RewriteRule ^ - [F]
3.87.229.113 - - [14/Nov/2023:01:13:48 -0500] "GET /foo/bar/ HTTP/1.1" 403 20 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html) X-Middleton/1"
RewriteCond %{HTTP_REFERER} !.
RewriteRule ^foo/bar/ - [F]
This is the syntax for htaccess or a <Directory> section; otherwise replace the ^ with whatever is appropriate for this specific config file.
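For anyone wondering what "whatever is appropriate" looks like: in server or <VirtualHost> context the pattern is matched against the full URL-path, leading slash included, so the same rule would probably look something like this. Treat it as a sketch; the enclosing <VirtualHost> is an assumption, not something from the original post.

```apache
# Assumed context: inside a <VirtualHost> block, not htaccess.
# Here the RewriteRule pattern sees the path with its leading slash.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !.
RewriteRule ^/foo/bar/ - [F]
```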
I don't suppose there's a better option than [F], is there?
You could return a different error, such as 503 or 418 (“teapot error”, used by my host for mod_security) or, heck, 429 (“too many requests”). If they're really invasive, they may already be getting some of those naturally. In mod_rewrite the syntax is, counterintuitively, [R=418] or [R=503] or numerical code of your choice; no [L] is needed.
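As a rough illustration of the [R=nnn] form, here is the same htaccess rule returning 503 instead of 403. I picked 503 only because every Apache version recognizes it; whether 418 or 429 is accepted may depend on the Apache version in use, so that part is an assumption to check on your own server.

```apache
# htaccess or <Directory> context, same conditions as the [F] rule.
# With a non-3xx code in R=, the substitution is dropped and
# rewriting stops as if [L] had been given.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !.
RewriteRule ^foo/bar/ - [R=503]
```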
If not, is there a way to forbid it AND prevent it from being written to the log?
You did say you have access to the config file, right? If so, you could look into custom log settings (directive CustomLog under mod_log_config [httpd.apache.org] in the Apache docs). But really, the act of logging a request is probably the least of the server's problems, and it can be useful to have some kind of a record.
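If you do want to try it, conditional logging in Apache 2.4 might look roughly like the following. This is only a sketch: the log path and the "combined" format name are assumptions (use whatever your existing CustomLog line has), and it would replace that existing line rather than sit alongside it, otherwise everything gets logged twice. CustomLog is not allowed in htaccess, so this goes in the server or virtual host config.

```apache
# Skip logging only when BOTH conditions hold:
# the path starts with /foo/bar/ and the referer is empty.
# "logs/access_log" and "combined" are placeholders for your own values.
CustomLog "logs/access_log" combined "expr=!(%{REQUEST_URI} =~ m#^/foo/bar/# && -z %{HTTP_REFERER})"
```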
SecRuleEngine On
<LocationMatch "/foo/bar/">
# I'm using X_FORWARDED_FOR because of Cloudflare, and
# added the unique id: to each Sec line for mod_security2
SecAction initcol:ip=%{HTTP_X_FORWARDED_FOR},pass,nolog,id:11
SecAction "phase:5,deprecatevar:ip.somepathcounter=1/1,pass,nolog,id:12"
SecRule IP:SOMEPATHCOUNTER "@gt 60" "phase:2,pause:300,deny,status:429,setenv:RATELIMITED,skip:1,nolog,id:13"
SecAction "phase:2,pass,setvar:ip.somepathcounter=+1,nolog,id:14"
Header always set Retry-After "10" env=RATELIMITED
</LocationMatch>
ErrorDocument 429 "Rate Limit Exceeded"
but when I restarted Apache it gave me an error that it was an invalid response value
Hm, that's interesting, since mine is also Apache. (But I suspect they haven't updated the mod_security rules in a while, since 418s are a minute number compared to 403.) Maybe there's something you have to change elsewhere in the config file to add 418 to the list of possible responses? In any case I doubt the robot really cares which 400-class response it receives; it just makes it easier to eyeball them in logs.
but I didn't see any change in requests
And the bad news is ... There isn't always any relationship between the response a robot receives today, and the request it sends tomorrow. With the obvious exception of things like legitimate search engines, which will quickly learn which URLs get 301 and 404 responses, and stop requesting them unless the old URLs are continuously reinforced by outdated links. I mean, you see requests for php-admin year after year after year, don't you.
If incoming requests match...
[URI Path] [starts with] [/foo/bar]
With the same characteristics...
[IP]
When rate exceeds...
Requests [2]
Period [10 seconds]
Then take action...
[Block] with response code [429]
For duration...
[10 seconds]
I don't have a way of blocking them at the firewall, because I'm using Cloudflare and now every connection comes from one of about 12 Amazon IPs! So while CSF (the firewall) does have a connection limit option, it would end up blocking those Amazon IPs instead of the bad bot.
[edited by: thecoalman at 7:21 pm (utc) on Dec 31, 2023]
Doesn't Cloudflare offer a captcha?