Forum Moderators: phranque
<Location "/">
SetEnvIfNoCase User-Agent "lwp-trivial" bad_bot
SetEnvIfNoCase User-Agent "libwww" bad_bot
SetEnvIfNoCase User-Agent "Wget" bad_bot
Deny from env=bad_bot
</Location>

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AlphaBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule ^(.*)$ - [L,R=403]
</IfModule>

<Location "/var/www/sites/">
SetEnvIf User-Agent BLEXBot GoAway
Order allow,deny
Allow from all
Deny from env=GoAway
</Location>

RewriteCond %{HTTP_USER_AGENT} "blexbot" [nocase]
RewriteRule ^.*$ - [forbidden,last]

SecRule REQUEST_HEADERS:User-Agent "BLEXBot" "deny,status:403,id:5000218,msg:'Badbot test for Blexbot'"
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile badbots.txt" "id:350001,rev:1,severity:2,log,msg:'BAD BOT - Detected and Blocked. '"

<Location "/var/www/sites/">
That's not a location. It's a directory.
<Directory />
Order Allow,Deny
Deny from all
AllowOverride none
</Directory>
--it should already be there, unless you've accidentally trashed it--and then put in exceptions for specific directories.

<FilesMatch "^\.ht">
Order allow,deny
Deny from all
Satisfy All
</FilesMatch>
Do not change this. It is what prevents all visitors, everywhere, all the time, from seeing your .htaccess or .htpasswd files. Similarly, never ever ever say <Files *> or similar in htaccess, because it will cancel this barrier.

SetEnvIf User-Agent ZoominfoBot keep_out
order allow,deny
allow from all
deny from env=keep_out

[edited by: phranque at 12:45 am (utc) on Nov 26, 2017]
[edit reason] assuming "SetEndif" was a typo [/edit]
SetEnvIfNoCase User-Agent "blexbot" badbot
<Directory />
Order Allow,Deny
Deny from env=badbot
AllowOverride none
</Directory>

They attack, you parry, they change names
I read this too fast and thought you were saying that robots get married and change their names, which creates an interesting mental picture.

Each of my sites is in a separate directory under /var/www/sites/.
In that case, all your access-control rules should go in a <Directory> section for that overall directory. And as TorontoBoy said, don't use RewriteRules. Because of mod_rewrite's wonky inheritance, it should be reserved for individual sites--whether those end up being .htaccess, or site-specific <Directory> sections. A lot can be done with mod_setenvif, since it works nicely in combination with mod_authzthingwhatsit (exact name changes from one Apache version to the next), and inherits consistently.
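A minimal sketch of that advice, assuming the Apache 2.2-style access directives used elsewhere in this thread. The variable name badbot and the two bot names are only illustrative, taken from earlier posts.

```apache
# Server config (httpd.conf), outside any <VirtualHost>.
# Flag unwanted user agents; "badbot" is an arbitrary variable name.
SetEnvIfNoCase User-Agent "blexbot" badbot
SetEnvIfNoCase User-Agent "zoominfobot" badbot

# One section for the parent directory covers every site beneath it.
<Directory "/var/www/sites">
    Order Allow,Deny
    Allow from all
    Deny from env=badbot
</Directory>
```

On Apache 2.4 without mod_access_compat, the equivalent should be "Require all granted" together with "Require not env badbot" inside a <RequireAll> block.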
What to use When
Choosing between filesystem containers and webspace containers is actually quite easy. When applying directives to objects that reside in the filesystem always use <Directory> or <Files>. When applying directives to objects that do not reside in the filesystem (such as a webpage generated from a database), use <Location>.
It is important to never use <Location> when trying to restrict access to objects in the filesystem. This is because many different webspace locations (URLs) could map to the same filesystem location, allowing your restrictions to be circumvented.
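To illustrate the circumvention the documentation warns about, here is a sketch with a hypothetical layout: the Alias, the /mirror and /private prefixes, and the private subdirectory are all assumptions made up for the example.

```apache
# Hypothetical layout: the same files are reachable under two URL prefixes.
DocumentRoot "/var/www/sites/mysite1"
Alias "/mirror" "/var/www/sites/mysite1/private"

# Webspace rule: covers only URLs that start with /private.
# A request for /mirror/report.html reaches the same file and is not covered.
<Location "/private">
    Order Allow,Deny
    Deny from all
</Location>

# Filesystem rule: covers the files no matter which URL maps to them.
<Directory "/var/www/sites/mysite1/private">
    Order Allow,Deny
    Deny from all
</Directory>
```

In practice you would keep only the <Directory> version; the <Location> block is shown for contrast.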
Sections inside <VirtualHost> sections are applied after the corresponding sections outside the virtual host definition. This allows virtual hosts to override the main server configuration.
Sections inside <VirtualHost> sections are applied after the corresponding sections outside the virtual host definition.
Ah thanks, phranque, it was the before-or-after that I couldn't find. I did find the bit about not using <Location> for access control. (It now occurs to me that the reason you almost never see <Location> in the present subforum is that it can't be used in htaccess, which is what most CMS users are limited to, and it's rarely meaningful outside of database-driven sites.)

<Directory /var/www/sites>
with leading, without trailing slash.

<Directory "/var/www/sites/mysite1">
Deny from env=badbot
</Directory>

<Directory "/var/www/sites/mysite1">
Allow from all
AllowOverride AuthConfig Indexes Limit
Options +FollowSymLinks
Deny from env=badbot
</Directory>

<Directory "/var/www/sites/mysite1">
Allow from all
AllowOverride AuthConfig Indexes Limit
Options +FollowSymLinks
</Directory>
<Directory "/var/www/sites/mysite1">
Deny from env=badbot
</Directory>

Include /etc/httpd/conf/badbots.conf
to httpd.conf (above the virtual hosts section), and then having a big list of SetEnvIfNoCase directives in the badbots.conf file, in the location noted.
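Roughly, that arrangement could look like the sketch below. The bot names and the variable name badbot are illustrative, and the three parts live in different places, as the comments indicate.

```apache
# /etc/httpd/conf/badbots.conf (one line per unwanted user agent)
SetEnvIfNoCase User-Agent "ahrefsbot" badbot
SetEnvIfNoCase User-Agent "blexbot" badbot
SetEnvIfNoCase User-Agent "zoominfobot" badbot

# httpd.conf, above the virtual hosts section
Include /etc/httpd/conf/badbots.conf

# Inside a virtual host (or once for the shared parent directory)
<Directory "/var/www/sites/mysite1">
    Order Allow,Deny
    Allow from all
    Deny from env=badbot
</Directory>
```

Keeping the list in its own file means adding a new bot is a one-line change followed by a config reload, with no edits to the individual virtual hosts.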
This sort of makes sense in a way, as I've already told the server not to serve anything from that virtual host to the bad bot requesting it.
ErrorDocument 403 http://www.baidu.com
which conveniently redirects 403 recipients to baidu.com (or wherever). The only thing with doing the external redirect - and I suppose the same would be true if it was an internal redirect - is that the log files will log it as a 302, rather than a 403.
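For comparison, the Apache documentation says that an ErrorDocument given as a full URL makes the server send a redirect, so the client never receives the original status, while a local URL-path or a quoted message keeps it. A sketch, where /errors/403.html is a hypothetical file:

```apache
# Full URL: the client is redirected and the original 403 status is lost.
# ErrorDocument 403 http://www.baidu.com

# Local URL-path (hypothetical file): the response status stays 403.
ErrorDocument 403 /errors/403.html

# Or a short literal message, also served with status 403.
# ErrorDocument 403 "Forbidden"
```

With either of the last two forms the access log should show 403 rather than 302. One thing to check: the same deny rule may also block the error page itself for that client, in which case the server will probably fall back to its built-in message.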
I thought BrowserMatch pretty much behaved identically to SetEnvIfNoCase User-Agent
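As far as the mod_setenvif documentation goes, BrowserMatch is the case-sensitive shorthand and BrowserMatchNoCase is the case-insensitive one. A short sketch of the pairs (the variable name badbot is illustrative):

```apache
# These two lines have the same effect (case-sensitive match on User-Agent).
BrowserMatch BLEXBot badbot
SetEnvIf User-Agent BLEXBot badbot

# And so do these two (case-insensitive match).
BrowserMatchNoCase blexbot badbot
SetEnvIfNoCase User-Agent blexbot badbot
```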
ErrorDocument 404 "404"
Not quite sure what your reference to "ErrorDocument 404" is about.
Other post, between yours and mine I think.

I imagine their method of checking strings will likely be to simply convert both strings to the same case and compare
Come to think of it, I imagine you are right--but that's still a conversion that has to take place. To our human brains, a > A and b > B is obvious, but to a computer, it's a matter of adding some number* to selected codepoints in the original string ... and if you're outside of plain ASCII, it won't always be the same number.