# BLOCK BAD USER AGENTS
SetEnvIfNoCase User-Agent (archive.org|ahrefsbot|baiduspider|binlar|casper|checkpriv|choppy|clshttp|cmsworld|cukbot|diavol|domainappender|dotbot|extract|feedfinder|flicky|getintentcrawler|g00g1e|grapeshotcrawler|harvest|heritrix|httrack|kmccrew|loader|maxpoint|maxpointcrawler|miner|mj12bot|naver|netseer|nikto|nutch|paperlibot|planetwork|plukkie|postrank|proximic|purebot|pycurl|python|qwantify|seekerspider|semrushbot|seznambot|siclab|skygrid|sogou|sqlmap|sucker|turnit|vikspider|w3c-checklink|winhttp|wotbox|xxxyy|yandexbot|youda|zmeu|zune) bad_bot
# BAD USER AGENTS
# JAL Sets Files for Mod Deflate January 29 2016
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/javascript application/rss+xml application/xml application/json image/x-icon
# JAL Mod Deflate
# JAL Sets Header Caching January 29 2016
<FilesMatch "\.(gif|jpg|jpeg|png|ico|html|css|txt|xml|javascript|js)$">
Header set Cache-Control "max-age=2592000, public"
</FilesMatch>
# JAL Sets Header Caching
# JAL Sets Error Documents January 29 2016
ErrorDocument 500 /errors/errmaintenance.html
ErrorDocument 404 /errors/errnotfound.html
ErrorDocument 403 /errors/errforbidden.html
ErrorDocument 401 /errors/errunauthorized.html
ErrorDocument 400 /errors/errbadrequest.html
# JAL Error Documents
# JAL Sets Expires Defaults January 29 2016
ExpiresDefault A172800
ExpiresByType text/css A31536000
ExpiresByType application/x-javascript A31536000
ExpiresByType text/x-component A31536000
ExpiresByType text/html A31536000
ExpiresByType text/plain A31536000
ExpiresByType text/xml A31536000
ExpiresByType image/bmp A31536000
ExpiresByType image/gif A31536000
ExpiresByType image/x-icon A31536000
ExpiresByType image/jpeg A31536000
ExpiresByType application/pdf A31536000
ExpiresByType image/png A31536000
# JAL Sets Expires Defaults
# BAD USER AGENTS
<limit GET POST>
Order Allow,Deny
Allow from All
Deny from env=bad_bot
</limit>
# BAD USER AGENTS SetEnvIfNoCase User-Agent
I added the following to the .htaccess file for one site:
<RequireAll>
SetEnvIfNoCase User-Agent "^(archive.org_bot|ia_archiver|ahrefsbot|baiduspider|cukbot|dotbot|domainappender|feedfinder|extract|getintentcrawler|getintent|g00gle|grapeshotcrawler|harvest|heritrix|maxpoint|maxpointcrawler|miner|mj12bot|naver|netseer|nikto|nutch|oBot|paperlibot|planetwork|plukkie|postrank|proximic|purebot|pycurl|python|qwantify|seekerspider|semrushbot|seznambot|siclab|skygrid|sogou|sqlmap|sucker|turnit|w3c-checklink|winhttp|wotbox|xxxyy|yandexbot|youda|zmeu|zune)" bad_bot
<If "%{HTTP_USER_AGENT} =='bad_bot'">
Require all denied
</If>
<Else>
Require all granted
</Else>
</RequireAll>
SetEnvIfNoCase User-Agent "^(archive.org_bot|ia_archiver|ahrefsbot|baiduspider|cukbot|dotbot|domainappender|feedfinder|extract|getintentcrawler|getintent|g00gle|grapeshotcrawler|harvest|heritrix|maxpoint|maxpointcrawler|miner|mj12bot|naver|netseer|nikto|nutch|oBot|paperlibot|planetwork|plukkie|postrank|proximic|purebot|pycurl|python|qwantify|seekerspider|semrushbot|seznambot|siclab|skygrid|sogou|sqlmap|sucker|turnit|w3c-checklink|winhttp|wotbox|xxxyy|yandexbot|youda|zmeu|zune)" bad_bot
<If "%{HTTP_USER_AGENT} =='bad_bot'">
Require all denied
</If>
I'm sure it's wrong because I couldn't find a clear explanation of "how to wrap the logic".
SetEnvIfNoCase User-Agent "^(archive.org_bot|ia_archiver|ahrefsbot|baiduspider|cukbot|dotbot|domainappender|feedfinder|extract|getintentcrawler|getintent|g00gle|grapeshotcrawler|harvest|heritrix|maxpoint|maxpointcrawler|miner|mj12bot|naver|netseer|nikto|nutch|oBot|paperlibot|planetwork|plukkie|postrank|proximic|purebot|pycurl|python|qwantify|seekerspider|semrushbot|seznambot|siclab|skygrid|sogou|sqlmap|sucker|turnit|w3c-checklink|winhttp|wotbox|xxxyy|yandexbot|youda|zmeu|zune)" bad_bot
<If "%{HTTP_USER_AGENT} =='bad_bot'">
<If "-T reqenv('bad_bot')">
<If "-T env('bad_bot')">
<If "-T %{ENV:bad_bot}">
<If "-T %{ENV:bad_bot} == '1'">
SetEnvIf User-Agent . bad_bot
<If "-T %{ENV:bad_bot}">
Header set X-Blocked Yes
</If>
<If "%{HTTP_USER_AGENT} =~ /(archive.org_bot|ia_archiver|ahrefsbot|baiduspider|cukbot|dotbot|domainappender|feedfinder|extract|getintentcrawler|getintent|g00gle|grapeshotcrawler|harvest|heritrix|maxpoint|maxpointcrawler|miner|mj12bot|naver|netseer|nikto|nutch|oBot|paperlibot|planetwork|plukkie|postrank|proximic|purebot|pycurl|python|qwantify|seekerspider|semrushbot|seznambot|siclab|skygrid|sogou|sqlmap|sucker|turnit|w3c-checklink|winhttp|wotbox|xxxyy|yandexbot|youda|zmeu|zune)/i">
Header set X-Blocked Yes
SetEnv bad_bot 1
</If>
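For what it's worth, the regex test inside <If> can also carry the denial itself, which avoids depending on an environment variable being set before the <If> is evaluated. A minimal sketch, assuming Apache 2.4 with mod_authz_core and a deliberately shortened bot list (the three names are just samples from the list above):

```apache
# Sketch only: deny directly when the User-Agent matches.
# The pattern is shortened for illustration; extend it with the full list as needed.
<If "%{HTTP_USER_AGENT} =~ /(ahrefsbot|mj12bot|semrushbot)/i">
    Require all denied
</If>
```

A request whose User-Agent contains one of those names should then get a 403, while everything else falls through to whatever authorization is already in place.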
Order Allow,Deny
Allow from all
Deny from env=blahblahblah
Deny from env=blahblahblah (long list of environmental variables)
and then 2.4 would be
Require all granted
<RequireNone>
Require env blahblah
Require env blahblah (same list again, globally converted)
</RequireNone>
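As far as I can tell from the 2.4 docs, a negative test such as <RequireNone> cannot grant access by itself, so it has to sit inside a <RequireAll> next to something that does grant. A rough sketch of the conversion, where bad_bot and bad_referer are placeholder variable names standing in for the real list:

```apache
# Apache 2.4 sketch; bad_bot and bad_referer are placeholder names.
<RequireAll>
    # Everyone is allowed in...
    Require all granted
    # ...unless any one of these environment variables is set.
    <RequireNone>
        Require env bad_bot
        Require env bad_referer
    </RequireNone>
</RequireAll>
```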
:: wandering off to pore over docs again ::
"The Allow, Deny, and Order directives, provided by mod_access_compat, are deprecated and will go away in a future version. You should avoid using them, and avoid outdated tutorials recommending their use."
"Rather than using mod_rewrite for this, you can accomplish the same end using alternate means, as illustrated here:"
SetEnvIfNoCase User-Agent "^NameOfBadRobot" goaway
<Location "/secret/files">
<RequireAll>
Require all granted
Require not env goaway
</RequireAll>
</Location>
I'm not at all familiar with <Location>.
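One thing to note: <Location> is only valid in the server config or a virtual host, not in .htaccess. In an .htaccess file the directives already apply to the directory the file lives in, so the container can simply be left out. A sketch of the documented pattern adapted to this thread, assuming the bad_bot variable from earlier and a shortened, unanchored pattern (many crawlers send a User-Agent that starts with "Mozilla/5.0 (compatible; ...", which a leading ^ would never match):

```apache
# .htaccess sketch for Apache 2.4 (mod_setenvif + mod_authz_core).
# Shortened bot list for illustration only.
SetEnvIfNoCase User-Agent "(ahrefsbot|mj12bot|semrushbot)" bad_bot

<RequireAll>
    # Grant by default, then refuse any request flagged above.
    Require all granted
    Require not env bad_bot
</RequireAll>
```

This form needs no <If> at all: the flag is set by SetEnvIfNoCase and is only read when authorization is checked.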