"does anyone have the actual addresses that DuckDuckGo crawls from?"

There's a very recent thread--which I now can't find, but someone will know--that includes a link to their About Our Robot page, giving their exact aa.bb.cc.dd addresses.
"There's not really a way to use Mod_Authz_Host, is there?"

When I talk about using mod_setenvif for access control, that's always in conjunction with mod_auth-whatever. The access-control part will then be
# Cliqzbot - amazon
BrowserMatch Cliqzbot cliqz bot=cliqz
# duckduckgo - amazon
BrowserMatch DuckDuckBot|DuckDuckGo-Favicons-Bot duck bot=duck
# amazon bots for cliqz, duck - add more as needed (54.128.0.0/9 includes merck and short CN block)
<if "-R '3.0.0.0/8' || -R '18.128.0.0/9' || -R '13.32.0.0/12' || -R '13.48.0.0/13' || -R '13.56.0.0/14' || -R '13.112.0.0/14' || -R '13.124.0.0/14' || -R '13.208.0.0/14' || -R '13.228.0.0/14' || -R '13.232.0.0/13' || -R '34.192.0.0/10' || -R '35.152.0.0/13' || -R '35.160.0.0/12' || -R '35.176.0.0/13' || -R '50.16.0.0/14' || -R '23.20.0.0/14' || -R '52.0.0.0/10' || -R '52.64.0.0/12' || -R '52.84.0.0/14' || -R '52.88.0.0/13' || -R '52.192.0.0/11' || -R '54.64.0.0/13' || -R '54.72.0.0/13' || -R '54.80.0.0/12' || -R '54.128.0.0/9' || -R '54.192.0.0/12' || -R '54.208.0.0/13' || -R '54.216.0.0/14' || -R '54.220.0.0/15' || -R '107.20.0.0/14' ">
    SetEnvIf Remote_Addr .* amazon ips=amazon:$0
</if>
Require expr %{REQUEST_URI} =~ m#/robots\.txt#
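# Icon requests are allowed from anyone
Require expr %{REQUEST_URI} =~ m#favicon\.ico|apple-touch-icon\.png|apple-touch-icon-precomposed\.png#i
# Otherwise the request must come from an Amazon range (env amazon)
# AND identify itself as Cliqzbot or one of the DuckDuckGo bots
<RequireAll>
    Require env amazon
    <RequireAny>
        Require env cliqz
        Require env duck
    </RequireAny>
</RequireAll>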
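
On the Mod_Authz_Host question: if you do have the exact addresses from the About Our Robot page, one possible approach is to let mod_setenvif handle the User-Agent and mod_authz_host handle the address, combined in a single RequireAll. This is only a sketch of the per-bot test, not a complete access policy, and the addresses below are documentation placeholders (192.0.2.0/24 and 198.51.100.7), not real DuckDuckGo addresses. Substitute the published ones.

```apache
# Tag the request by User-Agent, using the same variable name as above.
BrowserMatch DuckDuckBot|DuckDuckGo-Favicons-Bot duck

<RequireAll>
    # The request must claim to be a DuckDuckGo bot...
    Require env duck
    # ...and must come from one of the listed addresses.
    # PLACEHOLDER addresses only: replace with the published crawler IPs.
    Require ip 192.0.2.0/24 198.51.100.7
</RequireAll>
```

Used on its own, this block refuses everything except the matching bot, so in practice it would sit alongside the other Require lines rather than replace them. The advantage over a broad Amazon range list is that a spoofed DuckDuckBot User-Agent from some other AWS customer no longer passes.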