Forum Moderators: open
'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)'

Just saying...
ip: 40.88.21.235
remote host: duckduckbot.duckduckgo.com
time: {ts '2020-05-19 16:15:17'}
http_content:
method: GET
protocol: HTTP/1.1
Accept-Language: en-US,*
user-agent: 'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)'
host: www.example.com
connection: Keep-Alive
accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
content-length: 0
At least they have a stated list to use with "SetEnvIf Remote_Addr".

Heh. I just use BrowserMatch, because every putative DuckDuckBot I've ever seen is a faker. (Last I heard, they got their crawl data from Bing.)
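A sketch of combining the two approaches (Apache 2.4 syntax; the single address below is the one from the log entry above, standing in for DuckDuckGo's full published list):

```apache
# Flag any request whose UA claims to be DuckDuckBot...
BrowserMatch "DuckDuckBot" claims_ddg

# ...then clear the flag for addresses on the stated list.
# 40.88.21.235 is the address from the log above; a real config
# would enumerate the whole published list.
SetEnvIf Remote_Addr "^40\.88\.21\.235$" !claims_ddg

# Whatever still carries the flag claimed the name from the wrong address.
<RequireAll>
    Require all granted
    Require not env claims_ddg
</RequireAll>
```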
Now if they could only remove single quotes

If they were the only entity ever to use single quotes, it would be no problem. Personally I block any UA that begins with a nonword character:
BrowserMatch ^\W bad_agent=nonword
(The value of "bad_agent" is not used in access controls--in fact it can’t be, in mod_authzwhatsit--but it is included in logged headers to make it easier for me to see what they did to offend.)
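One way to surface that value in the logs, for instance (a sketch; the format name and log file are placeholders):

```apache
# Tag UAs that start with a nonword character, as above.
BrowserMatch ^\W bad_agent=nonword

# Include the flag's value alongside the offending UA in a separate log,
# written only when the flag is set.
LogFormat "%h %t \"%r\" %>s \"%{User-Agent}i\" bad_agent=%{bad_agent}e" suspects
CustomLog "logs/suspect_log" suspects env=bad_agent
```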
It *should* be (or redirect to)

I am often staggered by the number of crawlers--including some well-known, high-profile ones--whose UA includes a URL that redirects, and continues to do so for years. I mean, just how hard is it to edit your robot's UA string? Ninety seconds of work?