

DuckDuckBot-Https

new UA, new range


keyplyr

7:47 pm on Aug 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)
Protocol: HTTP/1.1
Robots.txt: Yes
Host: AWS
107.20.0.0 - 107.23.255.255
107.20.0.0/14

Neither this UA nor this IP range is listed on their bot info page; however, both fall within an acceptable margin of variation.
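A quick way to double-check that the dotted range and the CIDR above describe the same block, and to test a logged IP against it, is Python's standard `ipaddress` module. This is only a sketch; the helper name `in_range` is made up for illustration.

```python
import ipaddress

# Dotted range reported in the post above (AWS block seen with DuckDuckBot-Https)
first = ipaddress.IPv4Address("107.20.0.0")
last = ipaddress.IPv4Address("107.23.255.255")

# summarize_address_range yields the minimal set of CIDR blocks covering first..last
cidrs = list(ipaddress.summarize_address_range(first, last))
print(cidrs)  # [IPv4Network('107.20.0.0/14')]


def in_range(ip, nets):
    """Hypothetical helper: True if the address falls inside any of the networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)
```

The single `/14` confirms the two notations agree: 14 prefix bits leave the low two bits of the second octet free, which gives 20 through 23.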

Related:
[webmasterworld.com...]

TorontoBoy

9:44 pm on Sep 1, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)
23.21.226.***
23.20.0.0 - 23.23.255.255
23.20.0.0/14
Host: AMAZON

Request Header:
2018-08-22:13:36:43
URL: /
IP: 23.21.226.***
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-US,*
Connection: Keep-Alive
Host: example.com
User-Agent: Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)


Delinked URL, obscured IP address per Forum Charter [webmasterworld.com]
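The logged request above can be turned into a rough fingerprint: the range plus the exact header values seen. The sketch below assumes that a genuine DuckDuckBot-Https request sends these same values, which is based on this one logged hit only. `Host` is left out because it varies per site, and the function names are hypothetical.

```python
import ipaddress

# Range and header values copied from the 2018-08-22 request logged above.
# Assumption: a genuine DuckDuckBot-Https request sends exactly these values.
OBSERVED_NET = ipaddress.ip_network("23.20.0.0/14")
OBSERVED_HEADERS = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-encoding": "gzip, deflate",
    "accept-language": "en-US,*",
    "connection": "Keep-Alive",
    "user-agent": "Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)",
}


def header_mismatches(headers):
    """Return the names of observed headers that are missing or differ."""
    # HTTP header names are case-insensitive, so compare on lowercased names.
    received = {name.lower(): value for name, value in headers.items()}
    return [name for name, value in OBSERVED_HEADERS.items()
            if received.get(name) != value]


def matches_logged_profile(ip, headers):
    """True if the IP is in the logged range and every logged header matches."""
    return ipaddress.ip_address(ip) in OBSERVED_NET and not header_mismatches(headers)
```

A match only means the request looks like this one log entry. Anyone running on the same cloud ranges can copy the headers.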

[edited by: keyplyr at 11:00 pm (utc) on Sep 1, 2018]

keyplyr

10:46 pm on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, if using Amazon's cloud the agent will often be seen from various AWS nodes.

lucy24

10:23 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the last couple of days I have seen them from:
34.228.147.abc
50.16.241.abc
50.16.242.abc
Behavior and headers identical for all, leaving no reasonable doubt that all are legitimate DDG bots.

For me the absolute, beyond-the-shadow-of-a-doubt clincher was:
all robots.txt requests give
http://2.brf.be
(wtf?) as referer, and all front-page requests give
http://example.com/robots.txt
as referer. On this first batch of visits they were blocked on header grounds, so I have not yet had the opportunity to see what creative referers they claim for interior pages.
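The two referer patterns described above are easy to flag mechanically. A request for robots.txt normally carries no referer, and robots.txt itself contains no links. This is a minimal sketch assuming those two rules are all you want to test; `odd_referer` and its parameters are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlsplit


def odd_referer(path: str, referer: Optional[str], own_host: str) -> bool:
    """Flag the referer patterns seen on these visits (hypothetical helper)."""
    if not referer:
        return False
    # Nothing links to robots.txt, so any referer on that request is suspect.
    if path == "/robots.txt":
        return True
    # robots.txt contains no links, so it cannot be a genuine referer.
    parts = urlsplit(referer)
    return parts.hostname == own_host and parts.path == "/robots.txt"
```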

D’you suppose it’s any use asking them to stop sending a ### referer whose only possible effect is to get them blocked?

Hmph.

keyplyr

11:31 pm on Sep 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the last couple of days I have seen them from:
Yes, scalable cloud hosting will use available distributed resources.

D’you suppose it’s any use asking them to stop sending a ### referer whose only possible effect is to get them blocked?
It's the way you have your end set up*. If you would allow straight access to your robots.txt you wouldn't see that. And no, I don't think they care. If you want them to index your web properties correctly, give them access without issue.

I see no referrers from DDG when they request robots.txt across my 3 sites (one at DH, one at Polytechnic Univ and one at rackmount)

*This comment is based on my understanding of an earlier statement of yours regarding robots.txt access at your site, albeit possibly misunderstood.

lucy24

2:15 am on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



albeit possibly misunderstood
Everybody gets robots.txt, even when I know perfectly well they just want to see if I've listed the names of specialized CMS directories that they otherwise wouldn't know about. It's in a <Files> envelope with Allow from all. (RewriteRules are constrained to filetypes, so no further hole-poking is needed.) That's not the issue. I'm thinking of the inexplicable referers the DDG faviconbot always sends--and who knows what kind of referer will come in with page requests. It had better not be a generic front-page referer, because I block those for all pages that are not, in fact, linked from the front page. The minor irony here is that everyone is allowed to get the favicon--but when the faviconbot was blocked in a page request, it wouldn't ask for the favicon even though that was its only reason for making the request in the first place.
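The policy described above can be sketched as a single decision function. robots.txt and the favicon are always served. A bare front-page referer is refused for any page the front page does not actually link to. The set `FRONT_PAGE_LINKS` and its paths are hypothetical placeholders, and the real setup described here lives in Apache config, not in Python.

```python
from typing import Optional
from urllib.parse import urlsplit

# Files everyone may fetch, whatever the headers say.
ALWAYS_ALLOWED = {"/robots.txt", "/favicon.ico"}

# Hypothetical: paths that really are linked from the front page.
FRONT_PAGE_LINKS = {"/", "/about.html", "/contents.html"}


def should_block(path: str, referer: Optional[str], own_host: str) -> bool:
    """Block a generic front-page referer on pages the front page doesn't link to."""
    if path in ALWAYS_ALLOWED:
        return False
    if not referer:
        return False
    parts = urlsplit(referer)
    # "http://example.com" and "http://example.com/" both count as the front page.
    from_front_page = parts.hostname == own_host and parts.path in ("", "/")
    return from_front_page and path not in FRONT_PAGE_LINKS
```

Checking `ALWAYS_ALLOWED` first reflects the point about the favicon: it stays reachable even for a visitor whose page requests get refused.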

You may be remembering a comment about domain-name-canonicalization redirects, where robots.txt is exempt on sites that are https, but where it's purely a with/without www issue they get redirected right along with everyone else.

:: irritably wondering where the day has gone, because all I did was a couple of days' logs and some spit-and-polish involving one directory's stylesheets, and now it's getting dark already and I haven't done a lick of proofreading ::

keyplyr

6:57 am on Oct 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm thinking of the inexplicable referers the DDG faviconbot always sends
Well that's a different UA than we're discussing. Yeah, the DDG favicon bot includes a referrer.

all robots.txt requests give http://2.brf.be (wtf?) as referer
Well normally I'd say that the UA was spoofed and the referrer is log-spam. Easy to use the same AWS ranges & fake the headers to spoof DDG.

keyplyr

8:52 pm on Oct 4, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just noticed this bot with single quotes before and after the UA string:

'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)'

I may have missed this in an earlier report, or this may be new.

Also, this bot leaves out the URL killer +
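To spot the quoted variant in logs, a check like the one below could help. It assumes the "+" mentioned above means the `+http` prefix many crawlers place before their info URL. The function name and the returned keys are made up for illustration.

```python
import re

# Matches the bot token and captures its version, e.g. "1.1".
DDG_HTTPS = re.compile(r"DuckDuckBot-Https/(\d+(?:\.\d+)*)")


def describe_ua(ua: str) -> dict:
    """Hypothetical helper: note the quirks of a DuckDuckBot-Https UA string."""
    quoted = len(ua) >= 2 and ua[0] == ua[-1] == "'"
    core = ua[1:-1] if quoted else ua  # strip the wrapping single quotes
    m = DDG_HTTPS.search(core)
    return {
        "quoted": quoted,
        "claims_ddg_https": m is not None,
        "version": m.group(1) if m else None,
        # Assumption: the "+" in question is the "+http" prefix before the URL.
        "plus_before_url": "+http" in core,
    }
```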