duckduckgo ips

Not just amazon

         

dstiles

9:17 am on Jun 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Duckduckgo is now using one of MS's IPs for its crawl - at least for an icon crawl (which is what I've seen over the past couple of days).

The IPs used are at [help.duckduckgo.com...]

At present they are limited to the following, all Amazon except as noted:

23.21.227.69
40.88.21.235 MSFT
50.16.241.113
50.16.241.114
50.16.241.117
50.16.247.234
52.204.97.54
52.5.190.19
54.197.234.188
54.208.100.253
54.208.102.37
107.21.1.8
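
For anyone checking requests in application code rather than in the server config, here is a minimal sketch of an exact-match test against the list above. The names PUBLISHED_DDG_IPS and is_listed_ddg_ip are made up for illustration, and the addresses are simply copied from this post, so they are only as current as the list itself.

```python
import ipaddress

# Addresses copied from the list in this post (June 2020).
# Assumption: the published list is exhaustive and exact-match only.
PUBLISHED_DDG_IPS = frozenset(
    ipaddress.ip_address(addr)
    for addr in (
        "23.21.227.69",
        "40.88.21.235",  # the one Microsoft address
        "50.16.241.113",
        "50.16.241.114",
        "50.16.241.117",
        "50.16.247.234",
        "52.204.97.54",
        "52.5.190.19",
        "54.197.234.188",
        "54.208.100.253",
        "54.208.102.37",
        "107.21.1.8",
    )
)


def is_listed_ddg_ip(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the listed addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in PUBLISHED_DDG_IPS
    except ValueError:
        # Not a parseable IP address at all.
        return False
```

Parsing with ipaddress instead of comparing raw strings means malformed input is rejected rather than silently compared.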

not2easy

11:10 am on Jun 10, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Good to know. At least they have a stated list to use with "SetEnvIf Remote_Addr". :(

blend27

1:48 pm on Jun 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is funny: just yesterday I was going through denied Amazon IPs and the headers they supplied. DD never really visited this site, but that changed, so I added those DD IPs to a whitelist (taken from their help page).

Now if they could only remove the single quotes from the UA they use.... would be totally awesome!
'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)'
Just saying...

Also, it seems that all of these IPs have the same PTR: duckduckbot.duckduckgo.com

p.s. Headers:

ip: 40.88.21.235
remote host: duckduckbot.duckduckgo.com
time: {ts '2020-05-19 16:15:17'}
http_content:
method: GET
protocol: HTTP/1.1
Accept-Language: en-US,*
user-agent: 'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)'
host: www.example.com
connection: Keep-Alive
accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
content-length: 0

wilderness

1:49 pm on Jun 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They must use another source for crawls.
I've had duckduck denied since 2008 when they first used a private IP, and yet still get refers.

lucy24

3:48 pm on Jun 10, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



At least they have a stated list to use with "SetEnvIf Remote_Addr".
Heh. I just use BrowserMatch,* because every putative DuckDuckBot I’ve ever seen is a faker. (Last I heard, they got their crawl data from bing.)

The faviconbot is real, though, because they’re one of the search engines that display a favicon next to each item in the SERP (Yandex, I think, also does). This creates further confusion because the favicon request is always preceded by a front-page request--including an auto-referer, so you have to poke holes if this behavior would normally be blocked.

Now if they could only remove single quotes
If they were the only entity ever to use single quotes, it would be no problem. Personally I block any UA that begins with a nonword character:
BrowserMatch ^\W bad_agent=nonword
(The value of "bad_agent" is not used in access controls--in fact it can’t be, in mod_authzwhatsit--but it is included in logged headers to make it easier for me to see what they did to offend.)
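
Outside Apache, the same test is easy to sketch in Python. This is only an illustration of the pattern above, not anyone's production rule; the function name starts_with_nonword is made up here, and Python's \W is Unicode-aware for str, so it may not match Apache's regex engine character for character on non-ASCII input.

```python
import re

# Same pattern as the BrowserMatch line: a nonword character at the very start.
NONWORD_START = re.compile(r"^\W")


def starts_with_nonword(user_agent: str) -> bool:
    """Return True if the User-Agent begins with a nonword character."""
    return NONWORD_START.search(user_agent) is not None


# The quoted DuckDuckBot UA from earlier in the thread trips the rule;
# the same string without the surrounding quotes does not.
quoted = "'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)'"
unquoted = quoted.strip("'")
print(starts_with_nonword(quoted), starts_with_nonword(unquoted))
```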


* A closer look at my shared htaccess reveals that I don’t, actually. Oops. Thanks to routine header deficits I don’t need to block them by name. I do un-set "bad_range" for the faviconbot.

Steven29

2:57 am on Jun 11, 2020 (gmt 0)



It would be nice if legitimate bots on the cloud used a standard such as rDNS, a website listing their IPs, and whatever else it takes to prove they are legitimate. The cloud isn't these bots' proxy. Who else receives Proximic requests daily with no way to validate them?

JorgeV

2:38 pm on Sep 2, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

From my observations, the DDG bot no longer comes from the IPs listed on their own page!

I see DDG bot requests from:


20.193.45.xxx
20.193.47.xxx
20.40.8.xxx
23.98.121.xxx
40.114.177.xxx
40.127.242.xxx
40.64.105.xxx
40.64.106.xxx
40.89.250.xxx
51.137.10.xxx
52.141.215.xxx
52.149.234.xxx
191.234.192.xxx

All these IPs belong to Microsoft AS blocks. But there is no way to authenticate a legitimate DDG bot, since none of these IPs are listed on DDG bot's own page. This is not professional!

lucy24

2:56 pm on Sep 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



none of these IPs are listed on DDG bot's own page
Do you have reason to believe that they are nevertheless legitimate? If they are, it would serve them right for crawling from IPs that they themselves won’t admit to.

Personally I continue to assume they’re all fakers, except the faviconbot.

JorgeV

3:04 pm on Sep 2, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



So far, I block all these requests, since they don't match the IPs on DDG Bot's own page.

However, DDG bot is no longer visiting my sites from its "official" IP addresses.

iamlost

4:41 pm on Sep 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For over a decade now I have run rDNS on every request that gets past various prior defensive filters. Bots are increasingly sneaky.

With whitelisted bots such as ddg the rDNS is upfront rather than being a final check. Yes, some bots get VIP preferential service. Their behaviour is still tracked and still held to standard; temp banning of allowed bots for bad behaviour is not unknown.
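
A rough sketch of what such an rDNS check can look like, assuming the usual forward-confirmed approach: look up the PTR, check it sits under the expected domain, then resolve that hostname and confirm it maps back to the same IP. The function name verify_rdns and its injectable reverse/forward parameters are made up for illustration (they let the logic be tested without touching DNS); the default forward lookup uses socket.gethostbyname_ex, which is IPv4 only.

```python
import socket


def verify_rdns(ip, expected_domain, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check (illustrative sketch).

    reverse: callable ip -> hostname (defaults to a real PTR lookup)
    forward: callable hostname -> list of IPs (defaults to a real A lookup)
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse(ip)
    except OSError:
        # No PTR record, or the lookup failed.
        return False

    host = host.rstrip(".").lower()
    domain = expected_domain.lower()
    # Require an exact match or a true subdomain, so that
    # "evilduckduckgo.com" does not pass for "duckduckgo.com".
    if host != domain and not host.endswith("." + domain):
        return False

    try:
        # The PTR alone can be set by whoever controls the IP block;
        # the forward lookup is what ties the name back to the address.
        return ip in forward(host)
    except OSError:
        return False
```

With the defaults, a call would look like verify_rdns("40.88.21.235", "duckduckgo.com"). Whether that passes depends on what DNS returns at the time, so treat the PTR reported earlier in the thread as an observation, not a guarantee.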

notriddle

5:46 am on Sep 6, 2020 (gmt 0)

5+ Year Member



They must use another source for crawls.


Yeah. Microsoft Bing.

The DuckDuckBot is only used for instant answers and favicons.

JorgeV

11:08 am on Sep 7, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

As I mentioned above, I no longer see any hits from DDB from their listed IPs.

All hits claiming to be from DDB come from the MSN range, with this user agent:

DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)

The URL listed in the UA doesn't exist.

WTF?

So, I am blocking all these requests, but now I don't get a single legitimate hit from DDB...

brotherhood of LAN

12:05 pm on Sep 7, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>The URL listed in the UA doesn't exist.

It *should* be (or redirect to) [help.duckduckgo.com...] (as per OP)

Though that page lists Amazon IPs so maybe not that one either!

They shifted to Microsoft a good number of months back; some DNS history websites indicate when.

As @notriddle indicated, their bot isn't really for organic results anyway. Bing does that work, or Yandex if you're searching from certain countries.

lucy24

4:07 pm on Sep 7, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<tangent>
It *should* be (or redirect to)
I am often staggered by the number of crawlers--including some well-known, high-profile ones--whose UA includes a URL that redirects, and continues to do so for years. I mean, just how hard is it to edit your robot's UA string? Ninety seconds of work?
</tangent>