Forum Moderators: bakedjake

DuckDuckBot first visits!

Dimitri

10:30 am on Mar 2, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Yesterday, for the first time in (my) history, I had visits from DuckDuckBot. I do not mean the favicon bot, but THE DuckDuckBot... 54 visits in one hour! I was starting to wonder whether their bot really existed... or maybe it was a glitch.

Now, I can't tell whether this is crawling/indexing, just a link check, or some other test. Officially, DuckDuckGo's results are a mix of data from third-party indexes and, supposedly, their own index too.

PS: DuckDuckBot verified by UA, IP range and reverse DNS.
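
For readers wondering what a "UA, IP range and reverse" check might look like, here is a minimal sketch in Python. It is not Dimitri's actual setup. The network in `ALLOWED_NETWORKS` is a documentation-only placeholder (TEST-NET-3), not a real DuckDuckBot address, and `looks_like_bot`, `reverse_lookup` and `expected_suffix` are hypothetical names made up for illustration. Substitute the ranges the bot operator publishes.

```python
import ipaddress
import socket

# ASSUMPTION: placeholder range from the documentation block (TEST-NET-3).
# Replace with the IP list published by the bot operator.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
UA_TOKEN = "DuckDuckBot"


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def looks_like_bot(user_agent, ip, expected_suffix=None, resolver=reverse_lookup):
    """Apply the three checks in order: UA token, IP range, reverse DNS.

    The reverse-DNS step only runs when expected_suffix is given, because
    the right suffix depends on where the bot is hosted (an assumption the
    caller has to supply).
    """
    # 1. User-agent must contain the claimed bot token.
    if UA_TOKEN.lower() not in user_agent.lower():
        return False

    # 2. Source address must fall inside a published range.
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    if not any(addr in net for net in ALLOWED_NETWORKS):
        return False

    # 3. Optional: PTR record must end with the expected suffix.
    if expected_suffix is not None:
        host = resolver(ip)
        if host is None or not host.lower().endswith(expected_suffix.lower()):
            return False

    return True
```

The UA check alone proves nothing, since anyone can send any user-agent string; it is the IP range match that carries the weight here. The `resolver` parameter exists only so the DNS step can be swapped out when testing offline.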

lammert

2:27 pm on Mar 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



DuckDuckBot is a regular visitor on my sites. It could be that some webmasters will never see this bot because AFAIK, it uses exclusively Amazon EC2 servers which a number of webmasters block by default. The bot is sometimes a little bit hungry, requesting the same page several times per minute.

Dimitri

3:03 pm on Mar 2, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



it uses exclusively Amazon EC2 servers

Indeed: [help.duckduckgo.com...]

a number of webmasters block by default.

I am testing "known" acceptable requests, before denying unknown requests.

There is certainly a significant number of requests from Amazon EC2/AWS IP ranges. There is a bit of everything: some are obviously scrapers, but others still puzzle me. They might be apps of some kind doing something, but since I can't tell, the doors remain closed.

And there are one or more people who keep trying to download images using a Go library (Go-http-client/1.1). If I let it in, this would be hundreds of requests per minute.
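
The "test known acceptable requests first, then deny the rest" ordering can be sketched as a default-deny lookup. This is only an illustration of the idea, not the configuration used above. `KNOWN_GOOD`, `decide` and the label `example-crawler` are hypothetical, and the network is a documentation-only placeholder.

```python
import ipaddress

# ASSUMPTION: hypothetical allowlist of (label, network) pairs.
# 203.0.113.0/24 is a documentation range, not a real crawler.
KNOWN_GOOD = [
    ("example-crawler", ipaddress.ip_network("203.0.113.0/24")),
]


def decide(ip):
    """Return ('allow', label) for a known range, else ('deny', None)."""
    addr = ipaddress.ip_address(ip)
    # Known-good ranges are tested first...
    for label, net in KNOWN_GOOD:
        if addr in net:
            return ("allow", label)
    # ...and everything that matched nothing stays outside.
    return ("deny", None)
```

With this shape, an unidentified client from a hosting range is refused without needing its own deny rule; only the exceptions have to be listed.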

lucy24

9:59 pm on Mar 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It could be that some webmasters will never see this bot because AFAIK, it uses exclusively Amazon EC2 servers which a number of webmasters block by default.
Or could it be because their claims about robots.txt compliance are a barefaced lie?

:: detour to raw logs, cross-checked against listed IPs ::

They have a strikingly bizarre behavior which I'd forgotten about until I re-checked logs: visits start with a request for robots.txt with a referer-spam-type referer--generally some utterly random site, though once I found a Yandex search (not one that would lead to anything on my site, let alone to robots.txt) in the referer slot. This gets them the minimalist Disallow-everyone-everywhere
User-Agent: *
Disallow: /
... which they proceed to ignore. Further quirk is that they then, just like a human, get all the supporting files associated with the 403 page.

Frankly I'd always assumed they were all fakers, since they sure don’t act like a legitimate search-engine spider.
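
To make concrete what that two-line robots.txt asks of a compliant crawler, here is a small check using Python's standard `urllib.robotparser`. It shows how a well-behaved client would read the file, not how DuckDuckBot is implemented. The helper name `is_allowed` and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The minimalist file quoted in the post above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots_txt and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `is_allowed(ROBOTS_TXT, "DuckDuckBot", "https://example.com/page.html")` returns `False`, as it does for any other user-agent and path: a crawler that honors the file should stop after fetching robots.txt.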

notriddle

11:41 pm on Mar 2, 2020 (gmt 0)

5+ Year Member



That's the fundamental problem with running your bot out of EC2. There's basically no way for anyone to tell if it's really your bot or if it's a faker.

lucy24

11:45 pm on Mar 2, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's basically no way for anyone to tell if it's really your bot or if it's a faker.
When they're coming from the down-to-the-last-digit IPs listed on their own page, you kinda have to assume it's the real thing. Unless they've got offspring sneaking in after hours to play with the robot when nobody else is using it?

tangor

1:43 am on Mar 3, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



DDG has been coming around for quite some time for me ... and because it does respect robots.txt I just keep an eye on it.