Web hits from duckduckgo favicons-bot

Includes strange cookie string

SumGuy

11:03 pm on Nov 20, 2023 (gmt 0)



I'm seeing some hits today from 40.88.21.235 (Microsoft) where the UA is:

Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)

It's a short cluster of about 6 requests within the space of 1 or 2 seconds, for my landing page and favicon. The strange thing is that some of these hits have a non-null cookie string in the request header. The string begins with "handl_url_base=http://" and then goes on to include a URL that I'm not familiar with (a private wealth advisory firm in Georgia). The string is quite long and includes this URL several times.

Anyone ever seen this before?

lucy24

5:54 am on Nov 21, 2023 (gmt 0)



:: detour to logged headers ::

I'll be darned. I honestly can't remember if I have noticed this before, but around 1/5 of DDG-Favicons requests include a cookie. And about 3/4 of those send two cookies, the other one being “Cookie2” which earns them a 403. (No idea what Cookie2 is supposed to be for, but I've only seen it from robots.) And yes, the Cookie: is always something utterly incomprehensible, probably no two alike. Various permutations of HandLtestblahblah seem to be popular. The ones that include a URL--there are not many--look as if they have been garbled together from some other source's headers.

For example, with URL exemplified (in spite of the dot com, this one seems to be a design school in Milan):
Cookie: handl_url_base=https%3A%2F%2Fexample.com%2F; handl_ref=https%3A%2F%2Fexample.com%2F; handl_landing_page=http%3A%2F%2Fexample.com%2F; handl_ip=20.191.45.212; handl_url=https%3A%2F%2Fexample.com%2F; handl_original_ref=http%3A%2F%2Fexample.com%2F; user_agent=Mozilla%2F5.0+%28compatible%3B+DuckDuckGo-Favicons-Bot%2F1.0%3B+%2Bhttp%3A%2F%2Fduckduckgo.com%29; organic_source=http%3A%2F%2Fexample.com%2F; organic_source_str=Internal

which unpacks to (with artificial line breaks inserted)
handl_url_base=https://example.com/;
handl_ref=https://example.com/;
handl_landing_page=http://example.com/;
handl_ip=20.191.45.212;
handl_url=https://example.com/;
handl_original_ref=http://example.com/;
user_agent=Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com);
organic_source=http://example.com/;
organic_source_str=Internal
The handl_ip is the same as the actual IP of the request--often but not always 20.191.etcetera.
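For anyone who wants to do the same unpacking on their own logged headers, here is a minimal Python sketch. It assumes the values are form-style percent-encoded (so "+" means a space and %2B is a literal plus), which is how the sample above looks. The function names unpack_cookie and ip_matches are made up for illustration, and the sample string is a shortened version of the header quoted above.

```python
from urllib.parse import unquote_plus


def unpack_cookie(header_value):
    """Split a raw Cookie header value into decoded (name, value) pairs.

    Splitting on ';' happens BEFORE decoding, because the decoded
    user_agent value contains semicolons of its own (%3B).
    """
    pairs = []
    for chunk in header_value.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        name, _, value = chunk.partition("=")
        # unquote_plus turns '+' into a space and %XX into the character
        pairs.append((name, unquote_plus(value)))
    return pairs


def ip_matches(cookies, request_ip):
    """True if the cookie's handl_ip equals the IP the request came from."""
    return cookies.get("handl_ip") == request_ip


# Shortened sample taken from the logged header quoted above
raw = (
    "handl_url_base=https%3A%2F%2Fexample.com%2F; "
    "handl_ip=20.191.45.212; "
    "user_agent=Mozilla%2F5.0+%28compatible%3B+DuckDuckGo-Favicons-Bot"
    "%2F1.0%3B+%2Bhttp%3A%2F%2Fduckduckgo.com%29; "
    "organic_source_str=Internal"
)

cookies = dict(unpack_cookie(raw))
for name, value in cookies.items():
    print(f"{name}={value}")
```

Calling ip_matches(cookies, request_ip) with the IP from your access log is one quick way to pick out the requests where handl_ip does not belong to the visitor that actually sent it.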

But I do find one interesting* exception, involving two different URLs--a dot com and a dot ru--plus google:
handl_ref=https://example.ru/;
handl_url=https://example.com/;
handl_ip=66.249.66.43;
user_agent=Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html);
organic_source=https://example.ru/;
organic_source_str=Other;
handl_original_ref=https://example.ru/;
handl_landing_page=https://example.com/


* But then, I find everything about weird robotic behavior interesting.