Cliqzbot

         

Bewenched

3:10 am on Mar 23, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



35.158.162.xxx (Germany) 200 - OKNo ReferrerMozilla/5.0 (compatible; Cliqzbot/2.0; http://cliqz.com/company/cliqzbot)



- - -

[edited by: keyplyr at 3:14 am (utc) on Mar 23, 2018]
[edit reason] delinked URL [/edit]

keyplyr

3:15 am on Mar 23, 2018 (gmt 0)

Been around for a while. Looks like they've changed their UA attributes though. Are you posting the exact UA?

Is this you or them...?
OKNo Referrer
Please post only the exact log entry; otherwise it is useless for documentation.

Archived:
[webmasterworld.com...]
[webmasterworld.com...]

I think cliqz.com is a beneficial search index. Since it is browser-based, there is no referrer, so it is difficult to see whether allowing the UA brings traffic, but I have been allowing it for over a year without seeing any negative effect AFAIK.

lucy24

7:46 am on Mar 23, 2018 (gmt 0)

For comparison purposes here's what I find most recently (they're on my Ignore list, so I had to check):

IP: distributed-- 18, 34, 35, et cetera, clearly no longer only from 54 as in years past
UA: Mozilla/5.0 (compatible; Cliqzbot/2.0; +http://cliqz.com/company/cliqzbot)

“200 - OKNo Referrer” looks like a cut-and-paste from logs.

Incidentally, the URL in the UA redirects--tut, tut--to
https://cliqz.com/cliqzbot
(Hee, hee, yet another HTTPS migration, among other things.) The page defaults to German, with a slightly peculiar language-switching arrangement where you have to mouse-over the German flag in order to get the UK flag ... in a different place. And vice versa to change back. I must have been there before, because I remember the “Was genau ist Cliqzbot?”
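
For anyone keeping an Allow or Ignore list keyed on this UA, here is a minimal Python sketch of matching the bot token. The pattern and the function name `cliqzbot_version` are illustrative assumptions, not anything published by Cliqz. A UA string is trivially spoofable, so a match only says what the client claims to be.

```python
import re

# Matches the "Cliqzbot/<version>" token anywhere in a User-Agent string.
# Assumption: the version is dotted digits, as in the "2.0" seen in this thread.
CLIQZBOT_RE = re.compile(r"\bCliqzbot/(\d+(?:\.\d+)*)")


def cliqzbot_version(user_agent: str):
    """Return the claimed Cliqzbot version, or None if the UA does not name the bot."""
    match = CLIQZBOT_RE.search(user_agent)
    return match.group(1) if match else None
```

Matching on the token rather than the whole string means both variants posted here (with and without the "+" before the URL) are treated the same.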

keyplyr

8:02 am on Mar 23, 2018 (gmt 0)

“200 - OKNo Referrer” looks like a cut-and-paste from logs.
Well, I've never seen that in the UA string, unless you're confusing raw logs with some software report. As said above, I allow this bot and see it a lot.

dstiles

10:45 am on Mar 23, 2018 (gmt 0)

I think one of the security test sites uses Cliqzbot - at least, that's why I allow it.

lucy24

8:03 pm on Mar 23, 2018 (gmt 0)

I've never seen that in the UA string
That's what I meant. It's three consecutive pieces: response, referer, UA. So probably not literal raw logs but some tabular output, probably with the intervening tabs (\t character) disappearing when pasting into the Post form.

Which is why I always, always Preview.

... And generally have to edit again after posting.
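
To illustrate the guess above, a small Python sketch of how three tab-separated columns run together once the tabs are dropped. The row is a hypothetical reconstruction from the pieces quoted in this thread, and `strip_tabs` only simulates what the Post form is suspected of doing; none of this is confirmed behaviour.

```python
# Hypothetical tabular report row: response, referer column, user-agent.
row = "\t".join([
    "200 - OK",
    "No Referrer",
    "Mozilla/5.0 (compatible; Cliqzbot/2.0; +http://cliqz.com/company/cliqzbot)",
])


def strip_tabs(text: str) -> str:
    """Simulate a form field that silently discards tab characters."""
    return text.replace("\t", "")


# What ends up in the post: the three columns fused into one string.
pasted = strip_tabs(row)

# What the report actually contained: three separate fields.
fields = row.split("\t")
```

With the tabs intact, `fields[2]` is the UA on its own; without them, the response and referer text appear glued to the front of it.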

blend27

2:02 am on Apr 6, 2018 (gmt 0)

The interesting part is that on my sites it also sends an Accept-Language header of pl-PL, pl;q=0.9, which is Polish.

lucy24

3:20 am on Apr 6, 2018 (gmt 0)

Are your sites in Poland, and/or in Polish?

It hadn't previously occurred to me to look at the language headers. With me it's mostly
en-US, en;q=0.8, es-US;q=0.5
(I like “es-US”) but on some days for variety's sake--including but not limited to material on Kalaallisut and related topics--it's
de-DE, de;q=0.9, dsb-DE;q=0.3, hsb-DE;q=0.2
(Since when is Hue-Saturation-Brightness a language?)

:: further detour to lookup [loc.gov] ::
dsb = Lower Sorbian (bas-sorabe, Niedersorbisch)
hsb = Upper Sorbian (haut-sorabe, Hochsorbisch)
... which raises more questions than it answers.

:: wandering off to Omniglot [omniglot.com] ::

Sorbian, or Wendisch, is a member of the West Slavic subgroup of Indo-European languages spoken by about 55,000 people in Upper and Lower Lusatia in the German Länder of Saxony and Brandenburg. The Sorbs are descendants of the Wends, the German name for the Slavic tribes who occupied the area between the Elbe and Saale rivers in the west and the Odra (Oder) River in the east during the medieval period.

Finally, on a handful of visits--apparently limited to early texts ranging from Aelfric to the Pastons--they've used
fr-FR, fr;q=0.9, br-FR;q=0.7, gsw-FR;q=0.5, co-FR;q=0.3
br is presumably Breton
gsw = Swiss German (er... “Swiss German as spoken in France”?) aka Alsatian, which makes more sense
co = Corsican

:: further business with Omniglot ::

Corsican has no official status in Corsica
In all cases, the robots.txt request matches the language headers of the immediately following page requests.

Poor Cliqzbot! Maybe they just can't figure out what language my site is in :)
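
For reading these headers, a minimal Python sketch that splits an Accept-Language value into language tags and q weights. The function name `parse_accept_language` is made up for illustration, and treating a malformed weight as 0.0 is an assumption of this sketch. A tag with no q parameter counts as 1.0, which is why the first entry in each header above outranks the rest.

```python
def parse_accept_language(header: str):
    """Return (language_tag, q) pairs, highest preference first.

    A tag without an explicit q parameter gets the default weight 1.0.
    """
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a malformed weight ranks last
        prefs.append((tag.strip(), q))
    # sorted() is stable, so tags with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)
```

Called on the German header above, this yields de-DE at 1.0, then de at 0.9, dsb-DE at 0.3, and hsb-DE at 0.2.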

blend27

1:40 pm on Apr 6, 2018 (gmt 0)

The sites in question are in the USA, though they do attract traffic from Poland.

The latest request was from 94.198.99.XX, an IP belonging to the Italian server farm SEFLOW-MGMT.

This one had Accept-Language: it-IT, it;q=0.9.

One thing I found interesting was on their About page (cliqz.com/en/about):
In February 2017, Cliqz acquired the world’s leading anti-tracking tool Ghostery.

So, could it be a part or pre-fetch from Ghostery?

lucy24

7:06 pm on Apr 6, 2018 (gmt 0)

could it be a part or pre-fetch from Ghostery
Hm, that's an idea. But then you'd expect language headers varying all over the map, depending on their most recent human request. It definitely isn't a “pre-fetch” in the sense of something followed immediately afterward by a human request.

Hmm. Is there a CliqzDude in the house?

keyplyr

10:36 pm on Apr 6, 2018 (gmt 0)

Not sure what the mystery is here.

Again...
Cliqz offers products for searching directly in the browser and runs a self-developed search technology. Cliqzbot collects URLs and website content in the Cliqz index.
source: [cliqz.com...]

lucy24

1:16 am on Apr 7, 2018 (gmt 0)

The mystery is why they send such weird Accept-Language headers. Have you ever, in your entire life, even met someone whose first language is Sorbian-with-an-O? And I don't believe even leosghost has Breton on his language list. Out of hundreds of logged headers from this User-Agent (I don't keep them around forever, but there are more than enough to generalize), I find lots of {pattern A}, lots of {pattern B}, and a handful of {pattern C}.
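
A rough Python sketch of how such a tally could be made from logged headers. The `sample` list is illustrative only (its values are copied from this thread, its counts are invented), and `tally_headers` is a hypothetical helper. It normalizes spacing and case so that the same pattern logged two slightly different ways is counted once.

```python
from collections import Counter


def tally_headers(headers):
    """Count occurrences of each Accept-Language value, ignoring spacing and case."""
    normalized = (
        ",".join(part.strip().lower() for part in header.split(","))
        for header in headers
    )
    return Counter(normalized)


# Illustrative input only, not real log data.
sample = [
    "en-US, en;q=0.8, es-US;q=0.5",
    "en-US,en;q=0.8,es-US;q=0.5",
    "de-DE, de;q=0.9, dsb-DE;q=0.3, hsb-DE;q=0.2",
]
counts = tally_headers(sample)
```

`counts.most_common()` then lists the patterns from most to least frequent.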