
Forum Moderators: Ocean10000


Cliqzbot

     
3:10 am on Mar 23, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 26, 2006
posts: 1650
votes: 3


35.158.162.xxx (Germany) 200 - OKNo ReferrerMozilla/5.0 (compatible; Cliqzbot/2.0; http://cliqz.com/company/cliqzbot)



- - -

[edited by: keyplyr at 3:14 am (utc) on Mar 23, 2018]
[edit reason] delinked URL [/edit]

3:15 am on Mar 23, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Been around for a while. Looks like they've changed their UA attributes though. Are you posting the exact UA?

Is this you or them...?
OKNo Referrer
Please post only the exact log entry; otherwise it is of no use for documentation.

Archived:
[webmasterworld.com...]
[webmasterworld.com...]

I think cliqz.com is a beneficial search index. Since it is browser-based, there is no referrer, so it is difficult to see whether allowing the UA brings traffic, but I have been allowing it for over a year without seeing any negative effect AFAIK.
7:46 am on Mar 23, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


For comparison purposes here's what I find most recently (they're on my Ignore list, so I had to check):

IP: distributed-- 18, 34, 35, et cetera, clearly no longer only from 54 as in years past
UA: Mozilla/5.0 (compatible; Cliqzbot/2.0; +http://cliqz.com/company/cliqzbot)

“200 - OKNo Referrer” looks like a cut-and-paste from logs.

Incidentally, the URL in the UA redirects--tut, tut--to
https://cliqz.com/cliqzbot
(Hee, hee, yet another HTTPS migration, among other things.) The page defaults to German, with a slightly peculiar language-switching arrangement where you have to mouse-over the German flag in order to get the UK flag ... in a different place. And vice versa to change back. I must have been there before, because I remember the “Was genau ist Cliqzbot?”
8:02 am on Mar 23, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


“200 - OKNo Referrer” looks like a cut-and-paste from logs.
Well, I've never seen that in the UA string unless you're perusing logs with some software report. As said above, I allow this bot and see it a lot.
10:45 am on Mar 23, 2018 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3286
votes: 19


I think one of the security test sites uses Cliqzbot - at least, that's why I allow it.
8:03 pm on Mar 23, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


I've never seen that in the UA string
That's what I meant. It's three consecutive pieces: response, referer, UA. So probably not literal raw logs but some tabular output, probably with the intervening tabs (\t character) disappearing when pasting into the Post form.

Which is why I always, always Preview.

... And generally have to edit again after posting.
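
A minimal sketch of the effect described above, assuming the original row came from a tab-separated report. The row contents are rebuilt from the first post; the field names and helper functions are hypothetical and not taken from any particular log tool.

```python
# Hypothetical tab-separated report row: response, referer, user-agent.
ROW = (
    "200 - OK\t"
    "No Referrer\t"
    "Mozilla/5.0 (compatible; Cliqzbot/2.0; http://cliqz.com/company/cliqzbot)"
)

FIELD_NAMES = ("response", "referer", "user_agent")  # assumed column order


def strip_tabs(text: str) -> str:
    """Simulate a post form that silently drops tab characters on paste."""
    return text.replace("\t", "")


def split_row(text: str) -> dict[str, str]:
    """Split an intact tab-separated row back into named fields."""
    return dict(zip(FIELD_NAMES, text.split("\t")))


fused = strip_tabs(ROW)      # what ends up in the post
fields = split_row(ROW)      # what the report actually contained

print(fused)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Once the tabs are gone the three fields cannot be split reliably, which is why the exact raw log entry is more useful for documentation.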
2:02 am on Apr 6, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1997
votes: 75


The interesting part is that on my sites it also sends an Accept-Language header of pl-PL, pl;q=0.9, which is for the Polish language.
3:20 am on Apr 6, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


Are your sites in Poland, and/or in Polish?

It hadn't previously occurred to me to look at the language headers. With me it's mostly
en-US, en;q=0.8, es-US;q=0.5
(I like “es-US”) but on some days for variety's sake--including but not limited to material on Kalaallisut and related topics--it's
de-DE, de;q=0.9, dsb-DE;q=0.3, hsb-DE;q=0.2
(Since when is Hue-Saturation-Brightness a language?)

:: further detour to lookup [loc.gov] ::
dsb = Lower Sorbian (bas-sorabe, Niedersorbisch)
hsb = Upper Sorbian (haut-sorabe, Hochsorbisch)
... which raises more questions than it answers.

:: wandering off to Omniglot [omniglot.com] ::

Sorbian, or Wendisch, is a member of the West Slavic subgroup of Indo-European languages spoken by about 55,000 people in Upper and Lower Lusatia in the German Länder of Saxony and Brandenburg. The Sorbs are descendants of the Wends, the German name for the Slavic tribes who occupied the area between the Elbe and Saale rivers in the west and the Odra (Oder) River in the east during the medieval period.

Finally, on a handful of visits--apparently limited to early texts ranging from Aelfric to the Pastons--they've used
fr-FR, fr;q=0.9, br-FR;q=0.7, gsw-FR;q=0.5, co-FR;q=0.3
br is presumably Breton
gsw = Swiss German (er... “Swiss German as spoken in France”?) aka Alsatian, which makes more sense
co = Corsican

:: further business with Omniglot ::

Corsican has no official status in Corsica

In all cases, the robots.txt request matches the language headers of the immediately following page requests.

Poor Cliqzbot! Maybe they just can't figure out what language my site is in :)
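
For anyone who wants to compare these headers programmatically, here is a rough sketch of splitting an Accept-Language value into language tags and q-values. The function name is made up for illustration, and the parsing is deliberately simplified (a missing q defaults to 1.0, a malformed q is treated as 0.0); it is not a full implementation of the HTTP grammar.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Return (language-tag, q) pairs, highest preference first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a tag with no q parameter has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # simplification: treat a malformed weight as 0
        prefs.append((tag.strip(), q))
    # sorted() is stable, so tags with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


for header in (
    "en-US, en;q=0.8, es-US;q=0.5",
    "de-DE, de;q=0.9, dsb-DE;q=0.3, hsb-DE;q=0.2",
    "fr-FR, fr;q=0.9, br-FR;q=0.7, gsw-FR;q=0.5, co-FR;q=0.3",
):
    print(parse_accept_language(header))
```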
1:40 pm on Apr 6, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1997
votes: 75


The sites in question are in the USA; they do attract traffic from Poland, though.

The latest request was from 94.198.99.XX, which is an IP belonging to the Italian server farm SEFLOW-MGMT.

This one had Accept-Language: it-IT, it;q=0.9.

One thing I found interesting was on their About page (cliqz.com/en/about):
In February 2017, Cliqz acquired the world’s leading anti-tracking tool Ghostery.

So, could it be a part or pre-fetch from Ghostery?
7:06 pm on Apr 6, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


could it be a part or pre-fetch from Ghostery
Hm, that's an idea. But then you'd expect language headers varying all over the map, depending on their most recent human request. It definitely isn't a “pre-fetch” in the sense of something followed immediately afterward by a human request.

Hmm. Is there a CliqzDude in the house?
10:36 pm on Apr 6, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Not sure what the mystery is here.

Again...
Cliqz offers products for searching directly in the browser and runs a self-developed search technology. Cliqzbot collects URLs and website content in the Cliqz index.
source: [cliqz.com...]
1:16 am on Apr 7, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


The mystery is why they send such weird Accept-Language headers. Have you ever, in your entire life, even met someone whose first language is Sorbian-with-an-O? And I don't believe even leosghost has Breton on his language list. Out of hundreds of logged headers from this User-Agent (I don't keep them around forever, but there are more than enough to generalize), I find lots of {pattern A}, lots of {pattern B}, and a handful of {pattern C}.
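
One way to produce this kind of pattern count is sketched below, assuming the logged headers are available as (User-Agent, Accept-Language) pairs. The helper name, the record layout, and the sample rows are all hypothetical; the sample counts are made up for illustration and are not taken from real logs.

```python
from collections import Counter


def tally_accept_language(records, ua_token="Cliqzbot"):
    """Count Accept-Language values for requests whose UA contains ua_token."""
    counts = Counter()
    for user_agent, accept_language in records:
        if ua_token in user_agent:
            # Normalize spacing so "en-US,en;q=0.8" and "en-US, en;q=0.8" match.
            key = ",".join(p.strip() for p in accept_language.split(","))
            counts[key] += 1
    return counts


CLIQZ_UA = "Mozilla/5.0 (compatible; Cliqzbot/2.0; +http://cliqz.com/company/cliqzbot)"

# Illustrative sample only: invented rows, not real log data.
sample = [
    (CLIQZ_UA, "en-US, en;q=0.8, es-US;q=0.5"),
    (CLIQZ_UA, "en-US,en;q=0.8,es-US;q=0.5"),
    (CLIQZ_UA, "de-DE, de;q=0.9, dsb-DE;q=0.3, hsb-DE;q=0.2"),
    ("Mozilla/5.0 (some other client)", "pl-PL, pl;q=0.9"),
]

for header, n in tally_accept_language(sample).most_common():
    print(n, header)
```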