AspiegelBot

         

lucy24

7:41 pm on Apr 5, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: mostly
Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; AspiegelBot)
rarely
Mozilla/5.0 (compatible;AspiegelBot)
(if this occurred in a book I was proofreading, there would be a notation [**sic spacing])

IP: 114.119.160-166 (various to date)
The sector 114.119.128-191 is Huawei Singapore, lurking in the middle of 114.112-119 which is otherwise all China.
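For anyone who wants to test an address against that sector, here is a minimal Python sketch. It assumes the range quoted above is exact (114.119.128.0 through 114.119.191.255); the function name `in_quoted_range` is made up for illustration.

```python
import ipaddress

# Range as quoted in the post: 114.119.128.x through 114.119.191.x.
FIRST = ipaddress.IPv4Address("114.119.128.0")
LAST = ipaddress.IPv4Address("114.119.191.255")

# summarize_address_range yields the CIDR blocks that exactly cover the span.
BLOCKS = list(ipaddress.summarize_address_range(FIRST, LAST))


def in_quoted_range(ip: str) -> bool:
    """Return True if ip falls inside the range quoted in the post."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in BLOCKS)
```

The span collapses to a single block, 114.119.128.0/18, which is handy if your access rules take CIDR notation.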

robots.txt: appears to be compliant (Yay! The only thing better than a blocked request is one that isn’t made at all.)

Requests (before robots.txt Disallow): mostly images, the occasional internal page

History: showed up in early March, and became exceedingly active in late March.

iamlost

8:39 pm on Apr 5, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The maniac AspiegelBot is crawling and indexing for Huawei’s currently-in-beta search engine app.

Huawei (Chinese telecom and smartphone giant), having been banned in Boston (and the rest of the US, including from Google services), is developing its own operating system, its own search engine, and other app infrastructure (Huawei Mobile Services, HMS).
Note: Huawei has pledged to open source its OS, a direct shot across Android’s bow.
Note: working with TomTom for a Google Maps replacement.
Note: had/having talks with Yandex and others about search.
Note: HMS has identity authentication, mobile wallet, music app, NFC (near-field communication), QR (Quick Response) code extraction, video streaming... more every week.

Its search engine app is in beta testing - currently basic search with image, news, and video filters, plus shortcuts for calculator, conversions, sports results, and weather.

Which brings us to AspiegelBot.

Huawei’s search service is operated by Aspiegel Limited, a wholly owned subsidiary based in Ireland, set up in 2015 to be Huawei to the world.

lucy24

10:19 pm on Apr 5, 2020 (gmt 0)




Well, I don't see anything in all that to inspire me to revoke the Disallow ;)

iamlost

11:41 pm on Apr 5, 2020 (gmt 0)




It (hopefully) will be interesting...
On the one hand is ‘evil’ Google...
On the other hand is ‘evil’ Huawei...
Cue the Good, the Bad, and the Ugly, put on the popcorn, and see what happens...

Which has nothing to do with whether allowing their crawler access is a potential business advantage.

Let the games begin!

tangor

12:25 am on Apr 6, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just new here ... but well-behaved, so not on the deny list ...

Yet.

notriddle

5:35 am on Sep 6, 2020 (gmt 0)

5+ Year Member



Well-behaved in the sense that it follows robots.txt, maybe.

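Not so well-behaved in other respects. I'm getting requests from aspiegel (which I have verified with reverse DNS) with `%23` in them. That is a percent-encoded `#`, so the crawler is sending fragment identifiers to the server as part of the path instead of stripping them. Fragment identifiers are hardly an obscure corner of the HTML specification. Even Wikipedia uses them!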
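A rough Python sketch of both checks mentioned here: forward-confirmed reverse DNS, and spotting an encoded fragment in the request path. The `.aspiegel.com` suffix is an assumption based on the reverse DNS result described in this post, and the function names are hypothetical. The resolvers are passed in as parameters so the logic can be tried without touching the network.

```python
import socket
from typing import Callable, List


def _reverse(ip: str) -> str:
    # PTR lookup; raises an OSError subclass if the address has no name.
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    # Forward lookup; returns every IPv4 address the name resolves to.
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    suffix: str = ".aspiegel.com",  # assumed suffix, adjust to what you observe
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Reverse-resolve ip, check the hostname suffix, then confirm the
    hostname resolves back to the same ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(suffix):
            return False
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror both derive from OSError.
        return False


def has_encoded_fragment(request_path: str) -> bool:
    """True if the path contains %23, i.e. a percent-encoded '#'."""
    return "%23" in request_path
```

The forward confirmation matters because anyone who controls the reverse zone for their own address can make it claim any hostname; only the forward record is controlled by the domain owner.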

tangor

10:44 am on Sep 6, 2020 (gmt 0)




I take it all back. Abusive, dangerous for log reporting, and gathering 403s in the THOUSANDS each month.

Range is 114.119.xxx.xxx, with a "human-appearing" UA (string) ending in aspiegel.com/petalbot

Internet noise.

lucy24

5:57 pm on Sep 6, 2020 (gmt 0)




"gathers 403s in the THOUSANDS"
How odd. Within this calendar year I've not seen a single request for anything but robots.txt. In fact they haven't shown their faces at all since July; instead there is the newer name (same IP) which started in May, so they overlapped for a couple of months:

(compatible;PetalBot;+https://aspiegel.com/petalbot)
[sic]

I repeat: sic. In a different venue I would say [**text unchanged].

tangor

7:10 pm on Sep 6, 2020 (gmt 0)




This is the UA hitting an 800-page site over 1900 times since the 1st...

Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+http://aspiegel.com/petalbot)
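To put a number like that on your own site, something along these lines would tally the bot's requests by response status. This is only a sketch: it assumes Apache/nginx "combined" log format, and `tally_by_status` and the `needle` parameter are names invented for the example.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: combined log format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def tally_by_status(lines: Iterable[str], needle: str = "petalbot") -> Counter:
    """Count requests per HTTP status for user agents containing needle."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE.match(line)
        # Lines that do not parse are skipped rather than guessed at.
        if m and needle in m.group("ua").lower():
            counts[m.group("status")] += 1
    return counts
```

Pass it an open log file, e.g. `tally_by_status(open("access.log"))`, and compare the 403 count against the 200s to see how much of the traffic is being refused.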

lucy24

7:45 pm on Sep 6, 2020 (gmt 0)




This prompted me to re-check logs. Turns out there was, in a fairly literal sense, a clean break: the full form
Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://aspiegel.com/petalbot)

was in use through 23 June on multiple sites; from 24 June on, there’s nothing but the incomplete form.

Look very carefully and you will notice that, in addition to chopping off the first half of the UA, they also omitted the space after “compatible;”
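Because of that spacing difference, an exact-string match on the UA will miss one form or the other. A small Python sketch of a pattern that tolerates both, covering only the names seen in this thread (the function name `bot_name` is hypothetical):

```python
import re
from typing import Optional

# Optional whitespace after "compatible;" covers both the spaced and
# unspaced forms; the alternation covers the old and new bot names.
BOT_UA = re.compile(r"\(compatible;\s*(AspiegelBot|PetalBot)\b", re.IGNORECASE)


def bot_name(user_agent: str) -> Optional[str]:
    """Return the bot name found in the UA string, or None if absent."""
    m = BOT_UA.search(user_agent)
    return m.group(1) if m else None
```

The same pattern should carry over to server-side rules, since most rewrite and access-control engines accept a comparable regex syntax.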

Edit: I do not care to speculate why they are using different UAs on different sites. Or rather, different people's sites, because I see the identical behavior on three sites (same server).

tangor

5:31 am on Sep 7, 2020 (gmt 0)




Heh ... could it be geographic? I get BOTH versions and they HAMMER. What makes me so popular?

403 is your best friend. :)

tangor

12:24 am on Sep 10, 2020 (gmt 0)




Update: The hammering won't stop, so today I put that IP range in the firewall. As they say, "there's always a way..."

I hate it when bots can't take an UNSUBTLE hint...

lucy24

2:35 am on Sep 10, 2020 (gmt 0)




Very mysterious that they appear to honor robots.txt on my sites only.

:: shrug ::