Forum Moderators: bakedjake


Petal Search (Huawei)


Peter_S

9:42 pm on Jul 20, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



Long time no see,

I came across a new crawler (search engine) on my site. It might not be new, but it is to me, so I thought I would share.

Petal Search => https://www.petalsearch.com/

It's Huawei's search engine.

Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)


There is a previous discussion => [webmasterworld.com...]

lucy24

11:11 pm on Jul 20, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This prompted me to re-check logs. The version I see is exclusively
Mozilla/5.0 (compatible;PetalBot;+https://webmaster.petalsearch.com/site/petalbot)

That earlier thread says I last saw the Android version in June of 2020. They remain, I think, the only Chinese search engine that appears to honor robots.txt.

IP currently looks like 114.119.128.0/19 though in past years I've seen them further up in the /18, at least in the 160-167 area.
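
For anyone who wants to check the arithmetic on those two ranges, here is a minimal sketch using Python's standard `ipaddress` module. The two CIDR blocks are the ones quoted above; the sample address and the helper name `third_octets` are made up purely for illustration.

```python
import ipaddress

# Ranges quoted in the post above.
current_range = ipaddress.ip_network("114.119.128.0/19")
wider_range = ipaddress.ip_network("114.119.128.0/18")

def third_octets(net):
    """Return (lowest, highest) third octet covered by an IPv4 network."""
    return net.network_address.packed[2], net.broadcast_address.packed[2]

print(third_octets(current_range))  # (128, 159)
print(third_octets(wider_range))    # (128, 191)

# Hypothetical address in the "160-167 area", chosen only as an example.
sample = ipaddress.ip_address("114.119.160.1")
print(sample in current_range)  # False
print(sample in wider_range)    # True
```

So the /19 stops at 114.119.159.x, and anything in the 160-167 area only falls inside the wider /18.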

tangor

5:08 am on Jul 24, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



PetalBot has NEVER honored my robots.txt, ripping through 1,000-1,500 files each month... eating 403s along the way. Same range as above.

brotherhood of LAN

12:14 pm on Jul 24, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Their web UI has been discontinued.

[facebook.com...]

But still seems to exist as an app.

Peter_S

1:44 pm on Jul 24, 2023 (gmt 0)

5+ Year Member Top Contributors Of The Month



It is odd that I am starting to see their bot NOW, when they have just discontinued their search engine (now using Bing & Yandex, from what I understand).

brotherhood of LAN

2:03 pm on Jul 24, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Peter_S Huawei do have an agreement with Qwant, so not sure if that's tied in with what you're seeing. Might be if they're French IPs.

lucy24

3:52 pm on Jul 24, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"PetalBot has NEVER honored my robots.txt"
Very mystifying. I just re-checked logs to confirm that they haven't been sneaking in with a different UA from the same IP. Even did some random spot-checking to see if their robots.txt request is immediately followed by any 403 from some recurring IP or UA.

In fact their robots.txt Disallow is in the minimalist form where I list a batch of UAs, winding up with a comprehensive Disallow. Not all robots even understand this form.
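
As a rough illustration of that grouped form, the sketch below feeds a hypothetical robots.txt to Python's standard `urllib.robotparser`. Only the PetalBot name comes from this thread; the other user-agent names and the `allowed` helper are placeholders. It shows how one particular parser reads the grouped form, and says nothing about how PetalBot itself parses it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: several User-agent lines sharing one Disallow.
# Only PetalBot is taken from the thread; the other names are placeholders.
ROBOTS_TXT = """\
User-agent: PetalBot
User-agent: ExampleBotA
User-agent: ExampleBotB
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path="/page.html"):
    """Ask the parser whether `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)

print(allowed("PetalBot"))      # False: listed in the group, so disallowed
print(allowed("SomeOtherBot"))  # True: no group applies and there is no * rule
```

Note that the User-agent lines must be consecutive, with no blank line between them, for this parser to treat them as one group.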

tangor

10:59 pm on Jul 27, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can only report what I see in the logs each month. Meanwhile, both UA and IP attributes are denied, though I still give them robots.txt!

RedBar

3:34 pm on Jan 10, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since the New Year I have been experiencing loads of Huawei single page visits and have blocked many IP ranges including their supposed main one 114.119.0.0/16. The quantity has reduced but some supposedly denied ones are still getting through.

Any suggestions or recommendations?

not2easy

7:36 pm on Jan 10, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Are these visits from the PetalBot? Are you blocking PetalBot UA?

RedBar

7:49 pm on Jan 10, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep, even recognised as PetalBot in my logs. So far I have denied all Huawei's Cloud UA from 114.119.128.0/18 through to 114.119.190.0/24, plus 114.119.0.0/16, which supposedly covers all the above plus 114.119.0.0/24. I have also added a Disallow for User-agent PetalBot in robots.txt.

I've deliberately avoided banning all Chinese IPs, since I do a lot of business there, even though they are using 114.119 IPs.

Anything else I should do, or have I overdone it? :-)

not2easy

9:11 pm on Jan 10, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Robots.txt doesn't do anything with non-compliant robots; consider it hopeful suggestions. If you are blocking 114.119.0.0/16, then the other 114.119.xxxx blocks don't do anything useful and might confuse the server.
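
A quick way to confirm which entries are redundant is a sketch like the one below, using Python's standard `ipaddress` module. The list contains only the blocks named in the post above; whatever ranges sit between the first and last of them are not known, so they are left out.

```python
import ipaddress

# Only the blocks explicitly named in the previous post.
blocks = [
    "114.119.128.0/18",
    "114.119.190.0/24",
    "114.119.0.0/16",
    "114.119.0.0/24",
]
networks = [ipaddress.ip_network(b) for b in blocks]

# Merge overlapping or nested networks into the smallest equivalent list.
collapsed = list(ipaddress.collapse_addresses(networks))
print(collapsed)  # [IPv4Network('114.119.0.0/16')]

# List the entries already covered by the /16.
parent = ipaddress.ip_network("114.119.0.0/16")
redundant = [str(n) for n in networks if n != parent and n.subnet_of(parent)]
print(redundant)  # ['114.119.128.0/18', '114.119.190.0/24', '114.119.0.0/24']
```

Everything collapses to the single /16, so the narrower entries add nothing to the deny list.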

Since this is the Alternative Search Engines forum, blocking bots is off topic here (or we would have dozens of duplicate "how to" posts across the individual known robots). The same tactics work on all of them, so the place to find out how is in the server environment forums, like the Apache [webmasterworld.com] forum (for topics about .htaccess, mod_rewrite, and other Apache specific topics) or the Microsoft IIS Web Server and ASP.NET [webmasterworld.com] forum for IIS management and related topics.

You may also find typical blocking ideas and suggestions in the Crawler, Spider, and User Agent ID forum here: [webmasterworld.com...]

RedBar

2:26 pm on Jan 11, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks and fingers crossed. It's just gone 14.00 UK time and by 08.00 it had reduced to a trickle plus only 6 visits in the last 6 hours.