PiplBot

         

keyplyr

6:38 pm on Sep 26, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




UA: Mozilla/5.0+(compatible;+PiplBot;+http://www.pipl.com/bot/)/Nutch-1.14-SNAPSHOT
Protocol: HTTP/1.0
Robots.txt: No
Host: softlayer.com
169.56.0.0 - 169.63.255.255
169.56.0.0/13
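
For anyone who wants to double-check that the start-end range and the CIDR above describe the same block, here is a minimal sketch using Python's standard `ipaddress` module. The variable names are only illustrative; the two addresses are the ones posted above.

```python
import ipaddress

# Range endpoints exactly as posted in this thread.
first = ipaddress.IPv4Address("169.56.0.0")
last = ipaddress.IPv4Address("169.63.255.255")

# summarize_address_range yields the CIDR blocks that exactly cover the range.
blocks = list(ipaddress.summarize_address_range(first, last))

for block in blocks:
    print(block, block.num_addresses)
```

If the range and the CIDR agree, the loop prints a single block.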

Related discussion: [webmasterworld.com...]

lucy24

7:19 pm on Sep 26, 2017 (gmt 0)

"Robots.txt: No"
The reason you're not seeing it asking for robots.txt is that it has divided its requests: On my site (not the one I was posting about in 2011) it asks only for robots.txt--a total of four times this month--while on your site it asks only for, well, whatever it is it's looking for. In fact I just double-checked my shared robots.txt to see if it was matching against some other string. (The Nutch element of the UA is new this year, explaining the new-found robots.txt appetite.)

Prior to this month, the last time I set eyes on it was in late 2013. At that time (2011-2013) it divided its efforts between images and the favicon.
:: log search running in background ::
No, wait, I'm wrong. There were lone favicon requests in 2014 and 2015 (personal site). Is it one of those DDG deals where they use someone else's crawl data, but fetch your favicon for the SERP?

keyplyr

7:49 pm on Sep 26, 2017 (gmt 0)

"Is it one of those DDG deals where they use someone else's crawl data, but fetch your favicon for the SERP?"
I'm thinking not since it is using Nutch.

keyplyr

8:41 pm on Oct 26, 2017 (gmt 0)

Same range but new UA: Mozilla/5.0+(compatible;+PiplBot;+http://www.pipl.com/bot/)
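
With two UA variants now in play, a log filter has to catch both. A rough sketch, assuming the only difference is the trailing `/Nutch-<version>` suffix seen in the two strings quoted in this thread; the function name `classify` and both patterns are made up for illustration.

```python
import re

# Patterns are assumptions based only on the two UA strings quoted in this thread.
PIPLBOT_RE = re.compile(r"\bPiplBot\b")
NUTCH_RE = re.compile(r"/Nutch-([0-9A-Za-z.\-]+)$")

def classify(ua):
    """Return None for non-PiplBot UAs, else ("nutch", version) or ("plain", None)."""
    if not PIPLBOT_RE.search(ua):
        return None
    match = NUTCH_RE.search(ua)
    if match:
        return ("nutch", match.group(1))
    return ("plain", None)

september_ua = "Mozilla/5.0+(compatible;+PiplBot;+http://www.pipl.com/bot/)/Nutch-1.14-SNAPSHOT"
october_ua = "Mozilla/5.0+(compatible;+PiplBot;+http://www.pipl.com/bot/)"

print(classify(september_ua))
print(classify(october_ua))
```

Matching on the `PiplBot` token alone is enough to block both; the Nutch check is only there to tell the two variants apart in logs.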

lucy24

10:36 pm on Nov 12, 2017 (gmt 0)

Incidentally, I've always seen it from two different Softlayer IPs in random alternation: 169.48 and 169.60. I checked; there's some non-Softlayer in between the two.
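
A quick way to see how those two prefixes relate to the 169.56.0.0/13 block posted earlier in the thread. This is only a sketch, and it assumes "169.48" and "169.60" stand for the whole /16 blocks, since the full addresses are not given here.

```python
import ipaddress

# The block listed at the top of the thread.
listed = ipaddress.ip_network("169.56.0.0/13")

# Assumption: each two-octet prefix is treated as a full /16.
seen = {
    "169.48": ipaddress.ip_network("169.48.0.0/16"),
    "169.60": ipaddress.ip_network("169.60.0.0/16"),
}

# subnet_of (Python 3.7+) is True when a network lies entirely inside another.
coverage = {label: net.subnet_of(listed) for label, net in seen.items()}
print(coverage)
```

Under that assumption, only 169.60 falls inside the /13, so a range-based block would need a separate entry for 169.48.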

The Nutch version showed itself on one site in September-October. The shorter, non-Nutch version showed up more recently in what is now my personal site (formerly my only site).

Quirks: The former PiplBot--the one that asked for everything except robots.txt--last showed its face in mid-2015. But even then, only on my old site. So it's not just racing through all registered domain names, which it could perfectly well do in ARIN. To date, it has only requested robots.txt on HTTP. (I stopped redirecting robots.txt before PiplBot first showed up on the HTTPS site, so I don't know how it would have behaved.) One of these days I'll remove the Disallow and see what happens. But it's entirely possible that it has no interest in anything but robots.txt anyway.

keyplyr

1:41 am on Nov 13, 2017 (gmt 0)

"Pipl...
Pipl who need pipl,
Are the luckiest pipl in the world"

wilderness

11:29 pm on Nov 13, 2017 (gmt 0)

"Same range but new UA: Mozilla/5.0+(compatible;+PiplBot;+http://www.pipl.com/bot/)"

These things have been hammering me (and eating 403s) for about a week from the two Class B's previously mentioned.
I added it to robots.txt and the requests ceased immediately; however, it continues to request robots.txt.
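
The exact robots.txt entry is not quoted above, so the rule set below is hypothetical. It is a small sketch that uses Python's standard `urllib.robotparser` to confirm that a `PiplBot` group with `Disallow: /` shuts the bot out while leaving other agents alone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the actual entry used on the poster's site is not shown.
rules = """\
User-agent: PiplBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Pass the bare product token: robotparser only compares the part of the
# string before the first "/", so the full "Mozilla/5.0+..." UA would not match.
piplbot_allowed = parser.can_fetch("PiplBot", "/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "/index.html")

print(piplbot_allowed, other_allowed)
```

This only checks what the rules say. Whether a given crawler honors them is a separate matter, which is why the server-side 403s still matter.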