PerplexityBot

Perplexing

Pfui

2:39 am on Aug 10, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://docs.perplexity.ai/docs/perplexity-bot)
robots.txt: NO

Docs say it reads robots.txt, is Disallowable, etc. Nope. Not a single request out of three hits from different IPs (two of which were PSINet/Cogent).

Prior reference in Alternative Search Engines from Jan., 2023: [webmasterworld.com...]
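
For anyone wanting to run the same check against their own logs, here is a minimal Python sketch. It assumes Apache/nginx "combined" log format, and the function name `robots_report` is made up for illustration. It tallies, per IP, how many requests carried the PerplexityBot UA and how many of those were for robots.txt. Matching on the UA string alone says nothing about whether the requester really is the bot.

```python
import re
from collections import defaultdict

# Assumed layout (combined log format):
# IP ident user [time] "METHOD path PROTO" status bytes "referer" "UA"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def robots_report(lines, bot_token="PerplexityBot"):
    """Return {ip: (total_hits, robots_txt_hits)} for lines whose UA contains bot_token."""
    hits = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LOG_RE.match(line)
        if not m or bot_token.lower() not in m["ua"].lower():
            continue  # unparseable line, or some other visitor
        counts = hits[m["ip"]]
        counts[0] += 1
        # Ignore any query string when checking for robots.txt
        if m["path"].split("?")[0] == "/robots.txt":
            counts[1] += 1
    return {ip: tuple(c) for ip, c in hits.items()}
```

Feeding it an open log file (for example `robots_report(open("access.log"))`, where the filename is a placeholder) returns a dict in which any IP with a zero in the second slot never asked for robots.txt.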

lucy24

5:47 am on Aug 10, 2024 (gmt 0)

Huh. I see them from a wide range of IPs, generally requesting robots.txt every second or third visit. In fact it only recently occurred to me that I've no idea if they're compliant or not--no header transgressions, so for the time being they get what they ask for. Within the last week I’ve added a Disallow and am watching to see what happens.

Going by their page requests, they must be following links from elsewhere, since it's just individual inner pages, nothing near a full spidering.

Thanks to that wide range of IPs, I don’t know if all of them are actually the same robot following the same rules.
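
Before watching the logs, it can help to confirm that the Disallow rule itself reads the way you intend. A small sketch using Python's standard `urllib.robotparser` follows. The rules text and the example.com URL are illustrative assumptions, not anyone's actual robots.txt. Note that this parser matches on the product token, so pass `PerplexityBot` rather than the full Mozilla-style UA string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out PerplexityBot, leave everyone else alone
RULES = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

This only shows what a compliant robot should do with the file. Whether the robot actually complies is something only the logs can answer.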

dstiles

7:55 am on Aug 10, 2024 (gmt 0)

I didn't like what I read about them so am blocking them. I'm very wary of so-called AI, which may be artificial but is certainly not intelligent, merely clever.

lucy24

8:31 pm on Aug 11, 2024 (gmt 0)

For what it’s worth: while processing the last few days’ logs, I found a lone hotlink:
{human IP} - - [10/Aug/2024:07:35:59 -0700] "GET /ebooks/directory/images/filename.gif HTTP/2.0" 200 3908 "https://www.perplexity.ai/" "{human UA}"
It comes through as a hotlink because the referer isn’t on the relatively short list of recognized search engines and friendly sites, so the requester gets only a garish NO HOTLINKS image.

This prompted me to look back, and it turns out it isn’t lone at all. Since the beginning of the year, there have been dozens of image requests from assorted IPs and UAs, all giving perplexity.ai as referer. (I hope, btw, that Anguilla is making them pay through the nose.) Hmm.

During those same last-few-days, PerplexityBot has not requested anything but robots.txt. But it’s early days yet.
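
The referer test described above could be sketched roughly like this in Python. It is an illustration only, not the actual server rules in use: the extension list, the host sets, and the choice to let blank referers through are all assumptions.

```python
from urllib.parse import urlsplit

# Assumed set of image extensions worth protecting
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".webp")


def is_hotlink(path, referer, own_hosts, friendly_hosts):
    """Return True if an image request comes from an unrecognized referring host."""
    # Only image requests are subject to the hotlink check
    if not path.lower().split("?")[0].endswith(IMAGE_EXTENSIONS):
        return False
    # Assumption: a blank or "-" referer is let through rather than blocked
    if referer in ("", "-"):
        return False
    host = (urlsplit(referer).hostname or "").lower()
    return host not in own_hosts and host not in friendly_hosts
```

With `own_hosts={"example.com"}` and a short `friendly_hosts` list of recognized search engines, a request for `/ebooks/directory/images/filename.gif` with referer `https://www.perplexity.ai/` is flagged, which matches the log line quoted earlier.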

Pfui

6:39 am on Aug 12, 2024 (gmt 0)

Weekend recap FWIW:
Four different, non-REF html hits from four different IPs. NONE requested robots.txt.
Courtesy of:
Syn Ltd, NL; 232Web, UK; Aventice, US; HostRoyale, India. Latter two host a plethora of chronic slow crawlers.