ChatGPT / openai bot

First time I'm seeing them

SumGuy

11:57 pm on Aug 9, 2024 (gmt 0)

5+ Year Member Top Contributors Of The Month



Today I saw hits to robots.txt and one of my interior HTML pages from this user agent:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

It came from this Micro$haft IP: 40.84.221.231, a completely anonymous MSFT IP that is not part of any OpenAI range and has no reverse DNS host name. Why doesn't OpenAI have the balls to operate their bots (or at least this bot) from their own assigned IPs and hosts? A log scan turns up 5 previous OpenAI bot hits on 3 days (May 17, June 8 and July 27), from 40.84.180.0/24 and 52.230.152.155 (all MSFT). One of those previous hits had this UA:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
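
For anyone who wants to repeat that kind of log scan, here is a minimal Python sketch. It assumes an Apache/nginx "combined" format access log (adjust the regex for anything else) and looks for the ChatGPT-User, GPTBot and OAI-SearchBot product tokens in the UA field, grouping hits by day with IP, token, version and request line. The function name scan_openai_hits is just my own label. Keep in mind it only finds requests that *claim* to be OpenAI; a UA string is trivially spoofed.

```python
import re
from collections import defaultdict

# Assumed "combined" log format:
# IP ident user [day/Mon/year:HH:MM:SS zone] "REQUEST" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
# Product tokens as they appear in the OpenAI UA strings, e.g. "GPTBot/1.2"
OPENAI_TOKEN = re.compile(r'\b(ChatGPT-User|GPTBot|OAI-SearchBot)/([\d.]+)')


def scan_openai_hits(lines):
    """Group OpenAI-claiming hits by day: {day: [(ip, token, version, request), ...]}."""
    hits = defaultdict(list)
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line, skip it
        t = OPENAI_TOKEN.search(m.group("ua"))
        if t:
            hits[m.group("day")].append(
                (m.group("ip"), t.group(1), t.group(2), m.group("request"))
            )
    return dict(hits)
```

Feed it the lines of an access log, e.g. `scan_openai_hits(open("access.log"))`, and print the result per day.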

Any ideas what ChatGPT-User vs. GPTBot might indicate? Perhaps the User instance is a request for a page (or in my case, a PDF file) triggered by an individual user's query to ChatGPT? I'm not going to block it for now; I'll see how this sort of traffic develops.

I'm blocking quite a lot of Microsoft IPs (by entire /16 subnets) for one reason or another (past abusive behavior: port scans, bad HTTP requests, email spam), so they're apparently choosing to use IPs that have not been tainted by malware use and have a clean reputation. Naturally I keep track of and allow specific MSFT ranges for Bingbot and Outlook email.
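
The "block the whole /16 but allow specific ranges" approach can be sketched with Python's standard ipaddress module. The networks below are placeholders taken from the reserved benchmark range, not real Microsoft, Bingbot or Outlook ranges; swap in your own lists. The design choice is that an explicit allow entry always wins over a broader block.

```python
import ipaddress

# Placeholder ranges only (198.18.0.0/15 is reserved for benchmarking).
BLOCKED = [ipaddress.ip_network("198.18.0.0/16")]   # whole /16 blocked for past abuse
ALLOWED = [ipaddress.ip_network("198.18.7.0/24")]   # narrower range explicitly allowed


def is_allowed(ip, blocked=BLOCKED, allowed=ALLOWED):
    """True if ip may connect: allow list first, then the /16 block list."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in allowed):
        return True  # explicit allow overrides the broader block
    return not any(addr in net for net in blocked)
```

So `is_allowed("198.18.7.10")` passes because of the allow entry, while anything else in 198.18.0.0/16 is refused.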

not2easy

3:12 am on Aug 10, 2024 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I do not know whether it is still the same, but when they started sharing crawl info, it was given as Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

UAs for a few other AI bots can be found here: [webmasterworld.com...]

dstiles

7:51 am on Aug 10, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I found OpenAI using a different UA, from several IPs in the range 20.42.10.0/24 (no appropriate rDNS):

User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot
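
On the "no appropriate rDNS" point: the usual check is forward-confirmed reverse DNS, i.e. look up the PTR name for the IP, then resolve that name and confirm it maps back to the same IP. A minimal Python sketch using socket.gethostbyaddr and socket.gethostbyname_ex (IPv4 only). The resolvers are parameters so it can be tested offline; the function name is my own. For IPs with no PTR record, like the ones above, it simply returns None.

```python
import socket


def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the PTR host name if it resolves back to ip, else None."""
    try:
        host, _aliases, _addrs = reverse(ip)      # PTR lookup
    except OSError:                               # herror/gaierror: no rDNS at all
        return None
    try:
        _name, _aliases, addrs = forward(host)    # IPv4 A lookup of the PTR name
    except OSError:
        return None
    return host if ip in addrs else None          # must point back to the same IP
```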

The URL leads to [platform.openai.com...], which lists all three UAs and their IPs.
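
Once you have the published IPs, the check boils down to: does the UA claim one of the three OpenAI tokens, and if so, is the IP inside that bot's listed ranges? A minimal sketch; the PUBLISHED table is a hypothetical placeholder holding only the 20.42.10.0/24 range seen above, not OpenAI's actual list, so fill it from that page before relying on it.

```python
import ipaddress
import re

# Hypothetical placeholder: only the range observed in this thread, not an official list.
PUBLISHED = {
    "OAI-SearchBot": [ipaddress.ip_network("20.42.10.0/24")],
    "GPTBot": [],
    "ChatGPT-User": [],
}
TOKEN = re.compile(r'\b(ChatGPT-User|GPTBot|OAI-SearchBot)/')


def classify(ip, user_agent):
    """'not-openai', 'verified' (IP in that bot's ranges) or 'unverified'."""
    m = TOKEN.search(user_agent)
    if not m:
        return "not-openai"
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in PUBLISHED[m.group(1)]):
        return "verified"
    return "unverified"  # claims an OpenAI token, but IP is not in the listed ranges
```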

I do not yet trust the GPTs, but yesterday I allowed OpenSearch, which has since visited half a dozen pages of one site.