Today I've seen a hit to robots.txt and to one of my interior HTML pages from this user agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
It came from this Micro$haft IP: 40.84.221.231 - a completely anonymous MSFT IP, not part of any OpenAI range. No reverse DNS host name. Why doesn't OpenAI have the balls to operate their bots (or this bot) from their own assigned IPs and hosts? A log scan turns up 5 previous OpenAI bot hits on 3 days (May 17, June 8 and July 27), from 40.84.180.0/24 and 52.230.152.155 (all MSFT). One of those previous hits had this UA:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
Any ideas what ChatGPT-User vs GPTBot might indicate? Perhaps the User instance is a page (or in my case, a PDF file) request based on an individual user query to ChatGPT? I'm not going to block this - for now. I'll see how this sort of traffic develops.
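For anyone wanting to run the same kind of log scan to watch how this traffic develops, here is a rough Python sketch. It assumes the Apache/nginx "combined" log format, which may not match your setup, and the function name scan_openai_hits is just something I made up for illustration. It only matches the two UA tokens quoted above.

```python
import re
from collections import defaultdict

# Assumption: Apache/nginx "combined" log format. Adjust for other layouts.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# The two bot tokens seen in the user-agent strings quoted in this post.
BOT_RE = re.compile(r'(GPTBot|ChatGPT-User)/[\d.]+')


def scan_openai_hits(lines):
    """Group hits by client IP: {ip: [(timestamp, bot_token, request), ...]}."""
    hits = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        bot = BOT_RE.search(m.group("ua"))
        if bot:
            hits[m.group("ip")].append(
                (m.group("ts"), bot.group(0), m.group("req"))
            )
    return dict(hits)

# Usage (the path is hypothetical):
#   with open("access.log") as f:
#       for ip, entries in scan_openai_hits(f).items():
#           print(ip, len(entries), entries[0])
```

Grouping by IP makes it easy to see whether the hits keep coming from the same few ranges or wander around.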
I'm blocking quite a lot of Microsoft IPs (by entire /16 subnets) for one reason or another (past abusive behavior: port scans, bad HTTP requests, email spam), so OpenAI is obviously choosing to use IPs that have not been tainted by malware use and have a clean reputation. Naturally I keep track of and allow specific MSFT ranges for bingbot and Outlook email.
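The "block whole /16s, but allow specific ranges inside them" logic can be sketched in Python with the standard ipaddress module. This is not my actual firewall config. The ranges below are placeholders from reserved benchmark/documentation address space, not real Microsoft, Bing or Outlook ranges, and is_allowed is a made-up name.

```python
import ipaddress

# Placeholder ranges only (reserved test space), NOT real Microsoft ranges.
BLOCKED = [ipaddress.ip_network("198.18.0.0/16")]   # stand-in for a blocked /16
ALLOWED = [ipaddress.ip_network("198.18.5.0/24")]   # stand-in for an allowed crawler/mail range


def is_allowed(ip, allowed=ALLOWED, blocked=BLOCKED):
    """Allow-list entries win over block-list entries; unlisted IPs pass."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in allowed):
        return True
    return not any(addr in net for net in blocked)
```

Checking the allow list first means a narrow exception keeps working no matter how wide the surrounding block is.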