OpenAI Crawlers and Bots

         

Brett_Tabke

10:46 am on Aug 21, 2024 (gmt 0)



Source: OpenAI

Overview of OpenAI Crawlers


OpenAI uses web crawlers (“robots”) and user agents to perform actions for its products, either automatically or triggered by user request. OpenAI uses the following robots.txt tags to enable webmasters to manage how their sites and content work with AI. Each setting is independent of the others – for example, a webmaster can allow OAI-SearchBot so the site can appear in search results while disallowing GPTBot to indicate that crawled content should not be used for training OpenAI's generative AI foundation models. For search results, please note it can take ~24 hours from a site's robots.txt update for our systems to adjust.
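To make the "each setting is independent" point concrete, here is a minimal sketch of a robots.txt that allows OAI-SearchBot and disallows GPTBot, checked with Python's standard `urllib.robotparser`. The `example.com` URL and the `is_allowed` helper are placeholders of mine, not anything from OpenAI, and the parser only shows how Python's standard library reads these rules, not how OpenAI's crawlers actually evaluate them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: surface the site in search, opt out of training crawls.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str,
               url: str = "https://example.com/page") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("OAI-SearchBot", "GPTBot", "ChatGPT-User"):
        print(agent, is_allowed(ROBOTS_TXT, agent))
```

ChatGPT-User has no group of its own in this file, so the parser treats it as allowed; add a separate `User-agent: ChatGPT-User` group if you want to restrict it as well.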

OAI-SearchBot


    OAI-SearchBot is for search. OAI-SearchBot is used to link to and surface websites in search results in the SearchGPT prototype. It is not used to crawl content to train OpenAI's generative AI foundation models. To help ensure your site appears in search results, we recommend allowing OAI-SearchBot in your site's robots.txt file and allowing requests from our published IP ranges below.

    Full user-agent string:

    Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

    Published IP addresses:

ChatGPT-User

    ChatGPT-User is for user actions in ChatGPT and Custom GPTs. When users ask ChatGPT or a Custom GPT a question, it may visit a web page to help answer and include a link to the source in its response. ChatGPT users may also interact with external applications via GPT Actions. ChatGPT-User governs which sites these user requests can be made to. It is not used for crawling the web in any automatic fashion, nor to crawl content for generative AI training.

    Full user-agent string:

    Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

    Published IP addresses

GPTBot

    GPTBot is used to make our generative AI foundation models more useful and safe. It is used to crawl content that may be used in training our generative AI foundation models. Disallowing GPTBot indicates a site's content should not be used in training generative AI foundation models.

    Full user-agent string:

    Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot

    Published IP addresses
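If you want to see which of these three agents is hitting your site, a rough sketch like the one below pulls the product token and version out of a User-Agent header. The `identify_openai_agent` name and the regular expression are my own, built only from the three strings quoted above. A User-Agent header is trivial to spoof, so treat a match as a hint and confirm against the published IP ranges before trusting it.

```python
import re

# Product tokens taken from the three user-agent strings quoted above.
_OPENAI_AGENT = re.compile(r"\b(OAI-SearchBot|ChatGPT-User|GPTBot)/(\d+(?:\.\d+)*)")


def identify_openai_agent(user_agent: str):
    """Return (token, version) if the header names an OpenAI agent, else None."""
    match = _OPENAI_AGENT.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    sample = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
              "compatible; GPTBot/1.1; +https://openai.com/gptbot")
    print(identify_openai_agent(sample))
```

The match is deliberately case-sensitive and tied to the exact tokens in the quoted strings; loosen it only if your logs show variants.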

Bewenched

7:58 pm on Mar 14, 2025 (gmt 0)



Saw a reference to using a file like robots.txt, but for AI.
This site helps you create one, but I don't know whether any of the AI bots are actually fetching it.

[site.spawning.ai...]

It's free, but remove the link if this isn't allowed.

This is also supposed to work in your page's head section:
<meta name="robots" content="noai, noimageai">
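For anyone who adds that tag, here is a small sketch for checking that a page really serves it, using Python's standard `html.parser`. The `RobotsMetaParser` class and `robots_directives` function are names I made up for illustration. It only confirms the tag is present in the HTML; it says nothing about whether any AI crawler honors `noai` or `noimageai`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of lowercase robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noai, noimageai"></head>'
    print(sorted(robots_directives(page)))
```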