Restricting GPTBot From Crawling Your Website

OpenAI’s newest tool, GPTBot, is a web crawler designed to gather data for training their AI models, including future releases such as GPT-5. GPTBot scours the web for information that might enhance the reliability, efficiency, and security of AI systems.

In layman’s terms, this bot is doom scrolling the web. AI may not be fully self-aware just yet, but creators are understandably apprehensive about where all that scrolling leads.

The bots at ChatGPT, or at least the higher-ups in charge of the program, appear to be paying attention to this. Publishers who care about data privacy and transparency should celebrate: you may now choose not to share your site’s data with OpenAI.

Disabling GPTBot’s Web Crawling

To stop GPTBot from crawling your content, you may need some additional help, for example from a web developer. Advergic’s Support would be delighted to advise you on countermeasures, but they are unable to edit your robots.txt file for you.

As debates over copyright and fair use continue around the world, OpenAI’s latest release moves toward a model that asks whether you mind sharing your content without credit or links.

Your consent is assumed by default, so you have to opt out, but the important thing is that a solution is available. Before moving on, keep in mind that blocking GPTBot may affect your traffic, especially from Bing Chat.

We cannot confirm this impact yet, but it is possible, and we will keep you updated if we see it happening. Still, we’re pleased that publishers have this choice. The following steps will stop GPTBot from accessing your website and training its models on your content:

Update Your Robots.txt File

Making changes to your website’s robots.txt file is the initial step in disabling GPTBot web crawling. The robots.txt file specifies which sections of your site are accessible to web crawlers like GPTBot and which sections are off-limits.

To completely block GPTBot’s access to your website, add the following directives to your robots.txt file. Keep the two lines together, since some parsers treat a blank line as the end of a rule group:

User-agent: GPTBot
Disallow: /

You can modify this to block crawling only partially, to suit your needs:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/

The code shown above is only an example. Your directory names, and the places you want to keep GPTBot out of, will be different. If you own an online store, you might wish to restrict GPTBot’s access to your products but let it crawl your content.
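To see how the partial rules above are likely to be read, here is a minimal sketch using Python’s standard urllib.robotparser module. The example.com URLs and the page names are placeholders, not part of any real site, and Python’s parser is only one interpretation of robots.txt; GPTBot’s own parser may differ in edge cases.

```python
from urllib import robotparser

# The partial-block rules from the example above.
RULES = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def build_parser(rules: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text into a parser object without touching the network."""
    parser = robotparser.RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(RULES)

# Placeholder URLs (assumptions for illustration only).
for url in (
    "https://example.com/directory-1/page.html",
    "https://example.com/directory-2/page.html",
    "https://example.com/elsewhere.html",
):
    verdict = "allowed" if parser.can_fetch("GPTBot", url) else "blocked"
    print(f"GPTBot {verdict}: {url}")
```

Paths that match no rule stay open to GPTBot, and crawlers other than GPTBot are unaffected by this group.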

Save and Upload

Make any required edits to robots.txt, then save the file and upload it to your website’s root directory. Crawlers only look for the file there, so this ensures that GPTBot picks up your access preferences and adjusts its crawling accordingly.
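If you are unsure where the root directory ends up on the web, the sketch below derives the address a crawler would request for any page on a site. The function name robots_txt_url and the example.com address are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post?id=1"))
```

Opening the printed address in a browser after uploading is a quick way to confirm the file landed in the right place.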

Check Your Modifications

You can use online tools and services that examine your website’s robots.txt file to verify that your rules for GPTBot are in effect. Make sure the updated file is live, so that GPTBot can no longer access the parts of your website you blocked.
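As an alternative to online tools, you or your developer could run a small check like the following sketch, again based on Python’s standard urllib.robotparser. The function names and the example.com addresses are hypothetical, and the result reflects how Python reads the file, which may not match GPTBot’s behavior exactly.

```python
from urllib import robotparser


def agent_can_fetch(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Check robots.txt text you already have, without any network access."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def live_site_allows_gptbot(site_root: str, path: str = "/") -> bool:
    """Download the published robots.txt from site_root and check one path.

    Requires network access; site_root is something like "https://example.com".
    """
    root = site_root.rstrip("/")
    parser = robotparser.RobotFileParser()
    parser.set_url(root + "/robots.txt")
    parser.read()
    return parser.can_fetch("GPTBot", root + path)


# Offline check of the full-block rules from this article.
full_block = "User-agent: GPTBot\nDisallow: /\n"
print(agent_can_fetch(full_block, "https://example.com/any/page"))
```

Once the file is uploaded, calling live_site_allows_gptbot with your own site address should return False if the full block is in effect.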

Gaining Control Over Your Content

Can you make these modifications without the help of your site’s administrator or host? Very possibly. Perhaps you’re competent enough to handle this on your own. Our support team cannot make these changes for you, but we are confident that by following these steps, you or your developer can protect your content.

AI has been around for some time, and it will persist for the foreseeable future. You can be confident that we are actively participating in discussions and collaborating with other prominent figures in the industry to stand by you, our publishers, as the debate over AI, copyright, and data usage continues to heat up.

We’re keeping an eye out for ways to help safeguard your content. Your material will always be yours, and we’re dedicated to keeping it that way.

