Tick a bot to block it. A common setup blocks the training crawlers but lets the AI search and assistant bots through, so you keep AI referral traffic without handing your content over for model training. Pick a preset to start, then fine-tune.
Your robots.txt
Add these rules to the robots.txt at the root of your domain
(https://example.com/robots.txt). If you already have one, merge these
User-agent blocks into it.
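If you maintain robots.txt from a script rather than by hand, the merge step can be sketched in Python. This is a minimal sketch under stated assumptions: the helper names and the TRAINING_BOTS preset are hypothetical (the preset uses only the training crawlers and opt-out tokens named on this page), and it simply appends one "Disallow: /" group per bot that is not already listed.

```python
# Hypothetical sketch: append "block everything" groups for the chosen bots to
# an existing robots.txt, skipping any bot that already has a User-agent line.
# The preset below is illustrative, not this tool's exact output.

TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended"]


def existing_agents(robots_txt: str) -> set[str]:
    """Return every User-agent value already in the file, lower-cased."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def merge_blocks(existing: str, bots: list[str]) -> tuple[str, list[str]]:
    """Append one 'Disallow: /' group per new bot; return (merged_text, skipped_bots)."""
    have = existing_agents(existing)
    to_add = [bot for bot in bots if bot.lower() not in have]
    skipped = [bot for bot in bots if bot.lower() in have]
    groups = "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in to_add)
    merged = existing.rstrip("\n")
    if groups:
        # A blank line separates the new groups from the existing rules.
        merged = f"{merged}\n\n{groups}" if merged else groups
    return merged + "\n", skipped


if __name__ == "__main__":
    current = "User-agent: *\nDisallow: /private/\n"
    text, skipped = merge_blocks(current, TRAINING_BOTS)
    print(text, end="")
    if skipped:
        print("Already listed, review by hand:", ", ".join(skipped))
```

Bots that already have a group are reported rather than rewritten, because that group may hold rules (an Allow, for instance) that would interact with a new Disallow; those are safer to edit by hand.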
Optional: noai meta tag
A per-page signal that some platforms honour. Support is limited and the major AI crawlers rely on robots.txt instead, so treat this tag as a supplement, not a replacement.
<!-- In your <head> -->
<meta name="robots" content="noai, noimageai">
<!-- Or as an HTTP response header -->
X-Robots-Tag: noai, noimageai
Training, search, and user fetches are different bots
The big AI companies now run separate crawlers for different jobs, each with its
own name in robots.txt. OpenAI's GPTBot gathers training data, OAI-SearchBot
builds the ChatGPT search index, and ChatGPT-User only fetches a page when a person
asks ChatGPT to read it. Anthropic splits the same way into ClaudeBot,
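Claude-SearchBot and Claude-User. Because each crawler obeys only the robots.txt group that names it
(falling back to the "User-agent: *" group if none does), you can block training while still appearing
in AI search results and serving pages that users ask an assistant to read.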
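To sanity-check that a training block leaves the search and user bots alone, you can run the rules through Python's standard-library urllib.robotparser. This is a sketch: the rules and the example URL are assumptions, and the stdlib parser only approximates how each company matches names (it does a simple case-insensitive substring match on the user-agent token), so treat the result as a check of what the file says, not of how any crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: block only the two training crawlers named above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
        "ClaudeBot", "Claude-SearchBot", "Claude-User"]


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/blog/post") -> bool:
    """Ask Python's stdlib robots.txt parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for bot in BOTS:
        status = "allowed" if allowed(ROBOTS_TXT, bot) else "blocked"
        print(f"{bot:<17} {status}")
```

Only GPTBot and ClaudeBot come back as blocked; the search and user bots have no group of their own and no "*" group exists, so nothing restricts them.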
Opt-out tokens are not crawlers
Google-Extended and Applebot-Extended do not fetch anything. They are
signals you place in robots.txt to opt out of training. Crucially, blocking
Google-Extended stops Google from using your content to train Gemini but does
not affect your normal Google Search ranking, which is handled by Googlebot.
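The separation between an opt-out token and the search crawler can be checked with Python's standard-library urllib.robotparser: each token gets its own group, while Googlebot and Applebot fall through to the "*" group. The rules and the may_use helper are hypothetical examples, and since Google-Extended and Applebot-Extended never fetch pages themselves, this only shows which permissions the file expresses for each name.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: a site-wide group plus the two training opt-out tokens.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_use(token: str, path: str) -> bool:
    """True if the group matching `token` permits `path` (stdlib matching rules)."""
    return parser.can_fetch(token, "https://example.com" + path)


if __name__ == "__main__":
    for token in ["Googlebot", "Google-Extended", "Applebot", "Applebot-Extended"]:
        print(token, may_use(token, "/blog/post"), may_use(token, "/admin/"))
```

Googlebot and Applebot keep access to /blog/post and are kept out of /admin/ by the "*" group, while both opt-out tokens are disallowed everywhere.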