AI Crawler Control

Tick a bot to block it. The popular choice is to block the training crawlers while letting the AI search and assistant bots through, so you keep AI referral traffic without feeding the models. Pick a preset to start, then fine-tune.

Training crawlers

Scrape pages to train AI models. Blocking these is the usual “don’t train on my content” choice.

Search & retrieval

Index pages so the assistant can cite them. Allowing these can send referral traffic back to you.

User-triggered fetches

Only fetch a page when a person explicitly asks the assistant to read that URL.

Undeclared / unverified

Bots with no official documentation that may ignore robots.txt. A rule here is best-effort; use a firewall to actually block them.

Your robots.txt

Add these rules to the robots.txt at the root of your domain (https://example.com/robots.txt). If you already have one, merge these User-agent blocks into it.
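As a rough illustration of what such rules look like, here is a minimal Python sketch that turns the "block training, allow search and user fetches" choice into robots.txt text. The bot names are the ones mentioned on this page and the list is not exhaustive; the `BOTS` grouping and the `build_robots_txt` function are hypothetical names used only for this example.

```python
# Hypothetical grouping for illustration. The bot names come from this page;
# the category keys and the function name are assumptions, not a standard.
BOTS = {
    "training": ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended"],
    "search": ["OAI-SearchBot", "Claude-SearchBot"],
    "user": ["ChatGPT-User", "Claude-User"],
}


def build_robots_txt(blocked_categories):
    """Return robots.txt text that disallows every bot in the given categories."""
    blocks = []
    for category in blocked_categories:
        for bot in BOTS[category]:
            # One User-agent block per bot, disallowing the whole site.
            blocks.append(f"User-agent: {bot}\nDisallow: /\n")
    # Every other crawler stays allowed: an empty Disallow means "allow all".
    blocks.append("User-agent: *\nDisallow:\n")
    # Blocks are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Block only the training crawlers; search and user-triggered bots
    # fall through to the permissive "*" block.
    print(build_robots_txt(["training"]))
```

If you already have a robots.txt, the per-bot blocks printed here would be merged into it rather than replacing the file.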

Optional: noai meta tag

A per-page signal some platforms honour. Support is limited and the big AI crawlers use robots.txt instead, so treat this as a supplement, not a replacement.

<!-- In your <head> -->
<meta name="robots" content="noai, noimageai">

<!-- Or as an HTTP response header -->
X-Robots-Tag: noai, noimageai

robots.txt is voluntary. Reputable crawlers honour it, but some (for example Bytespider, and Perplexity in disputed cases) have been reported to ignore it. To enforce a block you need a firewall or WAF rule.

Note too that Google's AI Overviews are served by Googlebot, so robots.txt cannot block them without also blocking normal Search crawling. Google is rolling out a separate opt-out in Search Console (UK first, required by the CMA, with a wider rollout to follow) that removes you from AI Overviews and AI Mode while keeping your normal Search ranking.
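To make the enforcement point concrete: because robots.txt is only a request, a hard block has to happen on the server side. The sketch below is a minimal Python WSGI middleware that returns 403 when the User-Agent header contains a blocked token. It is an application-level stand-in for a firewall or WAF rule, not a replacement for one; `block_ai_crawlers` and `BLOCKED_TOKENS` are hypothetical names, and a crawler that spoofs its User-Agent will still get through.

```python
# Tokens are matched case-insensitively against the User-Agent header.
# "bytespider" is the example named on this page; extend the tuple as needed.
BLOCKED_TOKENS = ("bytespider",)


def block_ai_crawlers(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app so requests from blocked user agents get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            # Refuse the request before it reaches the wrapped application.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Matching on the User-Agent string is the simplest possible check. A real firewall rule would usually combine it with IP ranges published by the crawler's operator, where those exist.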

Training, search, and user fetches are different bots

The big AI companies now run separate crawlers for different jobs, each with its own name in robots.txt. OpenAI's GPTBot gathers training data, OAI-SearchBot builds the ChatGPT search index, and ChatGPT-User only fetches a page when a person asks ChatGPT to read it. Anthropic splits the same way into ClaudeBot, Claude-SearchBot and Claude-User. Because they are independent, you can block training while still appearing in AI search results and answering direct user requests.
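One way to check that a rule set really separates these bots is to run it through Python's standard-library `urllib.robotparser`. The sketch below assumes a sample rule set that blocks only the two training crawlers; `RULES`, `allowed` and the example URL are illustrative names, and the parser's matching is Python's own interpretation of robots.txt, which may differ in detail from how each crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (an assumption for this example): block the training
# crawlers, leave every other user agent unrestricted.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(rules_text, user_agent, url="https://example.com/article"):
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
                "ClaudeBot", "Claude-SearchBot", "Claude-User"]:
        print(f"{bot}: {'allowed' if allowed(RULES, bot) else 'blocked'}")
```

Under these sample rules the training crawlers are reported as blocked while the search and user-triggered bots fall through to the permissive `*` block.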

Opt-out tokens are not crawlers

Google-Extended and Applebot-Extended do not fetch anything. They are signals you place in robots.txt to opt out of training. Crucially, blocking Google-Extended stops Google using your content to train Gemini but does not affect your normal Google Search ranking, which is handled by Googlebot.
