AI Crawler Control

Tick a bot to block it. The popular choice is to block the training crawlers while letting the AI search and assistant bots through, so you keep AI referral traffic without feeding the models. Pick a preset to start, then fine-tune.

Your robots.txt

Optionally, write explicit Allow lines for the bots you permit. This documents intent; bots without a rule are allowed by default.

Add these rules to the robots.txt at the root of your domain (https://example.com/robots.txt). If you already have one, merge these User-agent blocks into it.
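As a rough illustration of what the generated rules could look like, here is a minimal Python sketch that builds one User-agent block per bot. The function name `build_robots_txt`, its parameters, and the two bot lists are assumptions made for this example (the bot names themselves come from this page); it is not the tool's actual implementation.

```python
def build_robots_txt(blocked, allowed=(), explicit_allow=False):
    """Return robots.txt text with one User-agent block per bot.

    blocked        -- bot names that get "Disallow: /"
    allowed        -- bot names you permit
    explicit_allow -- if True, also write "Allow: /" blocks for `allowed`
                      (documents intent; they are allowed by default anyway)
    """
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in blocked]
    if explicit_allow:
        groups += [f"User-agent: {bot}\nAllow: /" for bot in allowed]
    # Blank line between blocks so they are easy to merge into an existing file.
    return "\n\n".join(groups) + "\n"


# Hypothetical preset: block training, allow search and user-triggered fetches.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended"]
SEARCH_AND_USER_BOTS = ["OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User"]

robots_txt = build_robots_txt(TRAINING_BOTS, SEARCH_AND_USER_BOTS, explicit_allow=True)
print(robots_txt)
```

Each block is kept separate rather than grouping several User-agent lines together, which makes it simpler to paste individual blocks into a robots.txt you already maintain.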

Optional: noai meta tag

A per-page signal some platforms honour. Support is limited and the big AI crawlers use robots.txt instead, so treat this as a supplement, not a replacement.

<!-- In your <head> -->
<meta name="robots" content="noai, noimageai">

<!-- Or as an HTTP response header -->
X-Robots-Tag: noai, noimageai

robots.txt is voluntary. Reputable crawlers honour it, but some (for example Bytespider, and Perplexity in disputed cases) have been reported to ignore it. To enforce a block you need a firewall or WAF rule. Note too that Google's AI Overviews are served by Googlebot, so this file cannot selectively block them. Google is rolling out a separate opt-out in Search Console (UK first, required by the CMA, with a wider rollout to follow) that removes you from AI Overviews and AI Mode while keeping your normal Search ranking.
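To make the enforcement point concrete, the sketch below rejects requests by User-Agent inside the application, using a small WSGI middleware. This is an application-level stand-in for a firewall or WAF rule, not a replacement for one. The names `block_ai_crawlers`, `BLOCKED_TOKENS`, `demo_app` and `status_for`, and the shortened User-Agent strings, are all assumptions for illustration. User-Agent headers can be spoofed, so a crawler that hides its identity would not be caught this way.

```python
# Hypothetical list of lowercase substrings to reject; adjust to your own policy.
BLOCKED_TOKENS = ("bytespider", "gptbot")


def block_ai_crawlers(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent matches a token."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Placeholder application that always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(user_agent):
    """Run one fake request through the middleware and return the status line."""
    captured = []

    def start_response(status, headers):
        captured.append(status)

    block_ai_crawlers(demo_app)({"HTTP_USER_AGENT": user_agent}, start_response)
    return captured[0]


# Illustrative User-Agent strings, shortened for the example.
print(status_for("Mozilla/5.0 (compatible; Bytespider)"))   # 403 Forbidden
print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))  # 200 OK
```

Blocking at the network edge is generally preferable where available, since the request never reaches your application at all.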

Training, search, and user fetches are different bots

The big AI companies now run separate crawlers for different jobs, each with its own name in robots.txt. OpenAI's GPTBot gathers training data, OAI-SearchBot builds the ChatGPT search index, and ChatGPT-User only fetches a page when a person asks ChatGPT to read it. Anthropic splits the same way into ClaudeBot, Claude-SearchBot and Claude-User. Because they are independent, you can block training while still appearing in AI search results and answering direct user requests.
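One way to sanity-check that a policy blocks training crawlers while leaving search and user-triggered bots alone is to run it through Python's standard `urllib.robotparser`. The policy text, the helper name `allowed_bots` and the sample URL below are assumptions for this sketch. Python's parser matches user-agent names in its own way, which may differ in detail from how each crawler interprets the file, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Example policy: block the two training crawlers named above, nothing else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def allowed_bots(robots_txt, bots, url="https://example.com/article"):
    """Return {bot_name: True/False} for whether each bot may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


result = allowed_bots(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
     "ClaudeBot", "Claude-SearchBot", "Claude-User"],
)
for bot, ok in result.items():
    print(f"{bot}: {'allowed' if ok else 'blocked'}")
```

With this policy, GPTBot and ClaudeBot come back as blocked and the four search and user bots as allowed, because bots without a matching block fall through to the default of allowed.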

Opt-out tokens are not crawlers

Google-Extended and Applebot-Extended do not fetch anything. They are signals you place in robots.txt to opt out of training. Crucially, blocking Google-Extended stops Google using your content to train Gemini but does not affect your normal Google Search ranking, which is handled by Googlebot.
