Does your robots.txt name each major AI crawler with an explicit policy?
Naming each AI bot by name in robots.txt removes ambiguity and signals a deliberate, modern crawl policy.
What this signal tests
We check whether your robots.txt file names the major AI crawlers individually - OpenAI's GPTBot, Anthropic's ClaudeBot, Google-Extended, PerplexityBot, Applebot-Extended, Meta's Meta-ExternalAgent, ByteDance's Bytespider, Amazonbot, Bingbot, and others - and gives each its own rule group, rather than relying on the catch-all wildcard group (User-agent: *) to cover them.
Why it matters for your visibility in AI
Some AI crawlers respect only rules addressed to their own name and ignore the wildcard rule that applies to all bots. If your policy lives only under the wildcard, those crawlers may behave unpredictably, either crawling pages you intended to block or skipping pages you intended to allow, with no warning. Naming each AI vendor explicitly is also a signal of maturity: it shows AI companies that your policy is deliberate, which may count in your favour in their crawl prioritisation. Sites with vague, wildcard-only rules may be treated as lower-confidence sources and cited less often than sites with clear, vendor-specific policies.
Pass criteria at a glance
| Criterion | Passes when |
|---|---|
| Explicit AI crawler rules | At least 80% of the listed AI user agents have intentional, explicit rules. |
How we test it
We parse your robots.txt file and look for User-agent lines that name each of the major AI crawler tokens, such as GPTBot, ClaudeBot, Google-Extended, PerplexityBot, and so on. For each known AI bot, we check whether it has its own rule block with intentional Allow or Disallow directives, then calculate the percentage of major AI vendors covered explicitly.
Parse robots.txt; for each bot token, check for an explicit (non-wildcard) User-agent block; score % coverage.
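The detection method above can be sketched in Python. This is a hypothetical illustration, not the AI Ready Test's actual implementation. `AI_BOTS` copies the bot list from the fix checklist on this page. A bot counts as covered only when a non-wildcard User-agent line names it in a group that contains at least one Allow or Disallow rule. `PASS_THRESHOLD` mirrors the 80% pass criterion.

```python
"""Sketch: score how many known AI crawler tokens have their own robots.txt group.

Hypothetical illustration, not the AI Ready Test's actual code:
- AI_BOTS copies the bot list from this page's fix checklist.
- A bot counts as covered when a non-wildcard User-agent line names it
  inside a group that has at least one Allow or Disallow rule.
- PASS_THRESHOLD mirrors the 80% pass criterion.
"""

AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "Google-Extended",
    "ClaudeBot", "Claude-User", "Claude-SearchBot", "PerplexityBot",
    "Perplexity-User", "CCBot", "Applebot-Extended", "Meta-ExternalAgent",
    "Bytespider", "Amazonbot", "Bingbot",
]
PASS_THRESHOLD = 0.80


def explicitly_covered(robots_txt: str) -> set[str]:
    """Return lowercase User-agent tokens that belong to a group with rules."""
    covered: set[str] = set()
    group_agents: list[str] = []
    in_rules = False  # True once the current group has seen Allow/Disallow
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # Every non-wildcard token in this group now has explicit rules.
            covered.update(a for a in group_agents if a != "*")
    return covered


def coverage(robots_txt: str, bots: list[str] = AI_BOTS) -> float:
    """Fraction of bots that have an explicit, non-wildcard rule group."""
    named = explicitly_covered(robots_txt)
    return sum(bot.lower() in named for bot in bots) / len(bots)


if __name__ == "__main__":
    sample = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""
    score = coverage(sample)
    print(f"coverage {score:.0%}, passes: {score >= PASS_THRESHOLD}")
```

In the sample, only 2 of the 15 tokens have their own group, so the score is about 13% and the check fails. Tokens are compared case-insensitively, as RFC 9309 requires for user-agent matching.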
If your site fails: how to fix it
- Decide your policy for each major AI vendor and write it down before you edit the file: do you want to allow training crawlers, allow only search-purpose crawlers, or block specific vendors?
- Open your robots.txt and add a User-agent block for each major AI bot you care about: GPTBot, OAI-SearchBot, ChatGPT-User, Google-Extended, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, CCBot, Applebot-Extended, Meta-ExternalAgent, Bytespider, Amazonbot, Bingbot.
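- Under each User-agent line, add the Allow or Disallow directives that match your policy. To keep the file maintainable, group related bots by stacking their User-agent lines above one shared set of rules, for example all training crawlers together. A crawler that finds a group naming it follows only that group and ignores the wildcard group, so repeat in each named group any wildcard rules that should still apply.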
- Reference the community-maintained list at github.com/ai-robots-txt/ai.robots.txt to make sure you have not missed any newly active bots.
- Keep a comment near each block explaining why you chose that policy, so future maintainers do not undo it by accident.
- Re-run the AI Ready Test to confirm coverage now meets the threshold.
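To sanity-check a policy before deploying it, you can load it into Python's standard-library `urllib.robotparser`. The robots.txt below is a hypothetical example, not a recommended policy. It blocks training crawlers, allows search and user-triggered fetchers, and keeps a made-up /private/ path off limits for everyone. `RobotFileParser` applies the first matching rule in a group rather than RFC 9309's longest match. For that reason the Disallow line comes before `Allow: /`, and with this ordering both interpretations give the same answers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical example policy, for illustration only.
ROBOTS_TXT = """\
# Training crawlers: we opt out of model training.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /

# Search and user-triggered fetchers: allowed so AI answers can cite us.
# Named groups replace the wildcard group, so /private/ is repeated here.
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
Disallow: /private/
Allow: /

# Fallback for every crawler not named above.
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

# Spot-check a few (agent, path) pairs against the policy.
for agent, path in [
    ("GPTBot", "/blog/post"),
    ("OAI-SearchBot", "/blog/post"),
    ("OAI-SearchBot", "/private/report"),
    ("SomeOtherBot", "/blog/post"),
]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:<14} {path:<16} {verdict}")
```

Run as written, GPTBot is blocked everywhere. OAI-SearchBot can fetch /blog/post but not /private/report, and an unlisted bot falls back to the wildcard group. If you deleted `Disallow: /private/` from the search group, those bots could enter /private/, because a named group replaces the wildcard group rather than adding to it.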
Quick facts
| Fact | Value |
|---|---|
| Maturity | ESTABLISHED |
| Weight | high |
| Category | Crawlability |
Primary sources
- https://platform.openai.com/docs/bots
- https://support.claude.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
- https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers
- https://support.apple.com/en-us/119829
- https://docs.perplexity.ai/guides/bots
- https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
- https://commoncrawl.org/ccbot
Frequently asked questions
Why not just use a single wildcard rule for everything?
Because not every AI crawler honours the wildcard rule the same way, and some intentionally ignore it. Explicit per-vendor rules are the only reliable way to communicate policy to each crawler, and they also make your intent auditable by humans reading the file.
Do I need a rule for every bot in the world?
No. We only check for the major AI vendors that have visible market share or known training corpora. Aim for explicit coverage of those, and use a sensible wildcard fallback for the long tail of niche crawlers.
What happens if I block a bot that I previously allowed?
Compliant bots stop crawling the newly blocked pages once they re-fetch your robots.txt, usually within a day or two. Content already ingested into models or search indexes remains until their next refresh, because there is no instant removal, but that crawler will not ingest any new pages.
Is there a downside to blocking all AI bots?
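Yes. If you also block the search-purpose and user-triggered bots, such as OAI-SearchBot, ChatGPT-User, PerplexityBot, and Claude-SearchBot, your site is unlikely to be cited in ChatGPT Search, Perplexity, or Claude. Blocking Bingbot also removes you from Bing and Bing Copilot. Google AI Overviews are part of Google Search, and Google states that Google-Extended does not affect inclusion in Search, so blocking Google-Extended alone does not remove you from them. For most businesses, losing these citations is a meaningful loss of visibility. Block training bots if you object to model training, but consider allowing search-purpose bots.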
Run your own scan
Run a free scan and see how your site grades across all 155 AI-readiness signals.