Crawlability - AI-readiness signals
All 19 signals in the Crawlability category, with what each one tests and why it matters for visibility in AI answers.
-
Is your robots.txt file readable by ChatGPT and other AI crawlers?
A valid robots.txt at your site root tells AI crawlers which paths they may crawl.
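Whether the file is "readable" depends first on the HTTP status it returns. This minimal Python sketch maps the final status of `/robots.txt` to crawler behaviour as RFC 9309 (section 2.3.1) describes it; the function name and outcome labels are hypothetical, and redirects are assumed to have been followed already.

```python
def robots_fetch_outcome(status: int) -> str:
    """Map the final HTTP status of /robots.txt to crawler behaviour (RFC 9309, 2.3.1)."""
    if 200 <= status < 300:
        # Success: the crawler parses the file and obeys its rules.
        return "parse"
    if 400 <= status < 500:
        # "Unavailable": the crawler may access any resource on the site.
        return "allow_all"
    if 500 <= status < 600:
        # "Unreachable": the crawler must assume a complete disallow.
        return "disallow_all"
    # 3xx should already have been followed; anything else is unexpected here.
    raise ValueError(f"unexpected final status: {status}")
```

The asymmetry is the point: a missing file (404) opens the whole site, while a server error (503) hides all of it.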
-
Does your robots.txt point AI crawlers to your sitemap?
A Sitemap line in robots.txt points AI crawlers to your sitemap, and through it to every page you want them to discover.
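A quick way to confirm the line is present and parseable is Python's standard `urllib.robotparser`, whose `site_maps()` method (Python 3.8+) returns the declared sitemap URLs or `None`. The helper name and the sample file are illustrative.

```python
from urllib.robotparser import RobotFileParser

def sitemaps_from_robots(robots_txt: str) -> list:
    """Return every Sitemap URL declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file contains no Sitemap lines.
    return parser.site_maps() or []

robots = """User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""
print(sitemaps_from_robots(robots))  # ['https://example.com/sitemap.xml']
```

The Sitemap directive is independent of any User-agent group, so it can sit anywhere in the file.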
-
Is your XML sitemap valid and within the size limits AI crawlers expect?
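A well-formed sitemap with at most 50,000 URLs and at most 50 MB uncompressed is the map AI search bots use to discover your pages; larger sites split their URLs across several sitemaps listed in a sitemap index.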
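A minimal sketch of the validity check using the standard `xml.etree.ElementTree` module, assuming a plain `<urlset>` sitemap (a sitemap index file would be reported as an unexpected root). The limits are the sitemaps.org protocol values; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured uncompressed

def check_sitemap(xml_bytes: bytes) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"{len(xml_bytes)} bytes uncompressed (limit {MAX_BYTES})")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != SITEMAP_NS + "urlset":
        problems.append(f"unexpected root element {root.tag!r}")
    urls = root.findall(SITEMAP_NS + "url")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs (limit {MAX_URLS}); split into a sitemap index")
    for url in urls:
        loc = url.find(SITEMAP_NS + "loc")
        if loc is None or not (loc.text or "").strip():
            problems.append("a <url> entry has no <loc>")
    return problems
```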
-
Does your sitemap honestly report when each page was last updated?
Accurate last-modified dates in your sitemap tell AI crawlers which pages are fresh enough to cite.
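Honesty cannot be proven from the file alone, but some patterns are clear red flags. This sketch parses `<lastmod>` values (W3C Datetime) and flags unparseable values, dates in the future, and one timestamp shared by every URL, which usually means the generator stamps build time rather than edit time. The heuristics and names are assumptions.

```python
from datetime import datetime, timezone

def parse_lastmod(value: str) -> datetime:
    """Parse values such as '2024-05-01' or '2024-05-01T12:00:00Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Treat date-only or offset-less values as UTC so comparisons are well defined.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

def lastmod_warnings(lastmods: list, now: datetime) -> list:
    warnings, parsed = [], []
    for raw in lastmods:
        try:
            parsed.append(parse_lastmod(raw))
        except ValueError:
            warnings.append(f"unparseable lastmod {raw!r}")
    if any(d > now for d in parsed):
        warnings.append("lastmod dates in the future")
    if len(parsed) > 1 and len(set(parsed)) == 1:
        warnings.append("every URL shares one lastmod value")
    return warnings
```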
-
Does your robots.txt name each major AI crawler with an explicit policy?
Giving each AI bot its own User-agent group in robots.txt removes ambiguity and signals a deliberate, up-to-date crawl policy.
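A small sketch that lists which AI crawlers have no group of their own and therefore fall back to the `*` rules. The crawler list is illustrative, not exhaustive; vendors add new tokens, so check their documentation. User-agent matching is case-insensitive, as RFC 9309 specifies.

```python
# Illustrative AI crawler tokens; not an exhaustive or authoritative list.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
               "PerplexityBot", "Google-Extended", "CCBot"]

def named_user_agents(robots_txt: str) -> set:
    """Collect the lower-cased tokens from every User-agent line."""
    names = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            names.add(value.strip().lower())
    return names

def unnamed_ai_crawlers(robots_txt: str) -> list:
    """AI crawlers that inherit the '*' group because none names them."""
    named = named_user_agents(robots_txt)
    return [bot for bot in AI_CRAWLERS if bot.lower() not in named]
```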
-
Are the AI search bots that cite your pages allowed to crawl your site?
AI search crawlers must reach your pages before they can ever cite you in answers from ChatGPT, Perplexity, or Bing.
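This sketch asks the standard `urllib.robotparser` whether a few search and answer crawlers may fetch a given URL. The bot list is an assumption. Note that Python's parser applies the first matching rule in a group, whereas RFC 9309 uses the longest match, so files that mix Allow and Disallow on overlapping paths can get different answers from real crawlers.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of search/answer crawlers; adjust to the vendors you care about.
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot", "bingbot"]

def blocked_search_bots(robots_txt: str, url: str) -> list:
    """Return the search bots that robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in SEARCH_BOTS if not parser.can_fetch(bot, url)]

robots = "User-agent: PerplexityBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(blocked_search_bots(robots, "https://example.com/guide"))  # ['PerplexityBot']
```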
-
Is your opt-out policy for AI training crawlers consistent across every major vendor?
If you opt out of AI training, the rule must apply to every training crawler - not just the famous ones.
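A minimal consistency check: if any training crawler is blocked from the site root, list the ones still allowed. The token list is illustrative (Google-Extended and Applebot-Extended are robots.txt product tokens rather than separate crawlers), and the same first-match caveat of Python's `urllib.robotparser` applies.

```python
from urllib.robotparser import RobotFileParser

# Illustrative training-crawler tokens; check each vendor's docs for the current list.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot", "Applebot-Extended"]

def training_policy(robots_txt: str, path: str = "/") -> dict:
    """Map each training crawler to whether robots.txt lets it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in TRAINING_BOTS}

def inconsistent_opt_out(robots_txt: str) -> list:
    """If any training crawler is blocked, list the ones that are still allowed."""
    policy = training_policy(robots_txt)
    if all(policy.values()):
        return []  # no opt-out at all, so nothing is inconsistent
    return [bot for bot, allowed in policy.items() if allowed]
```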
-
Is your firewall or CDN silently blocking AI crawlers despite your robots.txt allowing them?
A bot policy in robots.txt means nothing if your edge firewall blocks the bot's IP address before it ever reaches your robots file.
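Because an edge block happens before the request reaches your origin, the evidence lives in CDN or firewall logs, and spoofing a bot's user agent from your own IP will not reproduce an IP-based rule. This sketch counts refused requests per AI crawler from already-extracted `(user_agent, status)` pairs; the token list, the set of "blocking" statuses, and the input shape are assumptions.

```python
from collections import Counter

# Illustrative AI crawler tokens, matched as substrings of the logged User-Agent.
AI_TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "CCBot"]
# Statuses treated here as "the edge refused the bot" (an assumption).
BLOCK_STATUSES = {401, 403, 429, 503}

def blocked_ai_requests(log_entries) -> Counter:
    """Count refused requests per AI crawler token.

    `log_entries` is an iterable of (user_agent, status) pairs from edge logs.
    """
    blocked = Counter()
    for user_agent, status in log_entries:
        if status not in BLOCK_STATUSES:
            continue
        for token in AI_TOKENS:
            if token.lower() in user_agent.lower():
                blocked[token] += 1
    return blocked
```

Any non-zero count for a bot that robots.txt allows is exactly the contradiction this signal looks for.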
-
Does every indexable page have exactly one clear canonical URL?
A single canonical tag per page tells AI crawlers which URL is the real one when several versions exist.
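A sketch using the standard `html.parser` module to collect every `<link rel="canonical">` and require exactly one absolute URL. Relative canonicals are technically allowed; treating them as a problem follows the common recommendation to use absolute URLs, and is an assumption of this check.

```python
from html.parser import HTMLParser

class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonicals.append(attributes.get("href"))

def canonical_problems(html: str) -> list:
    collector = CanonicalCollector()
    collector.feed(html)
    found = collector.canonicals
    if len(found) != 1:
        return [f"expected exactly one canonical, found {len(found)}"]
    href = found[0] or ""
    if not href.startswith(("https://", "http://")):
        return [f"canonical {href!r} is not an absolute URL"]
    return []
```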
-
Do your language and country versions reference each other correctly?
Reciprocal hreflang annotations help AI assistants serve the right language version of your site to each user.
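Reciprocity means that if page A lists page B as an alternate, B must list A as well. This sketch checks that rule over annotations you have already extracted; the input shape and function name are hypothetical.

```python
def missing_return_links(alternates: dict) -> list:
    """Find hreflang links whose target does not link back.

    `alternates` maps each page URL to the {hreflang code: URL} annotations on it.
    Returns (source, target) pairs where target has no hreflang entry for source.
    """
    missing = []
    for source, annotations in alternates.items():
        for target in annotations.values():
            if target == source:
                continue  # a self-reference is expected and needs no return link
            if source not in alternates.get(target, {}).values():
                missing.append((source, target))
    return missing

pages = {
    "https://example.com/en/": {"en": "https://example.com/en/", "de": "https://example.com/de/"},
    "https://example.com/de/": {"de": "https://example.com/de/"},
}
print(missing_return_links(pages))  # [('https://example.com/en/', 'https://example.com/de/')]
```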
-
Is your entire site served securely over HTTPS with no insecure leftovers?
AI crawlers strongly prefer HTTPS sites and quietly downgrade trust in any page with mixed insecure resources.
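"Insecure leftovers" usually means mixed content: subresources loaded over plain `http://` from an HTTPS page. This sketch scans raw HTML for them with the standard `html.parser`; the tag/attribute table is a simplified, non-exhaustive assumption, and ordinary `<a href>` links are deliberately ignored because they are not mixed content.

```python
from html.parser import HTMLParser

# Tags whose attribute makes the browser fetch a subresource (simplified set).
RESOURCE_ATTRS = {"img": "src", "script": "src", "iframe": "src",
                  "audio": "src", "video": "src", "source": "src"}

class MixedContentFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        url = None
        if tag in RESOURCE_ATTRS:
            url = attributes.get(RESOURCE_ATTRS[tag])
        elif tag == "link" and "stylesheet" in (attributes.get("rel") or "").lower().split():
            url = attributes.get("href")
        if url and url.lower().startswith("http://"):
            self.insecure.append(url)

def insecure_resources(html: str) -> list:
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure
```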
-
Does your site tell browsers and crawlers to always use the secure HTTPS version?
An HSTS header locks browsers and crawlers onto HTTPS, preventing silent downgrades that fragment your AI visibility.
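A sketch that parses the `Strict-Transport-Security` header (RFC 6797: `max-age` is required, directive names are case-insensitive). The one-year threshold is borrowed from the hstspreload.org submission requirements and is a policy assumption, not part of the RFC.

```python
ONE_YEAR = 31_536_000  # seconds; minimum max-age accepted by hstspreload.org

def parse_hsts(header: str) -> dict:
    """Split a Strict-Transport-Security header into lower-cased directives."""
    directives = {}
    for part in header.split(";"):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip().strip('"')
    return directives

def hsts_warnings(header: str) -> list:
    """Return problems with the header; an empty string means it is absent."""
    if not header:
        return ["no Strict-Transport-Security header"]
    directives = parse_hsts(header)
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        return ["max-age missing or not an integer"]
    warnings = []
    if max_age < ONE_YEAR:
        warnings.append(f"max-age {max_age} is shorter than one year")
    if "includesubdomains" not in directives:
        warnings.append("includeSubDomains not set")
    return warnings
```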
-
Are your redirects short, correctly typed, and easy for AI crawlers to follow?
Short, properly typed redirect chains keep AI crawlers from giving up before they reach your real page.
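Given a redirect chain you have recorded (for example, by following `Location` headers one hop at a time), this sketch flags long chains, loops, temporary redirect types used where a permanent move is likely, and chains that do not end on a 2xx page. The hop limit of 3 is an assumed threshold for this sketch, not a documented crawler limit.

```python
TEMPORARY = {302, 303, 307}
MAX_HOPS = 3  # assumed threshold; each extra hop costs the crawler another fetch

def redirect_warnings(hops: list) -> list:
    """Check a chain of (status, url) pairs that ends with the final response."""
    warnings = []
    redirects = [(status, url) for status, url in hops if 300 <= status < 400]
    if len(redirects) > MAX_HOPS:
        warnings.append(f"{len(redirects)} redirects before the final page")
    urls = [url for _, url in hops]
    if len(set(urls)) != len(urls):
        warnings.append("redirect loop: a URL appears twice in the chain")
    temporary = [url for status, url in redirects if status in TEMPORARY]
    if temporary:
        warnings.append(f"temporary redirects at {temporary}; use 301/308 for permanent moves")
    if hops and not 200 <= hops[-1][0] < 300:
        warnings.append(f"chain ends with status {hops[-1][0]}, not a 2xx page")
    return warnings
```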
-
Does your site return honest HTTP status codes when something is missing or broken?
Returning a real 404 for missing pages - not a 200 with an error message - keeps your AI footprint clean and trustworthy.
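A common way to detect "soft 404s" is to request a path that cannot exist and look at the status code. This sketch builds such a probe path and interprets the response; the path prefix and verdict strings are hypothetical, and the HTTP request itself is left out.

```python
import uuid

def probe_path() -> str:
    """A random path that should not exist on any site, for a soft-404 probe."""
    return f"/soft-404-probe-{uuid.uuid4().hex}"

def probe_verdict(status: int) -> str:
    """Interpret the status a site returned for the probe path."""
    if status in (404, 410):
        return "ok"
    if status == 200:
        return "soft 404: missing pages answer 200"
    if 300 <= status < 400:
        return "missing pages redirect; check where the redirect lands"
    return f"unexpected status {status}"
```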
-
Can AI crawlers read your main content without running any JavaScript?
Most AI crawlers do not run JavaScript, so your real content has to be visible in the raw HTML on first load.
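One rough test is to count the words present in the raw HTML response, before any script runs. This sketch does that with the standard `html.parser`, skipping `script`, `style`, and `template` bodies; comparing the result against the word count of the rendered page is left to you, and no particular threshold is implied.

```python
from html.parser import HTMLParser

class RawTextExtractor(HTMLParser):
    """Collect text present in the raw HTML, skipping script-like elements."""
    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than zero while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())

def raw_html_word_count(html: str) -> int:
    extractor = RawTextExtractor()
    extractor.feed(html)
    return len(" ".join(extractor.chunks).split())
```

A client-rendered shell such as `<div id="root"></div><script>render()</script>` scores zero, which is what a non-JavaScript crawler effectively sees.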
-
Does your mobile site show the same content, metadata, and structured data as desktop?
Google indexes the mobile version of your pages first, and AI Overviews draw on that index, so any gaps between mobile and desktop become invisible content in AI answers.
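Once you have fetched a page with a desktop and a smartphone user agent and extracted the same fields from each, the parity check itself is a field-by-field comparison. The field names below ('title', 'schema_types', 'h1') are hypothetical, and the fetching and extraction steps are not shown.

```python
def parity_gaps(desktop: dict, mobile: dict) -> list:
    """List fields whose extracted values differ between desktop and mobile.

    Both dicts hold the same hypothetical fields, for example 'title',
    'meta_description', 'h1', and 'schema_types' (a set of JSON-LD @type values).
    """
    gaps = []
    for field in sorted(desktop.keys() | mobile.keys()):
        if desktop.get(field) != mobile.get(field):
            gaps.append(field)
    return gaps

desktop = {"title": "Pricing", "schema_types": {"Product", "Offer"}, "h1": "Pricing"}
mobile = {"title": "Pricing", "schema_types": {"Product"}, "h1": "Pricing"}
print(parity_gaps(desktop, mobile))  # ['schema_types']
```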
-
Is your site reachable for crawlers connecting over the modern IPv6 internet?
If your domain has no IPv6 (AAAA) address, crawlers connecting over IPv6 cannot reach it; major crawlers run on dual-stack networks, and a growing share of AI infrastructure runs IPv6-only.
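A first check is whether the hostname resolves to any IPv6 address at all. This sketch uses the standard `socket.getaddrinfo` with `AF_INET6`; the helper name is hypothetical, and a non-empty result only proves an AAAA record exists, not that your server accepts connections on it.

```python
import socket

def ipv6_addresses(host: str, port: int = 443) -> list:
    """Return the IPv6 addresses `host` resolves to (empty if none or on error)."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Calling `ipv6_addresses("example.com")` needs a working resolver; an empty list means IPv6-only clients have nothing to connect to.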
-
Do your robots meta tag and X-Robots-Tag header agree with each other?
Contradictory robots directives force AI crawlers to apply the strictest one and silently drop your pages.
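This sketch compares the directive lists from a `<meta name="robots">` tag and an `X-Robots-Tag` header, reports contradictory pairs, and computes indexability under the "most restrictive wins" rule described above. It ignores user-agent-prefixed header values such as `googlebot: noindex`, and the function names are hypothetical.

```python
CONFLICTS = [("index", "noindex"), ("follow", "nofollow")]

def parse_directives(value: str) -> set:
    """Split a list like 'noindex, nofollow' into lower-cased tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}

def robots_conflicts(meta_content: str, header_value: str) -> list:
    """Report contradictory directive pairs across the meta tag and the header."""
    combined = parse_directives(meta_content) | parse_directives(header_value)
    return [f"{allow} vs {deny}" for allow, deny in CONFLICTS
            if allow in combined and deny in combined]

def is_indexable(meta_content: str, header_value: str) -> bool:
    # The restrictive directive wins, so any noindex (or 'none') removes the page.
    combined = parse_directives(meta_content) | parse_directives(header_value)
    return not ({"noindex", "none"} & combined)
```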
-
Do your robots directives let AI assistants quote and preview your content?
Restrictive snippet directives prevent AI assistants from quoting your pages, even when your content would have been the best answer.
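A sketch that flags the snippet-limiting directives in a robots directive list. The semantics follow Google's robots meta documentation (`max-snippet:0` is equivalent to `nosnippet`; `max-snippet:-1` means no limit); other vendors may interpret them differently, and user-agent-prefixed header values are not handled.

```python
def snippet_restrictions(directives: str) -> list:
    """Flag directives that stop search and AI features quoting or previewing a page."""
    found = []
    for token in directives.split(","):
        name, _, value = token.strip().lower().partition(":")
        name, value = name.strip(), value.strip()
        if name == "nosnippet":
            found.append("nosnippet blocks text snippets entirely")
        elif name == "max-snippet" and value == "0":
            found.append("max-snippet:0 behaves like nosnippet")
        elif name == "max-image-preview" and value == "none":
            found.append("max-image-preview:none hides image previews")
    return found
```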