Is your XML sitemap valid and within the size limits AI crawlers expect?
A well-formed sitemap, with no more than fifty thousand URLs per file, is the canonical map AI search bots use to discover your pages.
What this signal tests
We fetch each sitemap your site references and check that it is well-formed XML following the standard sitemaps.org format. We verify that each file stays within the published size limits - no more than fifty thousand URLs and no more than fifty megabytes uncompressed - and that every page URL inside is a full absolute address rather than a relative path.
Why it matters for your visibility in AI
A broken or oversized sitemap can be silently skipped by AI search crawlers like OAI-SearchBot, PerplexityBot, Claude-SearchBot, and Bingbot. When that happens, crawlers can only reach your pages by following links. Your most heavily linked pages still get discovered, but the rest of your content can stay invisible to AI answers - even though it exists and is otherwise crawlable. For large sites this is especially costly: if a site with twenty thousand product pages ships a single malformed sitemap, most of those pages may never be discovered, and every undiscovered page is an AI citation the site cannot earn. Your competitors with clean, well-split sitemaps get fuller coverage, while your catalogue is partially or entirely missing from the answers customers see in ChatGPT, Perplexity, and Bing Copilot.
Pass criteria at a glance
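| Criterion | Passes when |
|---|---|
| XML validity | Every referenced sitemap parses as XML, with a urlset or sitemapindex root element |
| URL count | No file lists more than 50,000 entries |
| File size | No file exceeds 50 MB uncompressed |
| URL format | 100% of page URLs are absolute (include scheme and host) |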
How we test it
We download each sitemap your site references and parse it as XML. The root element must be either a list of URLs or an index that points to more sitemap files. We count entries to make sure no single file exceeds fifty thousand URLs, measure the uncompressed file size against the fifty megabyte cap, and confirm that every page address inside includes the full domain rather than a shortcut like /about.
Technical detection method
Fetch each sitemap; validate XML; root is <urlset> or <sitemapindex>; entry count <= 50,000; uncompressed size <= 52,428,800 bytes (50 MiB); every <loc> is absolute, with scheme and host.
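The detection line above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the AI Ready Test's actual implementation.

- The function name check_sitemap is hypothetical.
- Fetching and gunzipping are left out, so the function receives the uncompressed bytes of one file.
- The 50,000 cap is applied to the entries of either root type, because the sitemaps.org protocol also limits an index file to 50,000 sitemaps.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_ENTRIES = 50_000       # per file, for both <urlset> and <sitemapindex>
MAX_BYTES = 52_428_800     # 50 MiB, measured on the uncompressed document


def check_sitemap(xml_bytes: bytes) -> list[str]:
    """Hypothetical checker: return the problems found in one uncompressed
    sitemap file. An empty list means all four checks pass."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"{len(xml_bytes)} bytes, over the {MAX_BYTES}-byte cap")

    # 1. Well-formed XML
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]

    # 2. Root must be a sitemaps.org <urlset> or <sitemapindex>
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        child = "url"
    elif root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        child = "sitemap"
    else:
        return problems + [f"unexpected root element {root.tag}"]

    # 3. Entry count
    entries = root.findall(f"{{{SITEMAP_NS}}}{child}")
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries, over the {MAX_ENTRIES} limit")

    # 4. Every <loc> must carry a scheme and a host
    for entry in entries:
        loc = (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        parts = urlparse(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"non-absolute <loc>: {loc!r}")
    return problems


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>/about</loc></url>
</urlset>"""
print(check_sitemap(sample))  # ["non-absolute <loc>: '/about'"]
```

A root element without the sitemaps.org namespace is reported as unexpected, which matches the namespace requirement in the fixes below. For sitemaps fetched from untrusted hosts, a hardened parser such as the third-party defusedxml package is a safer choice. The Python documentation notes that xml.etree.ElementTree is not secure against maliciously constructed data.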
If your site fails: how to fix it
- Regenerate your sitemap using a current generator - your CMS's built-in sitemap, an SEO plugin, or a static-site framework's sitemap module. Avoid hand-edited XML, which is the most common source of malformed files.
- Confirm the file declares the sitemaps.org 0.9 namespace (http://www.sitemaps.org/schemas/sitemap/0.9) on the root element and that every page entry uses a full URL beginning with https:// and your domain name.
- If your site has more than fifty thousand URLs, split the sitemap into multiple files and reference them from a sitemap index file. Most modern generators do this automatically once you cross the threshold.
- If a single sitemap file is larger than fifty megabytes uncompressed, split it by section, by date, or by content type until each file fits under the cap.
- Open the sitemap URL in a browser to confirm it loads, then re-run the AI Ready Test.
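The regeneration steps above can be sketched with Python's standard library. This is a hypothetical helper, not a replacement for your CMS or framework generator. The function build_sitemaps, the sitemap-N.xml file names and the sitemap_base location are all assumptions made for illustration. The helper:

- declares the sitemaps.org 0.9 namespace;
- rejects URLs without a scheme;
- splits the list into files of at most 50,000 URLs;
- writes a sitemap index that points at each file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def _serialize(root: ET.Element) -> bytes:
    # xml_declaration= needs Python 3.8+
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_sitemaps(urls: list[str], sitemap_base: str,
                   max_urls: int = MAX_URLS) -> dict[str, bytes]:
    """Hypothetical generator: split absolute page URLs into sitemap files
    plus one index. Returns {file name: XML bytes}; sitemap_base is the
    (assumed) public directory URL where the files will be served."""
    for url in urls:
        if not url.startswith(("https://", "http://")):
            raise ValueError(f"relative or schemeless URL: {url!r}")

    files = {}
    for start in range(0, len(urls), max_urls):
        # Setting xmlns as an attribute declares the default namespace
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        files[f"sitemap-{start // max_urls + 1}.xml"] = _serialize(urlset)

    # The index lists every sitemap file by its absolute URL
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        loc = f"{sitemap_base.rstrip('/')}/{name}"
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = loc
    files["sitemap-index.xml"] = _serialize(index)
    return files
```

Called with 120,000 URLs, this produces sitemap-1.xml through sitemap-3.xml plus a sitemap-index.xml that lists the three files. If your URLs are unusually long, 50,000 of them can approach the 50 MB cap. Passing a smaller max_urls is the simplest way to split further.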
Quick facts
| Fact | Value |
|---|---|
| Maturity | ESTABLISHED |
| Weight | high |
| Category | Crawlability |
Frequently asked questions
Can I serve a compressed gzip sitemap?
Yes - the sitemaps.org protocol allows gzip compression, and sitemap.xml.gz is widely supported by major crawlers. The fifty megabyte limit applies to the uncompressed file size. A gzipped sitemap can therefore be a small download, but each file is still limited to fifty thousand URLs and fifty megabytes once decompressed.
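A minimal sketch of preparing and checking a gzipped sitemap with Python's gzip module follows. The helper names gzip_sitemap and uncompressed_size are hypothetical. The size check streams the decompressed data and stops as soon as it passes the cap. That is one reasonable way, assumed here rather than taken from any crawler's documented behaviour, to avoid fully inflating a small .gz file that expands to something enormous.

```python
import gzip
import io

MAX_BYTES = 52_428_800  # the 50 MB cap applies to the *uncompressed* XML


def gzip_sitemap(xml_bytes: bytes) -> bytes:
    """Hypothetical helper: compress a sitemap for serving as sitemap.xml.gz."""
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError("uncompressed sitemap is over 50 MB; split it first")
    return gzip.compress(xml_bytes)


def uncompressed_size(gz_bytes: bytes, chunk_size: int = 1 << 20) -> int:
    """Count decompressed bytes, stopping early once the cap is exceeded."""
    total = 0
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as fh:
        while chunk := fh.read(chunk_size):
            total += len(chunk)
            if total > MAX_BYTES:
                break  # already over the limit; no need to inflate the rest
    return total


xml = b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'
gz = gzip_sitemap(xml)
print(uncompressed_size(gz) == len(xml))  # True
```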
What if my generator produces relative URLs?
This is most likely a configuration issue. Set the site URL or base URL in your generator's settings so it emits full URLs. The sitemaps.org protocol requires fully qualified URLs, so relative paths are invalid. Many crawlers will ignore those entries, costing you discovery.
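If you cannot change the generator right away, relative entries can be repaired in a post-processing step. The sketch below is an assumption-laden stopgap, not a substitute for setting the base URL. The function absolutize is a hypothetical helper, and site_url stands in for whatever your generator calls its site or base URL setting. It resolves each value with urllib.parse.urljoin and rejects anything that still lacks a scheme and host.

```python
from urllib.parse import urljoin, urlparse


def absolutize(loc: str, site_url: str) -> str:
    """Hypothetical fix-up: resolve a sitemap <loc> against the site URL."""
    absolute = urljoin(site_url, loc.strip())
    parts = urlparse(absolute)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"cannot make {loc!r} absolute using {site_url!r}")
    return absolute


print(absolutize("/about", "https://example.com"))                 # https://example.com/about
print(absolutize("post-1", "https://example.com/blog/"))           # https://example.com/blog/post-1
print(absolutize("https://example.com/x", "https://example.com"))  # https://example.com/x
```

A path without a leading slash resolves against the directory of site_url. With a base of https://example.com/blog (no trailing slash), post-1 becomes https://example.com/post-1, so include the trailing slash when the base is a subdirectory.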
Do I really need a sitemap if my site has internal links to every page?
Sitemaps are still strongly recommended. They give crawlers an explicit list of the URLs you care about and let them prioritise fresh content using the lastmod field. Without a sitemap, deep or recently added pages can take much longer to be discovered and to appear in AI answers.
How often should the sitemap be regenerated?
Whenever significant content changes - new pages, removed pages, or major edits. Most CMS plugins regenerate it automatically. For static sites, regenerate at every build. A stale sitemap is not harmful, but it slows down how quickly AI systems learn about your new content.
Run your own scan
Run a free scan and see how your site grades across all 155 AI-readiness signals.