llms.txt - a curated map of your site written for AI crawlers

Tests whether you publish an /llms.txt file giving AI agents a context-window-sized overview of your site.

What this signal tests

We check for /llms.txt - a Markdown file at the root of your site that hands AI crawlers a curated, link-organised overview of what matters. Proposed by Jeremy Howard in late 2024 and modelled loosely on robots.txt and sitemap.xml, it follows a simple structure: H1 site name, optional blockquote summary, then H2 sections with link lists.
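To make that structure concrete, here is a minimal Python sketch that splits an llms.txt body into its H1, optional blockquote summary, and H2 sections of links. The function name `parse_llms_txt`, the `LINK_RE` pattern, and the Example Co file are illustrative assumptions, not part of the spec or of our scanner, and the parser ignores anything beyond the structure described above.

```python
import re

# Matches a bulleted Markdown link with an optional ": description" suffix.
LINK_RE = re.compile(r"^[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text):
    """Split an llms.txt body into title, summary, and H2 sections of links."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("> ") and result["summary"] is None and current is None:
            result["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                title, url, description = match.groups()
                result["sections"][current].append(
                    {"title": title, "url": url, "description": description or ""}
                )
    return result


# Hypothetical file contents, for illustration only.
EXAMPLE = """\
# Example Co

> Example Co makes widgets; these pages describe the product and its API.

## Docs

- [Quickstart](https://example.com/docs/quickstart): Install and first request
- [API reference](https://example.com/docs/api)

## Optional

- [Changelog](https://example.com/changelog)
"""

parsed = parse_llms_txt(EXAMPLE)
print(parsed["title"])
print(list(parsed["sections"]))
```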

Why it matters for your visibility in AI

llms.txt is the most-adopted of the new AI-era discovery files. Anthropic, Cloudflare, Vercel, Hugging Face, Mintlify, and over 1500 other organisations publish one. AI assistants increasingly look for it during inference because, unlike a sitemap, it fits in a context window and tells the model which pages to prioritise reading. If you have docs, knowledge base content, or a product catalogue, an llms.txt is the cheapest way to influence how AI summarises your business.

The practical consequence: when Claude or ChatGPT is asked about your company, the model is more likely to fetch the pages you nominate in your llms.txt than to rely on what it absorbed at training time. That means more current answers, fewer hallucinated product details, and a real shot at being cited as the authoritative source for queries about your domain.

Pass criteria at a glance

Criterion	Passes when
llms.txt	Valid llms.txt structure; linked URLs resolve.

How we test it

We send a GET request to /llms.txt and verify several things. The response must be 200 with content type text/markdown or text/plain. The first non-empty line must be an H1 (a `#` followed by your site name). The body should parse as valid Markdown. We also spot-check a sample of the linked URLs to confirm they resolve. A file that 404s, returns HTML, or fails the H1-first-line rule is a fail. The spec at https://llmstxt.org/ governs the structure.
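The checks described above can be approximated with a short Python sketch. This is not our scanner's actual code: `fetch_llms_txt` and `check_llms_txt` are hypothetical names, the Markdown-validity check and the link spot-check are left out, and only the status, content type, and H1-first-line rules are modelled.

```python
import urllib.error
import urllib.request


def fetch_llms_txt(base_url):
    """GET <base_url>/llms.txt and return (status, content_type, body)."""
    url = base_url.rstrip("/") + "/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report the status instead of crashing.
        return err.code, err.headers.get("Content-Type", ""), ""


def check_llms_txt(status, content_type, body):
    """Return a list of failure reasons; an empty list means these checks pass."""
    problems = []
    if status != 200:
        problems.append(f"status {status}, expected 200")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ("text/markdown", "text/plain"):
        problems.append(f"content type {media_type or 'missing'} not allowed")
    first = next((ln.strip() for ln in body.splitlines() if ln.strip()), "")
    if not (first.startswith("# ") and first[2:].strip()):
        problems.append("first non-empty line is not an H1")
    return problems


# Offline example: leading blank lines are fine, the H1 must still come first.
print(check_llms_txt(200, "text/markdown; charset=utf-8", "\n# Example Co\n"))
```

To run it against a live site, pass the result of `fetch_llms_txt("https://example.com")` into `check_llms_txt` using argument unpacking; the URL here is a placeholder.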

Technical detection method

GET /llms.txt; 200 + text/markdown|text/plain + H1 first non-empty line + valid Markdown.

If your site fails: how to fix it

Write a Markdown file following the structure at https://llmstxt.org/: H1 with your site name, optional blockquote one-line summary, then H2 sections ("Docs", "Guides", "Reference", "Optional") containing bulleted link lists.
Keep it short - the entire file should fit in roughly 8K tokens. The point is curation, not completeness; a sitemap.xml is for crawlers, an llms.txt is for models that have to choose what to read.
Serve it at exactly /llms.txt with Content-Type: text/markdown; charset=utf-8 and a 200 status.
If you use Mintlify, Docusaurus, Nextra, or a similar docs platform, enable the built-in llms.txt generator - these are widely available as of 2025.
Update it on the same cadence as your sitemap. Stale links are worse than no llms.txt.
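If your platform has no built-in generator, the file is simple enough to render from a small data structure. The sketch below is one possible approach in Python; `render_llms_txt`, `rough_token_estimate`, and the Example Co data are hypothetical, and the four-characters-per-token figure is only a rough rule of thumb that varies by tokenizer.

```python
def render_llms_txt(site_name, summary, sections):
    """Render an llms.txt body: H1, optional blockquote, then H2 link lists.

    `sections` maps an H2 heading to a list of (title, url, description)
    tuples; an empty description omits the ": description" suffix.
    """
    lines = [f"# {site_name}", ""]
    if summary:
        lines += [f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, description in links:
            suffix = f": {description}" if description else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"


def rough_token_estimate(text):
    """Very rough size check, assuming about four characters per token."""
    return len(text) // 4


# Hypothetical site data, for illustration only.
text = render_llms_txt(
    "Example Co",
    "Widgets and their API.",
    {
        "Docs": [("Quickstart", "https://example.com/docs/quickstart",
                  "Install and first request")],
        "Optional": [("Changelog", "https://example.com/changelog", "")],
    },
)
print(text)
print(rough_token_estimate(text) < 8000)
```

Regenerating the file from the same source that feeds your sitemap is one way to keep the two on the same update cadence.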

Quick facts

Maturity	EMERGING
Weight	high
Category	Emerging Standards

Frequently asked questions

Is llms.txt a real standard?

It is a community-proposed convention, not an IETF RFC, and no central registry approves it. That said, enough major AI vendors and documentation platforms have adopted it that it is a de facto standard for AI-friendly site discovery in 2025.

Which AI systems read llms.txt today?

Anthropic's Claude (via web fetch tools), Perplexity, several Mintlify-hosted assistants, and most agent frameworks that include a web-fetch step. ChatGPT search has begun to honour it for some sites. Coverage is patchy but trending toward universal.

How is this different from a sitemap.xml?

A sitemap lists every URL for a crawler. An llms.txt lists the URLs you most want a language model to actually read at inference time, with human-readable section headings and short descriptions. Sitemap is for indexing; llms.txt is for prioritising.

Will fixing this still matter in two years?

Almost certainly yes. Even if the file format evolves, the underlying idea - site owners curating an AI-shaped overview of their content - is now baked into multiple major platforms. The cost of publishing one is minimal, so the asymmetry favours doing it.
