Make sure your page converts cleanly into Markdown that AI tools can read

Headings, lists, links, tables, and code blocks should all survive an HTML-to-Markdown conversion.

What this signal tests

We run a standard HTML-to-Markdown conversion pipeline (Mozilla Readability followed by Turndown, or Trafilatura with Markdown output) on your page, then measure how much of the original structure survived. We check preserved headings, lists, links, tables, and code blocks, plus the proportion of main-content text retained. The composite extractability score must be at least 0.80.

Why it matters for your visibility in AI

Major AI ingestion tools (ChatGPT browsing, Claude's web tools, Perplexity, Bing Copilot) generally convert the HTML they fetch into a simplified text form, commonly Markdown, before passing it to the language model. That conversion is the gateway: structure that survives is available to the model, and structure that gets stripped is lost. A page that converts cleanly is, in effect, pre-optimized for any assistant that ingests it this way.

The concrete consequence is that custom UI patterns such as JavaScript-rendered accordions, carousels, tab widgets, and modal-only content often vanish in conversion. The model then receives a partial, often misshapen version of your page and answers questions against that diminished view. Plain, semantic HTML survives the conversion intact; clever UI frequently does not.

Pass criteria at a glance

Criterion                          Passes when
Composite extractability score     >= 0.80

How we test it

We fetch the rendered DOM, run a deterministic readability extraction followed by HTML-to-Markdown conversion, and compare the resulting Markdown against the original DOM along six axes: the fraction of headings preserved, of list items preserved, of links preserved, of tables preserved with their structure, and of code blocks preserved, plus the ratio of preserved-text length to original main-content text length. The composite score is the mean of these preservation ratios and must be at least 0.80.
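To make the arithmetic concrete, here is a minimal Python sketch of a mean-of-ratios score, assuming the per-axis counts have already been taken from the DOM and from the converted Markdown. The axis names, the cap at 1.0, and the choice to skip axes the page does not use (so a page with no code blocks is not penalized for them) are assumptions for illustration, not the scanner's documented behavior.

```python
# Hypothetical sketch of a composite extractability score.
# Assumptions: axis names are illustrative, each ratio is capped at 1.0,
# and axes with nothing to preserve are excluded from the mean.

AXES = ("headings", "list_items", "links", "tables", "code_blocks", "text_chars")
PASS_THRESHOLD = 0.80


def preservation_ratios(original: dict[str, int], preserved: dict[str, int]) -> dict[str, float]:
    """Return preserved/original for every axis present in the original DOM."""
    ratios: dict[str, float] = {}
    for axis in AXES:
        total = original.get(axis, 0)
        if total == 0:
            # Nothing to preserve on this axis, so leave it out of the mean.
            continue
        kept = preserved.get(axis, 0)
        ratios[axis] = min(kept / total, 1.0)
    return ratios


def composite_score(original: dict[str, int], preserved: dict[str, int]) -> float:
    """Mean of the per-axis preservation ratios (0.0 if no axis applies)."""
    ratios = preservation_ratios(original, preserved)
    if not ratios:
        return 0.0
    return sum(ratios.values()) / len(ratios)


def passes(original: dict[str, int], preserved: dict[str, int]) -> bool:
    return composite_score(original, preserved) >= PASS_THRESHOLD


if __name__ == "__main__":
    original = {"headings": 10, "list_items": 20, "links": 30,
                "tables": 2, "code_blocks": 0, "text_chars": 5000}
    preserved = {"headings": 10, "list_items": 15, "links": 27,
                 "tables": 1, "code_blocks": 0, "text_chars": 4500}
    print(round(composite_score(original, preserved), 2))  # 0.81
    print(passes(original, preserved))  # True
```

In the example, the ratios are 1.0, 0.75, 0.9, 0.5, and 0.9, giving a mean of 0.81: a narrow pass. A single table that flattens to plain text halves that axis on its own, which is how one CSS-grid "table" can pull an otherwise clean page toward the threshold.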

Technical detection method
Runs a deterministic Readability + Turndown pipeline, measures preserved headings, lists, links, tables, and code blocks against the DOM, plus the preserved-text-length ratio, and takes the composite score as the mean of these preservation ratios.
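The DOM side of that comparison can be approximated with Python's standard-library html.parser. This is a hypothetical stand-in, not the scanner's code: it counts every heading, list item, link with an href, table, and <code> opened inside a <pre>, plus visible text length excluding script and style content. A real pipeline would presumably count only within the main-content region that Readability selects.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TEXT_TAGS = {"script", "style", "noscript", "template"}


class StructureCounter(HTMLParser):
    """Hypothetical counter for the structures a conversion should preserve."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = {"headings": 0, "list_items": 0, "links": 0,
                       "tables": 0, "code_blocks": 0, "text_chars": 0}
        self._open_pre = 0    # number of currently open <pre> elements
        self._skip_depth = 0  # nesting depth inside script/style/etc.

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.counts["headings"] += 1
        elif tag == "li":
            self.counts["list_items"] += 1
        elif tag == "a" and dict(attrs).get("href"):
            self.counts["links"] += 1
        elif tag == "table":
            self.counts["tables"] += 1
        elif tag == "pre":
            self._open_pre += 1
        elif tag == "code" and self._open_pre:
            # Only the semantic <pre><code> pair counts; a styled <div> does not.
            self.counts["code_blocks"] += 1
        if tag in SKIP_TEXT_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._open_pre:
            self._open_pre -= 1
        if tag in SKIP_TEXT_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count visible text only, ignoring whitespace between tags.
        if not self._skip_depth:
            self.counts["text_chars"] += len(data.strip())


def count_dom_structures(html: str) -> dict[str, int]:
    parser = StructureCounter()
    parser.feed(html)
    parser.close()
    return parser.counts
```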

If your site fails: how to fix it

  1. Replace custom JavaScript carousels and accordions with semantic disclosure elements (<details><summary>); these survive conversion and remain interactive in the browser.
  2. Render code samples in <pre><code> blocks rather than divs styled to look like code; the conversion pipeline only recognizes the semantic pair.
  3. Use real <table> markup for tabular data rather than CSS grid styled to resemble a table; grids look right visually but flatten to plain text in Markdown.
  4. Ensure server-side rendering of primary content; pages that only assemble content via client-side JavaScript may produce empty Markdown when crawlers fetch them without executing scripts.
  5. Run a quick check: paste your URL into a Markdown converter tool and inspect the output. If headings are missing or lists become paragraphs, your structure needs simplifying.
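For step 5, a rough structural tally of the converter's output makes "headings missing" or "lists became paragraphs" easy to spot. This Python sketch is hypothetical and regex-based, not a full Markdown parser: it counts ATX (#) headings, list items, inline links (images excluded), GFM table delimiter rows, and fenced code blocks, ignoring anything inside fences. It does not count setext headings (text underlined with === or ---), which Turndown emits for h1 and h2 unless configured with headingStyle: 'atx'.

```python
import re

# Hypothetical, simplified patterns; reference-style links are not counted.
ATX_HEADING = re.compile(r"^ {0,3}#{1,6}(?:\s|$)")
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
INLINE_LINK = re.compile(r"(?<!!)\[[^\]]*\]\([^)]+\)")
TABLE_DELIMITER = re.compile(r"^(?=.*\|)(?=.*-{3})[\s|:-]+$")  # e.g. "| --- | --- |"
FENCE = re.compile(r"^ {0,3}(?:`{3,}|~{3,})")


def count_markdown_structures(markdown: str) -> dict[str, int]:
    counts = {"headings": 0, "list_items": 0, "links": 0,
              "tables": 0, "code_blocks": 0}
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            # Simplification: any fence marker opens or closes a block.
            if not in_fence:
                counts["code_blocks"] += 1
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # structures inside code are not real structures
        if ATX_HEADING.match(line):
            counts["headings"] += 1
        elif LIST_ITEM.match(line):
            counts["list_items"] += 1
        elif TABLE_DELIMITER.match(line):
            counts["tables"] += 1
        counts["links"] += len(INLINE_LINK.findall(line))
    return counts
```

Compare these numbers with the same counts taken from the original page: a page with a dozen <li> elements whose Markdown shows zero list items has lost its lists in conversion.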

Quick facts

Maturity: Emerging
Weight: High
Category: Content Clarity

Frequently asked questions

Does this depend on whether my site uses React or other JavaScript frameworks?

It depends on whether content is server-rendered or client-rendered. React with SSR (Next.js, Remix) produces clean HTML that converts well. Client-only single-page apps may serve a near-empty initial HTML and fail conversion entirely. The framework is fine; the rendering strategy is what matters.
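One way to see what a non-rendering crawler receives is to measure the visible text in the initial HTML response, before any script runs. A minimal Python sketch, assuming you have already fetched the raw HTML without executing JavaScript (for example with urllib.request); the 200-character threshold is an arbitrary illustration, not a published cutoff.

```python
from html.parser import HTMLParser

# <head> holds metadata; <noscript> is typically a "please enable JavaScript" notice.
NON_CONTENT_TAGS = {"head", "script", "style", "noscript", "template"}


class VisibleTextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self._skip = 0  # nesting depth inside non-content elements
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_CONTENT_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in NON_CONTENT_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_text_length(html: str) -> int:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return sum(len(chunk) for chunk in parser.chunks)


def looks_client_rendered(initial_html: str, min_chars: int = 200) -> bool:
    """Hypothetical heuristic: too little server-sent text suggests a client-only shell."""
    return visible_text_length(initial_html) < min_chars
```

A typical client-only shell (an empty <div id="root"> plus a script tag) scores zero visible characters, while a server-rendered page carries its content in the first response.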

What about interactive demos and widgets?

Interactivity that requires JavaScript will not survive conversion to Markdown - that is a fundamental limitation of plain text. Provide a static fallback summary or description of what the widget does, so the converted Markdown carries the gist even when the live behavior is lost.

Is this the same as Reader Mode in browsers?

Closely related. Firefox's Reader View is built on Mozilla Readability, the same library used in our pipeline, and other browsers' reader modes rely on similar content-extraction heuristics. If your page renders cleanly in Reader Mode, it will likely convert cleanly to Markdown. Test your most important pages in Firefox or Safari Reader Mode as a quick sanity check.

Are headings only important for SEO or also for AI?

Headings drive both classic SEO and AI extractability. In conversion, headings become the Markdown outline that defines chunk boundaries for retrieval. Skipped or missing headings mean chunks lose their titles, so retrieval is less targeted. The same structural choice helps both audiences for different reasons.
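A simplified illustration of heading-based chunking, assuming a retriever that splits Markdown at ATX headings and labels each chunk with its heading trail. Real retrieval systems vary; this Python sketch and its names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# ATX heading; an optional closing "#" run must be preceded by whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
FENCE = re.compile(r"^ {0,3}(?:`{3,}|~{3,})")


@dataclass
class Chunk:
    path: list[str]  # heading trail, e.g. ["Guide", "Install"]
    lines: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.lines).strip()


def chunk_by_headings(markdown: str) -> list[Chunk]:
    chunks: list[Chunk] = []
    trail: list[str] = []
    current = Chunk(path=[])
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            if current.text:  # drop heading-only chunks with no body
                chunks.append(current)
            level = len(match.group(1))
            # Keep parents above this level; a skipped level nests under the nearest parent.
            trail = trail[: level - 1] + [match.group(2)]
            current = Chunk(path=list(trail))
        else:
            current.lines.append(line)
    if current.text:
        chunks.append(current)
    return chunks
```

Text that sits under no heading lands in a chunk with an empty trail, which is exactly the "chunk without a title" case; a page with no headings at all becomes one untitled chunk.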

Run your own scan

Run a free scan and see how your site grades across all 155 AI-readiness signals.
