Does your site return honest HTTP status codes when something is missing or broken?
Returning a real 404 for missing pages - not a 200 with an error message - keeps your AI footprint clean and trustworthy.
What this signal tests
We check that your site returns the correct HTTP status code for each situation: 404 or 410 for pages that no longer exist, 429 when rate-limiting a client, 5xx codes for genuine outages, and 200 only for pages that actually have real content. We particularly look for the common bug of serving a 'page not found' message inside a 200 OK response - known as a soft 404.
Why it matters for your visibility in AI
Soft 404s - where your server says 'success' but the body is actually an error template - are catastrophic for AI visibility. AI training pipelines and embedding indexes ingest the error template as if it were real content. Hundreds of near-identical 'page not found' pages end up associated with your domain, which dilutes your topical authority and can trigger spam-quality demotions.

Misclassified status codes also break AI agents that visit your site. An agent that gets a 200 response assumes it has the page and uses the content; an agent that gets a 429 backs off and retries. Misreport the situation and the agent makes the wrong decision: it either cites junk or hammers your server with retries. Honest status codes are part of being a good citizen of the modern web.
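To make the agent-side consequence concrete, here is a minimal sketch of how a hypothetical crawler might branch on the status code. The `decide` function, its action names, and the 60-second default back-off are illustrative assumptions, not the behaviour of any particular AI crawler. The point is that the agent only sees the status (plus Retry-After), so a soft 404 lands in the "use" branch.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Decision:
    action: str                       # "use", "drop", "retry" or "skip"
    retry_after_s: Optional[int] = None


def decide(status: int, headers: Mapping[str, str]) -> Decision:
    """Map an HTTP status to what a well-behaved agent might do next.

    Hypothetical policy for illustration only; real crawlers differ.
    Header lookup here is case-sensitive for simplicity.
    """
    if 200 <= status < 300:
        return Decision("use")        # trust the body as the real page
    if status in (404, 410):
        return Decision("drop")       # page is gone; do not cite it
    if status in (429, 503):
        # Back off. Honour Retry-After when given in seconds; an
        # HTTP-date value falls back to the assumed 60 s default.
        raw = headers.get("Retry-After", "")
        delay = int(raw) if raw.isdigit() else 60
        return Decision("retry", delay)
    return Decision("skip")           # other errors: leave for now
```

A soft 404 returns `Decision("use")`, which is exactly how an error template ends up cited as content.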
Pass criteria at a glance
| Criterion | Passes when |
|---|---|
| Synthetic 404 probe | The invented URL returns 404 or 410 |
| Soft-404 scan of sampled pages | Zero soft 404s detected |
How we test it
We send a request to a deliberately invented URL on your domain - one that cannot possibly exist - and check that the server responds with a 404 or 410 status. We then sample several real pages from your sitemap and, for any that return 200, look for telltale signs of a soft 404: a very short body, missing main content, or phrases such as 'page not found' or 'this page does not exist'.
Technical detection method
Request /__nonexistent-{uuid}__ and assert that the status is 404 or 410. Then sample responses with status=200 and flag as soft-404 candidates any whose body is under 1024 bytes or contains known error strings.
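The detection rule can be sketched with Python's standard library. This is an illustrative approximation, not the test's actual code: the function names, the `ERROR_PHRASES` list, and the 10-second timeout are assumptions, while the /__nonexistent-{uuid}__ path and the 1024-byte threshold come from the description above.

```python
import urllib.error
import urllib.request
import uuid

# Assumed phrase list; the test's real error strings may differ.
ERROR_PHRASES = ("page not found", "this page does not exist")
MIN_BODY_BYTES = 1024


def fetch(url: str, timeout: float = 10.0) -> tuple[int, bytes]:
    """Return (status, body) without raising on 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the status is still what we want.
        return err.code, err.read()


def synthetic_404_passes(base_url: str) -> bool:
    """Request a path that cannot exist and expect 404 or 410."""
    probe = f"{base_url.rstrip('/')}/__nonexistent-{uuid.uuid4()}__"
    status, _ = fetch(probe)
    return status in (404, 410)


def looks_like_soft_404(status: int, body: bytes) -> bool:
    """Flag a 200 whose body is tiny or reads like an error page."""
    if status != 200:
        return False
    text = body.decode("utf-8", errors="replace").lower()
    return len(body) < MIN_BODY_BYTES or any(p in text for p in ERROR_PHRASES)
```

Because `urlopen` follows redirects, a missing page that redirects to your homepage shows up as a 200 and fails the probe, which matches the intent of the check.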
If your site fails: how to fix it
- Visit a deliberately bad URL on your site - for example yourdomain.com/this-page-cannot-exist-12345 - and check the HTTP status code your server returns. Browser developer tools or curl can show this.
- If the status is 200 rather than 404, find the framework or CMS setting that controls error responses. In most modern frameworks this is a small configuration change in the routing or middleware layer.
- Make sure your custom-designed 404 page is still served with a 404 status. A beautiful branded error page is great - but the status code underneath must still say 404, not 200.
- For URLs that used to exist but have been permanently removed (a discontinued product, a deleted post), return 410 Gone rather than 404. This tells AI crawlers the removal is deliberate, which can help them drop the URL from their index faster.
- Configure your CDN or load balancer to return appropriate 5xx codes during outages rather than serving a cached 200 error page.
- Re-run the AI Ready Test to confirm the synthetic 404 probe now returns a clean 404 or 410.
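Whatever your framework, the routing rule behind the steps above looks roughly like the following sketch, written with Python's built-in `http.server`. The page paths, the `REMOVED` set, the branded HTML, and the `serve` helper are hypothetical placeholders for your own site.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical site content, for illustration only.
PAGES = {"/": "<h1>Home</h1>", "/about": "<h1>About us</h1>"}
REMOVED = {"/products/discontinued-widget"}   # permanently gone -> 410
BRANDED_404 = "<h1>Sorry, we couldn't find that page</h1><a href='/'>Home</a>"


def status_for(path: str) -> tuple[int, str]:
    """Pick an honest status code and HTML body for a request path."""
    if path in PAGES:
        return 200, PAGES[path]
    if path in REMOVED:
        return 410, "<h1>This page has been removed</h1>"
    # Branded design, but the status code still says 404.
    return 404, BRANDED_404


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Ignore any query string when matching the path.
        status, html = status_for(self.path.split("?", 1)[0])
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; not called automatically."""
    ThreadingHTTPServer(("127.0.0.1", port), SiteHandler).serve_forever()
```

In a real framework the same rule usually lives in a catch-all route or error handler; what matters is that the branded page and the 404 (or 410) status are sent together.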
Quick facts
| Attribute | Value |
|---|---|
| Maturity | Established |
| Weight | High |
| Category | Crawlability |
Frequently asked questions
What is a soft 404?
It is a response where the HTTP status code says 200 OK (success) but the page body is actually an error message such as 'page not found'. The two signals contradict each other. A crawler that trusts the status code will ingest the error message as if it were real content for that URL.
When should I use 410 instead of 404?
Use 410 Gone for content you have deliberately and permanently removed - discontinued products, retired blog posts, takedown requests. Use 404 Not Found for URLs that may or may not exist (typos, future content, paths you do not yet handle). 410 can lead to faster de-indexing in AI and search systems.
Do crawlers really get confused by soft 404s?
Yes - Google has spoken publicly about this for over a decade, and AI crawlers face the same problem. Soft 404s waste your crawl budget, dilute your topical signal, and in extreme cases can trigger quality penalties because the same junk text appears under hundreds of URLs.
What about temporarily missing pages?
For genuinely temporary outages, return a 503 Service Unavailable with a Retry-After header. This tells crawlers the situation is temporary and they should come back later. Returning 200 with a 'we are down' message has the same poisoning effect as a soft 404.
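As a sketch of the temporary-outage case, here is a minimal WSGI app you could swap in during maintenance. The `maintenance_app` name, the placeholder HTML, and the one-hour default are assumptions; the 503 status and a Retry-After value in seconds are the parts that matter.

```python
from typing import Callable, Iterable

MAINTENANCE_HTML = b"<h1>We'll be back shortly</h1>"


def maintenance_app(retry_after_s: int = 3600) -> Callable:
    """Build a WSGI app that answers every request with 503 + Retry-After."""
    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        start_response("503 Service Unavailable", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(MAINTENANCE_HTML))),
            ("Retry-After", str(retry_after_s)),  # seconds until crawlers retry
        ])
        return [MAINTENANCE_HTML]
    return app
```

Serving this instead of your normal app keeps the friendly message for humans while telling crawlers the page still exists and will be back.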
Run your own scan
Run a free scan and see how your site grades across all 155 AI-readiness signals.