Correct HTTP 429 Status with Retry-After Header When Throttling AI Agent Traffic
Returns the standard rate-limit response so agents back off correctly instead of giving up or looping.
What this signal tests
We check whether your API returns the correct HTTP status - 429 Too Many Requests - when an agent exceeds your rate limit, accompanied by a Retry-After header telling it how long to wait. Many sites incorrectly return 403 Forbidden or 503 Service Unavailable in this situation, which agents interpret very differently.
Why it matters for your visibility in AI
Agent frameworks know how to handle a 429: pause, read Retry-After, and try again. A 403 or 503 in the same situation sends a different message, because those codes mean genuinely different things - "you are not allowed" or "the server is down". Mis-classifying a rate limit as one of those typically leads the agent to give up and report failure to the user. The consequence is silent: your support team sees no complaint, because the user's assistant simply told them "the service appears to be unavailable" and they moved on. Returning 429 with Retry-After turns those abandoned interactions into successful retries, often within seconds. It is the difference between a recovered conversation and a lost customer.
Pass criteria at a glance
| Criterion | Passes when |
|---|---|
| 429 with Retry-After | The API returns HTTP 429 with a Retry-After header when a caller exceeds the rate limit. |
How we test it
Our scanner sends a deliberate burst of requests - slightly more than your declared limit - to an API endpoint and inspects each response. It passes if at least one response is HTTP 429 with a Retry-After header containing either a delta-seconds value or an HTTP-date. For low-throughput scans where we do not want to trip a real production limit, we accept the presence of RateLimit-Remaining headers as evidence that 429 would be returned correctly.
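The two accepted Retry-After forms can be told apart with a small parser. The following is a minimal sketch, not the scanner's actual code; the function name `parse_retry_after` and its `now` parameter are hypothetical, chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds encoded by a Retry-After value, or None if invalid."""
    value = value.strip()
    # Form 1: delta-seconds, a non-negative decimal integer such as "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: HTTP-date, such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # neither form matched
    if when.tzinfo is None:
        # Treat a date without a usable zone as UTC (an assumption of this sketch).
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", so clamp at zero.
    return max(0.0, (when - now).total_seconds())
```

A value that yields `None` here would not count as a usable Retry-After under the rule described above.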
Technical detection method
Send a burst of N+1 requests, where N is the declared limit; pass if at least one response is 429 with Retry-After. For low-throughput scans, accept RateLimit headers as a proxy.
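The pass rule can be expressed as a pure function over the collected responses. This is an illustrative sketch under stated assumptions: responses are modelled as `(status, headers)` tuples, the names `get_header` and `burst_passes` are hypothetical, and the check only confirms that Retry-After is present and non-empty rather than validating its format.

```python
from typing import Mapping, Optional, Sequence, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Look up a header by name, ignoring case as HTTP requires."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def burst_passes(responses: Sequence[Response], low_throughput: bool = False) -> bool:
    """Apply the pass rule to the responses collected from one burst."""
    for status, headers in responses:
        retry_after = get_header(headers, "Retry-After")
        # Primary rule: at least one 429 carrying a non-empty Retry-After.
        if status == 429 and retry_after and retry_after.strip():
            return True
    if low_throughput:
        # Proxy rule: a RateLimit-Remaining header on any response.
        return any(
            get_header(headers, "RateLimit-Remaining") is not None
            for _, headers in responses
        )
    return False
```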
If your site fails: how to fix it
- Audit how your API gateway, reverse proxy, and application middleware respond when a caller exceeds the rate limit; many setups default to 403 or 503 and were never reconfigured.
- Reconfigure each layer to return HTTP 429 Too Many Requests specifically for rate-limit cases, distinct from authentication failures (which stay as 401 or 403) and outages (which stay as 503).
- Include a Retry-After header on every 429 response, set to the number of seconds the caller should wait - usually the time remaining until your rate-limit window resets.
- Verify the change end to end by deliberately tripping the limit in a staging environment and confirming the response status and header survive every intermediate proxy and CDN.
- Pair this with the RateLimit headers signal so agents can pace themselves before hitting the limit at all; 429 is the safety net, not the primary signal.
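For teams whose rate limiting lives in application middleware, the steps above might look roughly like the sketch below. It assumes a simple fixed-window counter kept in memory; the class name `FixedWindowLimiter`, its fields, and the `(status, headers)` return shape are all hypothetical, and a real gateway or framework will have its own configuration for this.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FixedWindowLimiter:
    limit: int              # maximum requests allowed per window
    window_seconds: float   # length of each window
    # Per-caller state: (window start time, requests counted so far).
    _windows: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def check(self, caller: str, now: float) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers) for one request from `caller` at time `now`."""
        start, count = self._windows.get(caller, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window has expired
        if count >= self.limit:
            self._windows[caller] = (start, count)
            # Retry-After is the time left until the window resets,
            # rounded up to whole seconds and never less than 1.
            wait = max(1, math.ceil(start + self.window_seconds - now))
            return 429, {"Retry-After": str(wait)}
        self._windows[caller] = (start, count + 1)
        return 200, {}
```

The point of the sketch is the shape of the rejection: status 429 rather than 403 or 503, with Retry-After derived from the window reset time.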
Quick facts
| Fact | Value |
|---|---|
| Maturity | ESTABLISHED |
| Weight | medium |
| Category | Agent Actions |
Frequently asked questions
What status code do we return today?
Many sites return 403 (Forbidden), 503 (Service Unavailable), or even 500 (Internal Server Error) for rate-limited requests. None of these is correct for rate limiting. RFC 6585 defines 429 specifically for this case, and agent frameworks expect it. The fix is usually a configuration change, not a code change.
Does this cost anything to implement?
It is a configuration change in your gateway or middleware, typically under an hour of engineering time. There are no licensing fees. Most modern frameworks already support 429; you usually just need to tell them to use it for rate-limit cases.
How is this different from the RateLimit-headers signal?
RateLimit-Remaining tells well-behaved agents how to pace themselves before hitting the limit. 429 plus Retry-After tells them how to recover when they do hit it. Both work together: one prevents the problem, the other handles it gracefully when it occurs anyway.
Will agents really retry automatically?
In most cases, yes. Mainstream agent tooling - OpenAI Agents SDK, Anthropic tool-use clients, LangChain, AutoGen - generally supports retrying on 429 with exponential backoff and honoring Retry-After, although exact defaults vary by framework and version. You usually do not need to do anything beyond returning the correct response; the agents handle the rest.
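To make the retry behavior concrete, here is a minimal client-side sketch of the pattern: honor Retry-After when it is present, otherwise fall back to exponential backoff. It is not taken from any of the frameworks named above; `call_with_retry`, the `send` callable, and the default values are hypothetical, and only the delta-seconds form of Retry-After is handled.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on 429 until it succeeds or attempts run out."""
    response = send()
    for attempt in range(1, max_attempts):
        status, headers = response
        if status != 429:
            return response
        retry_after = next(
            (v for k, v in headers.items() if k.lower() == "retry-after"), ""
        ).strip()
        if retry_after.isdigit():
            # The server said how long to wait, so use that.
            delay = float(retry_after)
        else:
            # No usable header (or an HTTP-date, which this sketch ignores):
            # back off exponentially: base, 2x base, 4x base, ...
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)
        response = send()
    return response  # the last response, even if it is still a 429
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting.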