RateLimit Response Headers So AI Agents Can Pace Themselves and Not Get Banned
Tells well-behaved AI agents how much quota they have left so they slow down before they are blocked.
What this signal tests
We check whether your API responses include rate-limit headers that tell the caller how much quota remains and when it resets. The signal accepts either the IETF draft RateLimit and RateLimit-Policy headers, or the older but widely used X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset trio. At least one set must be present.
Why it matters for your visibility in AI
AI agents are bursty. A single agent might fire dozens of requests in a few seconds while it tries to answer a user, then idle for hours. Without rate-limit headers, the agent cannot tell whether it is about to be throttled or whether you have plenty of capacity left. The result is one of two failure modes: either the agent is overly cautious and under-uses your API to avoid being blocked, or it overshoots and trips your defences, getting itself banned mid-conversation in front of your customer. Rate-limit headers fix this by giving the agent a numerical budget. Well-behaved agent frameworks read the remaining-quota value (the remaining count in the IETF draft headers, or X-RateLimit-Remaining in the older trio) and slow themselves down automatically. This means more successful interactions, fewer 429 errors, and fewer support tickets from frustrated users whose assistant got cut off mid-flow.
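The pacing behaviour described above can be sketched in a few lines. This is a minimal illustration, not how any particular agent framework works: the function names `pacing_delay` and `delay_from_headers` are hypothetical, and the sketch assumes X-RateLimit-Reset carries the number of seconds until the window resets (some APIs send a Unix timestamp instead).

```python
from typing import Mapping


def pacing_delay(remaining: int, reset_seconds: float) -> float:
    """Spread the remaining quota evenly over the time left in the window."""
    if reset_seconds <= 0:
        # The window has already reset, so no wait is needed.
        return 0.0
    if remaining <= 0:
        # Quota exhausted: wait until the window resets.
        return float(reset_seconds)
    return reset_seconds / remaining


def delay_from_headers(headers: Mapping[str, str]) -> float:
    """Compute a delay from the X-RateLimit-* trio; 0.0 if the headers are absent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        remaining = int(lowered["x-ratelimit-remaining"])
        # Assumption: the reset value is a delta in seconds, not an epoch timestamp.
        reset_seconds = float(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return 0.0
    return pacing_delay(remaining, reset_seconds)
```

With 10 requests left and 30 seconds until reset, this caller would wait 3 seconds between requests; with nothing left, it waits out the whole window instead of triggering a 429.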
Pass criteria at a glance
| Criterion | Passes when |
|---|---|
| RateLimit headers | At least one set of RateLimit headers is present. |
How we test it
Our scanner makes a GET request to one of your API endpoints and inspects the response headers for either the modern IETF draft headers (RateLimit and optionally RateLimit-Policy) or the de facto X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset trio. Either set is acceptable. The scan does not validate the values for accuracy; it checks only that the headers are present on the response.
Technical detection method
Send a GET request to an API endpoint and check the response for RateLimit/RateLimit-Policy (draft-ietf-httpapi-ratelimit-headers) or the X-RateLimit-* trio.
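A presence check along these lines could look like the following sketch. It is not the scanner's actual code: the function name `has_rate_limit_headers` is hypothetical, and requiring all three X-RateLimit-* headers for the legacy set is an assumption based on the word "trio".

```python
from typing import Iterable

# The older, de facto header set.
LEGACY_TRIO = {"x-ratelimit-limit", "x-ratelimit-remaining", "x-ratelimit-reset"}


def has_rate_limit_headers(header_names: Iterable[str]) -> bool:
    """Return True if either rate-limit header set is present.

    Only presence is checked; header values are never inspected.
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.strip().lower() for name in header_names}
    # IETF draft set: RateLimit is required, RateLimit-Policy is optional.
    ietf_present = "ratelimit" in names
    # Assumption: the legacy set counts only when all three headers appear.
    legacy_present = LEGACY_TRIO <= names
    return ietf_present or legacy_present
```

The function takes only header names, so it can be fed the keys of whatever response object your HTTP client returns.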
If your site fails: how to fix it
- Identify the rate limits you already enforce in your API gateway, reverse proxy, or application middleware; this signal is about exposing what already exists, not building new throttling.
- Configure your gateway or middleware to emit the IETF draft RateLimit and RateLimit-Policy headers on every API response; this is best handled by a developer who can read the draft specification.
- If your stack does not yet support the IETF draft, fall back to the older X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset trio; agents understand both formats.
- Confirm that the Remaining value updates correctly across a burst of requests; static or always-equal values defeat the purpose because agents will not pace themselves.
- Document your rate-limit policy alongside your OpenAPI document so agents can plan ahead instead of discovering limits only after they hit them.
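For custom middleware, the steps above might be sketched as a small fixed-window counter that returns the X-RateLimit-* trio with every decision. This is an illustration under stated assumptions, not a production limiter: the class name `FixedWindowLimiter` is hypothetical, state is held in a single process without locking, and X-RateLimit-Reset is expressed as seconds until the window resets. The IETF draft headers are omitted because their syntax has changed between draft revisions and should be taken from the current draft.

```python
import math
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Counts requests per fixed window and reports quota as response headers."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.window_start = clock()
        self.used = 0

    def hit(self) -> Tuple[bool, Dict[str, str]]:
        """Record one request; return (allowed, headers to attach to the response)."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The previous window has expired: start a fresh one.
            self.window_start = now
            self.used = 0
        allowed = self.used < self.limit
        if allowed:
            self.used += 1
        # Assumption: Reset is a delta in seconds, rounded up to a whole second.
        reset_in = max(0, math.ceil(self.window_start + self.window_seconds - now))
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - self.used),
            "X-RateLimit-Reset": str(reset_in),
        }
        return allowed, headers
```

Calling `hit()` repeatedly within one window should show Remaining counting down to zero, which is the behaviour worth confirming across a burst of requests. A request that returns `allowed` as `False` would normally be answered with a 429 that still carries these headers.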
Quick facts
| Maturity | EMERGING |
|---|---|
| Weight | medium |
| Category | Agent Actions |
Frequently asked questions
Does this cost anything to implement?
Most modern API gateways - Kong, Cloudflare, AWS API Gateway, Apigee, Tyk - already track rate limits internally; emitting the headers is often a matter of configuration or a plugin rather than new code, though the exact steps vary by product. There are no licensing fees. For custom middleware, the cost is a developer day or two.
Will this matter in 2026 or is it years away?
It matters now. The major agent frameworks - LangChain, OpenAI's Agents SDK, Anthropic's Claude tool-use libraries - already read these headers if present. Every agent integration you have today benefits, and the benefit grows as agent traffic grows.
Should we use the new IETF headers or the older X-RateLimit ones?
Both are acceptable for this signal. The IETF draft is the future-proof choice and is being adopted across the industry. If you are emitting headers for the first time, prefer the IETF draft. If you already emit X-RateLimit-*, leaving them in place is fine.
What if our API is not metered?
If you genuinely have no rate limits, this signal is less important - agents will assume unlimited access until they hit other defences. However, almost every public API has implicit limits (CPU, database connections); emitting headers helps agents discover them gracefully rather than by failure.