Latency budgets across services
Latency budgets are easy to set per service. They get interesting — and most teams get them wrong — when you compose services together.
Each service has a latency budget: say, 200ms at p95. The team meets it. Chain five such services in sequence, each at 200ms, and their latencies add up: if every hop spends its full allowance, the end-user request takes a full second (5 × 200ms). Each team is in compliance, and the overall result is unacceptable.
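Here is a minimal Python sketch of the trap, using hypothetical p95 figures for five services called one after another. Summing p95 values is a pessimistic estimate. It corresponds to every hop hitting its p95 on the same request, so the true end-to-end p95 is often lower. It is still the time you have to reserve if each hop may spend its full allowance. The service names, observed numbers, and the 500ms target are assumptions for illustration.

```python
# A minimal sketch: every hop passes its own check, yet the chain does not.
# All names and numbers below are hypothetical.

PER_SERVICE_P95_BUDGET_MS = 200
END_TO_END_TARGET_MS = 500  # hypothetical user-facing target

# Hypothetical observed p95 latencies for five services called in sequence.
observed_p95_ms = {
    "gateway": 180,
    "auth": 150,
    "catalog": 200,
    "pricing": 190,
    "render": 200,
}


def each_service_compliant(observed, budget_ms):
    """True if every service is within its own per-service budget."""
    return all(latency <= budget_ms for latency in observed.values())


def sequential_estimate_ms(observed):
    """Pessimistic end-to-end estimate for a strictly sequential chain.

    Adding p95 values corresponds to every hop hitting its p95 on the same
    request, so it often overstates the true end-to-end p95.
    """
    return sum(observed.values())


if __name__ == "__main__":
    compliant = each_service_compliant(observed_p95_ms, PER_SERVICE_P95_BUDGET_MS)
    estimate = sequential_estimate_ms(observed_p95_ms)
    print("every service compliant:", compliant)
    print(f"sequential estimate: {estimate} ms (target {END_TO_END_TARGET_MS} ms)")
    print("end-to-end within target:", estimate <= END_TO_END_TARGET_MS)
```

Every service passes its own 200ms check, but the chain's estimate is 920ms against a 500ms target. With five services all at exactly 200ms, the same sum gives the full second described above.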
The trap
Per-service budgets without an end-to-end budget add up to whatever they add up to. The user doesn't care about each service's number; they care about the page loading. Without an explicit budget at the composition level, the individual budgets sum to something nobody owns.
How to fix it
- Set an end-to-end budget owned by a single team.
- Derive per-service budgets from that, not the other way around.
- Track p99, not just p95: a request that touches many services is likely to hit at least one of their slow tails, so long-tail latency compounds across calls.
- When the budget breaks, the owner negotiates — not each team unilaterally.
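The second and third bullets can be sketched together in Python. `allocate_budgets_ms` derives per-service budgets from a single end-to-end budget, which avoids summing whatever each team happened to pick. `prob_any_hop_in_tail` shows why tails compound. If calls are independent, the chance that a request hits at least one slow hop grows with the number of hops. Both function names, the 800ms budget, the 100ms reserve, and the weights are hypothetical.

```python
# A minimal sketch of top-down budgeting. Function names, weights and the
# reserve are hypothetical illustrations, not a standard API.


def allocate_budgets_ms(end_to_end_ms, weights, reserve_ms=0.0):
    """Split an end-to-end budget across sequential hops in proportion to weights.

    reserve_ms is held back for overhead that belongs to no single service
    (network hops, retries, client rendering); its size is an assumption.
    """
    if not 0 <= reserve_ms < end_to_end_ms:
        raise ValueError("reserve must be non-negative and below the total budget")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    spendable = end_to_end_ms - reserve_ms
    return {name: spendable * w / total_weight for name, w in weights.items()}


def prob_any_hop_in_tail(hops, quantile):
    """Chance that at least one of `hops` independent calls exceeds its own `quantile`.

    Each call lands above its quantile with probability (1 - quantile), so
    the chance that none does is quantile ** hops.
    """
    return 1.0 - quantile ** hops


if __name__ == "__main__":
    # Hypothetical: an 800 ms end-to-end budget owned by one team, 100 ms
    # reserved for overhead, the rest split by relative cost.
    budgets = allocate_budgets_ms(
        800,
        {"gateway": 1, "auth": 1, "catalog": 2, "pricing": 1, "render": 2},
        reserve_ms=100,
    )
    for name, ms in budgets.items():
        print(f"{name}: {ms:.0f} ms")

    # With five independent hops, roughly 4.9% of requests hit at least one
    # hop's p99 tail, and about 22.6% hit at least one hop's p95 tail.
    print(f"any hop above its p99: {prob_any_hop_in_tail(5, 0.99):.1%}")
    print(f"any hop above its p95: {prob_any_hop_in_tail(5, 0.95):.1%}")
```

Proportional splitting is only one policy. You might instead weight each service by its historical share of latency. The tail figures assume independent hops. Correlated slowness, such as a shared database or a garbage-collection pause, changes the numbers, so read them as an illustration of why p99 matters rather than as a prediction.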
Per-service latency budgets don't compose. End-to-end budgets are the only ones the user cares about.