Operations · Architecture · 5 min read

Latency budgets across services

Latency budgets are easy to set per service. They get interesting — and most teams get them wrong — when you compose services together.

Each service has a latency budget, say 200ms p95. The team meets it. Chain five such services in sequence, each taking its full 200ms, and the end-user request now takes a full second. Every team is in compliance, and the overall result is unacceptable.
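
The arithmetic is worth spelling out. Here is a minimal sketch, assuming the five calls happen one after another and that their latencies are independent (both are simplifications, and the variable names are illustrative):

```python
# Hypothetical chain: five sequential services, each with a 200 ms p95 budget.
budgets_ms = [200, 200, 200, 200, 200]

# If every hop uses its full budget, the request takes the sum of the budgets.
end_to_end_ms = sum(budgets_ms)

# Assuming independent hops, the chance that all five stay within their own p95.
all_within_budget = 0.95 ** len(budgets_ms)

print(f"end-to-end latency: {end_to_end_ms} ms")
print(f"requests with every hop in budget: {all_within_budget:.1%}")
```

Under those assumptions, only about 77% of requests see every hop inside its own p95, so the one-second figure is not even a worst case.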

The trap

Per-service budgets without an end-to-end budget add up to whatever they add up to. The user doesn't care about each service's number; they care about the page loading. Without an explicit budget at the composition level, the individual budgets sum to something nobody owns.

How to fix it

  • Set an end-to-end budget owned by a single team.
  • Derive per-service budgets from that, not the other way around.
  • Track p99, not just p95: the more calls a request makes, the more likely at least one of them lands in the long tail.
  • When the budget breaks, the owner negotiates — not each team unilaterally.
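
One way to sketch the top-down derivation and the tail effect from the list above is shown here. The service names, weights, 500 ms target, and 50 ms network reserve are all hypothetical, and the tail calculation assumes independent calls:

```python
def derive_budgets(end_to_end_ms, weights, reserve_ms=0.0):
    """Split an end-to-end budget across sequential services by weight."""
    available = end_to_end_ms - reserve_ms
    if available <= 0:
        raise ValueError("reserve consumes the whole end-to-end budget")
    total = sum(weights.values())
    return {name: available * w / total for name, w in weights.items()}


def chance_any_call_is_slow(calls, percentile):
    """Probability that at least one of `calls` independent calls is slower
    than its own latency at `percentile` (0.99 means p99)."""
    return 1.0 - percentile ** calls


# Hypothetical request path: start from the user-facing number, then divide it.
budgets = derive_budgets(
    end_to_end_ms=500,
    weights={"gateway": 1, "auth": 1, "search": 4, "ranking": 3, "render": 1},
    reserve_ms=50,  # assumed allowance for network hops between services
)

for name, ms in budgets.items():
    print(f"{name}: {ms:.0f} ms")
```

With five calls, `chance_any_call_is_slow(5, 0.95)` comes to roughly 23%, while `chance_any_call_is_slow(5, 0.99)` is roughly 5%. That is why a per-service p95 says little about the end-to-end p95. The weights themselves are a negotiation input for the budget owner, not something the code can decide.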

Per-service latency budgets don't compose. End-to-end budgets are the only ones the user cares about.
