Operations · Architecture · October 16, 2023 · 5 min read

Latency budgets across services

Latency budgets are easy to set per service. They get interesting — and most teams get them wrong — when you compose services together.

Each service has a latency budget, say 200ms p95, and the team meets it. Chain five such services in sequence, each at 200ms, and the end-user wait can approach a full second. Every team is in compliance and the overall result is unacceptable.
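To make the arithmetic concrete, here is a minimal sketch. The service names are hypothetical, and the calls are assumed to run strictly one after another. Summing the budgets is not a true end-to-end p95; it is the wait a user sees when every hop spends its full budget on the same request.

```python
# Hypothetical chain: five services called in sequence,
# each with a 200 ms p95 budget (names are illustrative only).
budgets_ms = {
    "gateway": 200,
    "auth": 200,
    "catalog": 200,
    "pricing": 200,
    "render": 200,
}


def chain_budget_ms(budgets):
    """Total wait if every service in a sequential chain uses its whole budget."""
    return sum(budgets.values())


total_ms = chain_budget_ms(budgets_ms)
print(f"Every service is within budget; the chain can still take {total_ms} ms")
```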

The trap

Per-service budgets without an end-to-end budget add up to whatever they add up to. The user doesn't care about each service's number; they care about the page loading. Without an explicit budget at the composition level, the individual budgets sum to something nobody owns.

How to fix it

Set an end-to-end budget owned by a single team.
Derive per-service budgets from that, not the other way around.
Track p99, not just p95 — long-tail latency compounds across calls.
When the budget breaks, the owner negotiates — not each team unilaterally.
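
One way to sketch the second and third points, under stated assumptions: the 500 ms end-to-end budget, the service names, the weights, and the 50 ms reserve are all made up for illustration, and the tail calculation assumes the calls are independent, which real services often are not.

```python
def allocate_budgets(end_to_end_ms, weights, reserve_ms=0):
    """Split an end-to-end budget across sequential services by weight.

    reserve_ms is held back for network hops and headroom (an assumption,
    not something the per-service teams get to spend).
    """
    if reserve_ms < 0 or reserve_ms >= end_to_end_ms:
        raise ValueError("reserve_ms must be in [0, end_to_end_ms)")
    usable_ms = end_to_end_ms - reserve_ms
    total_weight = sum(weights.values())
    return {name: usable_ms * w / total_weight for name, w in weights.items()}


def prob_any_call_in_tail(percentile, calls):
    """Chance that at least one of `calls` independent calls lands above
    its own `percentile` latency (for example 95 for p95)."""
    return 1 - (percentile / 100) ** calls


# Start from the end-to-end number and work down to each service.
weights = {"gateway": 1, "auth": 1, "catalog": 2, "pricing": 2, "render": 4}
per_service_ms = allocate_budgets(500, weights, reserve_ms=50)
for name, ms in per_service_ms.items():
    print(f"{name}: {ms:.0f} ms")

# Five calls, each measured at p95 versus p99.
print(f"p95: {prob_any_call_in_tail(95, 5):.1%} of requests hit at least one slow call")
print(f"p99: {prob_any_call_in_tail(99, 5):.1%} of requests hit at least one slow call")
```

With five independent calls, roughly 23% of requests touch at least one call slower than its p95, against roughly 5% for p99. That is why a per-service p95 says little about what the user sees.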

Per-service latency budgets don't compose. End-to-end budgets are the only ones the user cares about.
