Operations · Architecture · September 18, 2024 · 4 min read

Why your queue depth is the only chart you need

If you can only have one operational dashboard, make it the queue depth across every async system. Everything else is downstream of that one number.

Modern systems are full of queues — task queues, message buses, retry queues, dead letter queues. Each one is a potential pile-up. The single most predictive chart for production health is the depth of those queues over time.

What queue depth tells you

If a queue is growing faster than it's draining, something downstream is slow or broken. If it's empty, you're either keeping up or not getting any work. If it's stable but high, consumers are matching the arrival rate but have no headroom to clear the backlog, which points to a capacity problem. One chart tells you which.
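To make those three readings concrete, here is a minimal sketch that labels a window of depth samples. The function name `classify_queue` and both thresholds (`high_water`, `growth_tolerance`) are assumptions made for illustration, not values from any particular monitoring tool; tune them to your own queues.

```python
from statistics import mean


def classify_queue(depths, high_water=1000, growth_tolerance=0.05):
    """Label a window of queue depth samples, ordered oldest first.

    high_water and growth_tolerance are hypothetical defaults.
    """
    if len(depths) < 2:
        raise ValueError("need at least two samples")

    # Empty for the whole window: keeping up, or no work is arriving.
    if all(d == 0 for d in depths):
        return "empty"

    # Compare the average of the older half with the newer half.
    half = len(depths) // 2
    early, late = mean(depths[:half]), mean(depths[half:])

    # Growing: the newer half is meaningfully deeper than the older half.
    if late - early >= 1 and late > early * (1 + growth_tolerance):
        return "growing"

    # Not growing, but sitting above the high-water mark.
    if late >= high_water:
        return "stable_high"

    return "healthy"


if __name__ == "__main__":
    print(classify_queue([100, 200, 400, 800]))
    print(classify_queue([5000, 5010, 4990, 5000]))
    print(classify_queue([0, 0, 0, 0]))
```

Splitting the window in half and comparing averages is a deliberately crude trend test. It smooths over single noisy samples, at the cost of reacting more slowly than a point-to-point comparison would.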

Set alerts on rate, not size

A queue at 10,000 messages might be fine if it's stable. A queue at 100 messages might be a crisis if it just doubled. Alert on the derivative — the rate of growth — not the absolute number.
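As a rough sketch of alerting on the derivative, the check below computes growth in messages per second from timestamped depth samples and compares it to a threshold. The names `growth_rate` and `should_alert`, and the default threshold of 1 message per second, are assumptions for illustration only.

```python
def growth_rate(samples):
    """Average growth in messages per second across the window.

    samples is a list of (timestamp_seconds, depth) pairs, oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, d0), (t1, d1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (d1 - d0) / (t1 - t0)


def should_alert(samples, max_rate_per_s=1.0):
    """Alert when the queue grows faster than the (hypothetical) threshold."""
    return growth_rate(samples) > max_rate_per_s


if __name__ == "__main__":
    # Large but stable queue: no alert.
    stable = [(0, 10_000), (60, 10_005), (120, 9_998)]
    # Small queue that doubled in a minute: alert.
    doubling = [(0, 100), (60, 200)]
    print(should_alert(stable), should_alert(doubling))
```

Here the 10,000-message queue stays quiet because its depth barely moves, while the 100-message queue trips the alert after doubling in 60 seconds. In practice you would likely evaluate this over a sliding window so a single burst doesn't page anyone.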

Watch what's getting stuck before you watch what's failing.

