Architecture · Operations · 5 min read
Why background jobs are quietly your most critical system
When background jobs fail, customers don't see errors — but downstream, everything quietly breaks. They deserve the same scrutiny as your API.
Your API gets the dashboards. Your front-end gets the synthetic monitoring. Your background jobs, the things that actually send the email, generate the invoice, and run the import, get a cron expression and a prayer.
When they break, you don't notice
A failed API call returns an error. A failed background job often returns nothing: the work just doesn't happen. The customer notices three days later when the data didn't sync, or never notices the missing job at all and complains about a symptom somewhere downstream.
What background jobs need
- Explicit success/failure metrics, not just "did it run."
- Retries with exponential backoff for transient failures.
- Dead letter queues so failures don't disappear.
- Alerting on rate of failure, not absolute count.
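As a rough illustration, the four items above might fit together like the sketch below. It assumes a simple in-process runner. `JobRunner`, `TransientError`, the default thresholds, and the in-memory list standing in for a dead letter queue are all hypothetical names chosen for this example; a real system would more likely lean on its queue's and metrics library's own facilities.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""


@dataclass
class JobRunner:
    max_attempts: int = 4
    base_delay: float = 0.5        # seconds before the first retry
    max_delay: float = 30.0        # cap on any single backoff delay
    window: int = 100              # recent outcomes used for the failure rate
    alert_threshold: float = 0.2   # alert when over 20% of recent jobs fail
    sleep: Callable[[float], None] = time.sleep  # injectable for tests
    dead_letters: list = field(default_factory=list)
    counts: dict = field(default_factory=lambda: {"success": 0, "failure": 0})
    outcomes: deque = field(init=False)

    def __post_init__(self) -> None:
        self.outcomes = deque(maxlen=self.window)

    def run(self, name: str, job: Callable[[Any], None], payload: Any) -> bool:
        error = None
        for attempt in range(1, self.max_attempts + 1):
            try:
                job(payload)
            except TransientError as exc:
                error = exc
                if attempt < self.max_attempts:
                    # Exponential backoff: base, 2x base, 4x base, ... capped.
                    delay = self.base_delay * 2 ** (attempt - 1)
                    self.sleep(min(self.max_delay, delay))
            except Exception as exc:
                error = exc
                break  # not transient: retrying would not help
            else:
                self._record("success")
                return True
        # Retries exhausted or permanent failure: keep the job, don't lose it.
        self.dead_letters.append(
            {"job": name, "payload": payload, "error": repr(error)}
        )
        self._record("failure")
        return False

    def _record(self, outcome: str) -> None:
        # Explicit success/failure metric, not just "did it run".
        self.counts[outcome] += 1
        self.outcomes.append(outcome)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count("failure") / len(self.outcomes)

    def should_alert(self) -> bool:
        # Alert on the rate over a recent window, not the absolute count.
        return self.failure_rate() > self.alert_threshold
```

Calling `runner.run("send_invoice", send_invoice, payload)` returns `True` or `False`, so the caller always gets an explicit outcome, and anything that ends up in `dead_letters` can be inspected and replayed instead of disappearing. Production backoff usually adds random jitter so that retries from many workers do not line up; it is left out here to keep the sketch short.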
The most important code in your system is often the code with the least monitoring.