
Post Mortem: SQL Servers Overloaded

By Matthew Tse

On October 5, 2025, our SQL servers became overloaded, impacting email deliveries and API requests.

Summary

CPU utilization on our SQL servers reached 100%, causing cascading failures across the email delivery pipeline and the API.

Timeline

  • 11:20 UTC - CPU alerts triggered.
  • 11:25 UTC - Engineering team engaged.
  • 11:40 UTC - Identified runaway query from a background job.
  • 11:45 UTC - Terminated the offending process.
  • 12:00 UTC - Systems fully recovered.

Root Cause

A daily background analytics job was triggered with incorrect parameters, which caused it to process the entire dataset instead of only the incremental batch. The resulting load consumed all available database connections and CPU.
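One way to keep a misconfigured run from silently widening into a full-dataset scan is to validate the batch window before any query executes. The sketch below is illustrative only. It assumes the job selects rows by a UTC timestamp window, and the names `BatchWindow`, `incremental_window`, `validate_window`, and `MAX_WINDOW` are hypothetical rather than taken from the actual job.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical guard: the largest window a daily incremental run may cover.
MAX_WINDOW = timedelta(days=1)


@dataclass(frozen=True)
class BatchWindow:
    start: datetime  # inclusive
    end: datetime    # exclusive


def incremental_window(run_date: date) -> BatchWindow:
    """Return the one-day window [run_date - 1 day, run_date) in UTC."""
    end = datetime.combine(run_date, time.min, tzinfo=timezone.utc)
    return BatchWindow(start=end - timedelta(days=1), end=end)


def validate_window(window: BatchWindow) -> BatchWindow:
    """Reject empty, reversed, or oversized windows instead of letting the
    job fall back to scanning the whole dataset."""
    if window.start >= window.end:
        raise ValueError(f"empty or reversed window: {window}")
    if window.end - window.start > MAX_WINDOW:
        raise ValueError(f"window exceeds {MAX_WINDOW}: {window}")
    return window


window = validate_window(incremental_window(date(2025, 10, 5)))
print(window.start.isoformat(), "->", window.end.isoformat())
# 2025-10-04T00:00:00+00:00 -> 2025-10-05T00:00:00+00:00

# A window spanning years of history is refused before any query runs.
try:
    validate_window(BatchWindow(datetime(2015, 1, 1, tzinfo=timezone.utc), window.end))
except ValueError as exc:
    print("rejected:", exc)
```

Failing loudly here turns a bad trigger into an error in the job's own logs rather than an overloaded database.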

Impact

For approximately 40 minutes (11:20 to 12:00 UTC), email deliveries were delayed, with messages held in SQS, and API requests timed out. No emails were lost: all queued messages were delivered once the database recovered.
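Queue-backed pipelines commonly get this no-loss property by deleting a message only after it has been handled, so a message that cannot be delivered during an outage simply stays queued. A minimal sketch of that pattern follows. It uses a hypothetical in-memory stand-in for SQS rather than the AWS SDK, and the `InMemoryQueue` and `drain` names are illustrative, not our pipeline's actual code.

```python
from collections import deque
from typing import Callable, Iterable, Optional


class InMemoryQueue:
    """Hypothetical stand-in for an SQS queue. A received message stays
    in the queue until the consumer explicitly deletes it."""

    def __init__(self, messages: Iterable[str]):
        self._pending = deque(messages)

    def receive(self) -> Optional[str]:
        # Real SQS hides a received message for a visibility timeout and
        # redelivers it if it is not deleted; here we just peek at the oldest.
        return self._pending[0] if self._pending else None

    def delete(self, message: str) -> None:
        self._pending.remove(message)

    def __len__(self) -> int:
        return len(self._pending)


def drain(queue: InMemoryQueue, deliver: Callable[[str], None]) -> int:
    """Deliver messages oldest-first, deleting each only after `deliver`
    succeeds. On a database error, stop and leave the message queued."""
    delivered = 0
    while (message := queue.receive()) is not None:
        try:
            deliver(message)
        except ConnectionError:
            break  # database unavailable: keep the message for a later retry
        queue.delete(message)
        delivered += 1
    return delivered


# Simulate the outage: delivery fails while the database is overloaded.
database_up = {"value": False}
sent = []


def send_email(message: str) -> None:
    if not database_up["value"]:
        raise ConnectionError("database overloaded")
    sent.append(message)


queue = InMemoryQueue(["email-1", "email-2", "email-3"])
print(drain(queue, send_email), len(queue))  # 0 3: nothing delivered, nothing lost
database_up["value"] = True
print(drain(queue, send_email), len(queue))  # 3 0: backlog delivered on recovery
```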

Resolution

We killed the runaway process, added parameter validation to the background job, and set connection pool limits for background processes that are separate from those for production traffic.
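Separate pool limits mean a runaway background job can exhaust only its own connections. A minimal sketch, assuming a hypothetical `BoundedPool` wrapper around whatever driver opens connections; `FakeConnection` stands in for a real connection, and the limits (80 and 2) are illustrative numbers, not our production settings.

```python
import threading
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical pool that hands out at most `limit` connections at once;
    callers beyond the limit wait up to `timeout` seconds, then fail."""

    def __init__(self, name: str, limit: int, connect):
        self.name = name
        self._slots = threading.BoundedSemaphore(limit)
        self._connect = connect

    @contextmanager
    def connection(self, timeout: float = 5.0):
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError(f"{self.name} pool exhausted")
        try:
            conn = self._connect()
            try:
                yield conn
            finally:
                conn.close()
        finally:
            self._slots.release()  # always return the slot, even on errors


class FakeConnection:
    def close(self) -> None:
        pass


production = BoundedPool("production", limit=80, connect=FakeConnection)
background = BoundedPool("background", limit=2, connect=FakeConnection)

# A runaway job can exhaust only its own pool...
with background.connection(), background.connection():
    try:
        with background.connection(timeout=0.1):
            pass
    except TimeoutError as exc:
        print(exc)  # background pool exhausted
    # ...while production traffic still gets a connection.
    with production.connection() as conn:
        print(type(conn).__name__)  # FakeConnection
```

Failing fast with a timeout, rather than queueing indefinitely, keeps a stuck background job visible instead of letting it quietly pile up work.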

Prevention

Background jobs now run on a read replica with strict resource limits. Production database connections are reserved exclusively for the API and delivery pipeline.
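The routing rule can be expressed as a small table that is checked at startup. This is a hypothetical sketch: the hostnames, workload names, and limits are illustrative, and how a per-query timeout is actually enforced depends on the database driver.

```python
from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    API = "api"
    DELIVERY = "delivery"
    BACKGROUND = "background"


@dataclass(frozen=True)
class Target:
    host: str
    max_connections: int  # cap on concurrent connections for this workload
    query_timeout_s: int  # per-query time limit (enforcement is driver-specific)


# Hypothetical hosts and limits, for illustration only.
PRIMARY = "primary.db.internal"
REPLICA = "replica.db.internal"

ROUTES = {
    Workload.API: Target(PRIMARY, max_connections=60, query_timeout_s=10),
    Workload.DELIVERY: Target(PRIMARY, max_connections=40, query_timeout_s=10),
    Workload.BACKGROUND: Target(REPLICA, max_connections=4, query_timeout_s=300),
}


def check_routes(routes: dict) -> None:
    """Enforce the prevention rule: background work never connects to the
    primary, so primary connections stay reserved for the API and delivery."""
    if routes[Workload.BACKGROUND].host == PRIMARY:
        raise ValueError("background jobs must not connect to the primary")


def route(workload: Workload) -> Target:
    return ROUTES[workload]


check_routes(ROUTES)
print(route(Workload.BACKGROUND).host)  # replica.db.internal
```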