Discovered: Aug 19, 2025 – 13:10 UTC
Resolved: Aug 19, 2025 – 15:30 UTC
Recent configuration changes to backend data replication caused a surge in database writes, which increased CPU utilization across all clusters. The other clusters recovered, but the US2 database instance did not. Its elevated CPU load persisted for more than 24 hours and caused customer-facing slowness when loading sites on the US2 cluster.
Customers on the US2 cluster experienced significantly slower site load times in the Auvik UI. Demos, trials, and production users were all affected, and the user experience remained degraded until the issue was resolved.
All times are in UTC
Aug 19, 2025
13:10 – Sales reported demo site loading issues on US2.
13:22 – Engineering identified elevated CPU usage on the US2 database.
13:27 – Investigation into DB performance began.
13:57 – Confirmed that US2 had remained at 100% CPU since Aug 18.
14:25 – Troubleshooting efforts to recover performance began.
14:36 – Proposal made to scale up resources for the DB.
15:00 – Decision made to upgrade the US2 database instance type.
15:09 – Database instance size increased.
15:22 – Read/write latency returned to normal.
15:30 – US2 UI performance confirmed as fully recovered.