Discovered: Dec 13, 2024 17:03 - UTC
Resolved: Dec 13, 2024 18:23 - UTC
Routine tasks performed in preparation for the upcoming weekend's maintenance caused an unexpected load on the system.
Effect
The unexpected load overwhelmed the backend systems on the US4 cluster, which interrupted communication with the tenants.
All times in UTC
12/13/2024
16:57 - Steps to prepare the system for the next day’s maintenance are performed.
17:03 - Tenants on the US4 cluster become unreachable.
17:09 - The Auvik engineering team assembles stakeholders to investigate the service interruption.
17:25 - The backend systems on the US4 cluster begin to recover independently.
17:39 - Tenants begin to become reachable internally.
17:40 - Tenants become visible in the UI.
17:57 - Engineering addresses tenants that are not coming back up gracefully.
18:23 - Tenants on US4 have recovered.