Discovered: Dec 5, 2023, 15:47 - UTC
Resolved: Dec 5, 2023, 19:00 - UTC
Rolling out the new Juniper Mist capabilities for GA release to clients.
The volume of accumulated data for the Juniper Mist feature exceeded the processing capacity of the US4 cluster. For a few clients on this cluster, the amount of historical data pushed to production was too large, which caused a few backend nodes to fail. Clients associated with the failed backend nodes then experienced connectivity issues.
Action taken
All times in UTC
12/05/2023
15:47 - Auvik Engineering begins to roll out the GA release of Juniper Mist monitoring.
15:55 - Backend services related to this upgrade show signs of stress.
16:15 - Errors occur on the US4 cluster, with some tenants having issues connecting.
16:30 - All other clusters complete the Juniper Mist release action except US4.
17:05 - A decision is made to roll back changes for the Juniper Mist release on the US4 cluster. Engineering performs the rollback and waits for the changes to propagate in the US4 cluster.
17:26 - A few of the backend nodes continue to throw errors. Engineering restarts these backend nodes to clear the errors.
18:00 - The US4 cluster is running normally. The incident is closed.