Service Disruption - network map may be unavailable for some customers
Incident Report for Auvik Networks Inc.
Postmortem

Service Disruption - Maps Unavailable in the UI

Root Cause Analysis

Duration of incident

Discovered: Mar 09, 2024 11:00 - UTC
Resolved: Mar 10, 2024 03:21 - UTC

Cause

Significant maintenance upgrade to the system.

Effect

Maps were unavailable to customers across the platform.

Action taken

All times in UTC

03/09/2024

11:00 - Regularly scheduled maintenance is performed on the system, including infrastructure upgrades.

13:00 - Maintenance completed. A few internal issues were noticed, and action was taken to address them.

13:40 - The internal issues noticed at the end of maintenance appear to be resolved.

15:20 - Auvik Engineering is aware of issues with maps not loading in the UI.

15:30 - Additional permission issues were also discovered. An incident is declared, and the on-call team is assembled.

15:30-18:40 - The engineering team begins its investigation and works to discover the incident's underlying cause.

18:40 - The Core data actors and injector are restarted. Engineering must wait for results as the system reloads data.

18:40-21:40 - Engineering observes the results as they update. It determines that the restart did not produce the desired outcome: issues persist, and recovery is proceeding too slowly for a production environment.

Engineering decides to perform a complete system restart. The restart will involve staggering individual cluster restarts to prevent overloading the core part of the product.

21:40 - Engineering performs the complete system restart with staggered starts of each cluster.

03/10/2024

03:21 - All clusters have successfully restarted, and Map functionality is back at an acceptable product level. The incident is declared closed.
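
The staggered restart described in the 21:40 entry can be sketched roughly as follows. This is an illustration only, not Auvik's actual tooling: the `restart` and `is_healthy` callables, the batch size, and the polling settings are hypothetical placeholders for whatever orchestration and health checks the platform really uses.

```python
import time
from typing import Callable, Iterable, List


def staggered_restart(
    clusters: Iterable[str],
    restart: Callable[[str], None],      # hypothetical: triggers one cluster restart
    is_healthy: Callable[[str], bool],   # hypothetical: True once the cluster has recovered
    batch_size: int = 2,                 # assumed value, tune to what the core can absorb
    poll_seconds: float = 30.0,          # assumed value
    max_polls: int = 20,                 # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart clusters in small batches so the core is never hit by all of them at once.

    Each batch is restarted, then polled up to `max_polls` times before the
    next batch begins. Returns the clusters that never reported healthy.
    """
    pending = list(clusters)
    failed: List[str] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for name in batch:
            restart(name)
        waiting = set(batch)
        for attempt in range(max_polls):
            # Keep only the clusters that are still unhealthy.
            waiting = {name for name in waiting if not is_healthy(name)}
            if not waiting:
                break
            if attempt < max_polls - 1:
                sleep(poll_seconds)
        # Record stragglers and move on so one slow cluster does not block the rest.
        failed.extend(sorted(waiting))
    return failed
```

Called with the cluster names from the affected-components list (for example `["us1", "us2", "ca1"]`), the function restarts two clusters, waits for them to recover, and only then restarts the third. Whether to continue past an unhealthy batch, as this sketch does, or to halt is a design choice that the restart guidance under Future consideration(s) would need to settle.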

Future consideration(s)

  • Improve tenant inspection after maintenance windows to validate that there are no adverse effects from the changes implemented, especially after a more significant or complex upgrade.

    • Review and update the current post-maintenance checklist.
  • Create improved guidance on when a complete system restart is warranted, with specific criteria for applying it.

  • Investigate why changes to the system from this upgrade caused a delay in map rendering that forced a staggered reboot.
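
As one possible shape for the post-maintenance tenant inspection in the first consideration, the sketch below runs a set of named checks against a sample of tenants and collects any failures. Everything here is assumed for illustration: the check names, the tenant identifiers, and the check functions themselves are hypothetical and do not reflect Auvik's actual checklist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class CheckResult:
    tenant: str
    check: str
    ok: bool
    detail: str = ""


def run_post_maintenance_checks(
    tenants: Iterable[str],
    checks: Dict[str, Callable[[str], bool]],  # hypothetical check name -> check function
) -> List[CheckResult]:
    """Run every check against every tenant and record the outcome.

    A check that raises is recorded as a failure rather than aborting the run,
    so one broken check does not hide problems elsewhere.
    """
    results: List[CheckResult] = []
    for tenant in tenants:
        for name, check in checks.items():
            try:
                ok = bool(check(tenant))
                detail = ""
            except Exception as exc:
                ok = False
                detail = f"{type(exc).__name__}: {exc}"
            results.append(CheckResult(tenant, name, ok, detail))
    return results


def failures(results: Iterable[CheckResult]) -> List[CheckResult]:
    """Return only the failed results, e.g. to decide whether to declare an incident."""
    return [r for r in results if not r.ok]
```

In this incident, checks along the lines of "map loads" and "permissions are intact" would have targeted the two symptoms reported at 15:20 and 15:30; which checks belong on the real checklist is for the review to decide.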

Posted Mar 15, 2024 - 06:35 EDT

Resolved
The network maps issue has been resolved.
Posted Mar 09, 2024 - 21:21 EST
Update
Most clusters have recovered at this time. We are monitoring the remaining clusters before resolving this incident.
Posted Mar 09, 2024 - 20:48 EST
Monitoring
We’ve identified the source of the service disruption with network maps. We have restarted clusters and are seeing systems recover. We will continue to monitor the situation until all systems have recovered.
Posted Mar 09, 2024 - 18:51 EST
Update
We’ve identified the source of the service disruption with network maps. We will be restarting all of our clusters at this time. We will continue to update the status as the system begins to recover from the restart.
Posted Mar 09, 2024 - 17:08 EST
Update
We’ve identified the source of the service disruption with network maps. We are continuing to restore service as quickly as possible.
Posted Mar 09, 2024 - 16:03 EST
Identified
We’ve identified the source of the service disruption with network maps. We are working to restore service as quickly as possible.
Posted Mar 09, 2024 - 13:58 EST
Update
We are continuing to investigate the network map disruption. Customers in CA1 will encounter additional errors for a short period of time while additional troubleshooting is performed on that cluster. We will continue to provide updates as they become available.
Posted Mar 09, 2024 - 12:36 EST
Investigating
We’re experiencing disruption to the network map for some customers. Impacted customers may encounter errors loading the map. We will continue to provide updates as they become available.
Posted Mar 09, 2024 - 11:37 EST
This incident affected: Network Mgmt (my.auvik.com, us1.my.auvik.com, us2.my.auvik.com, us3.my.auvik.com, us4.my.auvik.com, eu1.my.auvik.com, eu2.my.auvik.com, au1.my.auvik.com, ca1.my.auvik.com, us5.my.auvik.com) and Auvik TrafficInsights.