Discovered: Mar 14, 2024 09:15 UTC
Resolved: Mar 16, 2024 01:20 UTC
Following the March 14th incident, approximately 2,000 previously migrated tenants unexpectedly reran steps from their initial migration.
Affected clients on the US5 cluster lost 134,727 IP addresses (approximately 10% of devices across the affected tenants). Five tenants on the US1 cluster experienced similar issues.
All times in UTC
03/14/2024
21:00 - Cluster recovery from the March 14th incident leads to unexpected tenant migrations.
03/15/2024
13:23 - The relevant Auvik engineering team is informed of an issue affecting a specific client.
13:30 - The cause is misdiagnosed, and the tenant is restarted in an attempt to resolve the issue.
14:00 - The restart does not resolve the issue, and a deeper investigation into the cause begins.
16:45 - The engineering team discovers that the issue was caused by an unexpected rerun of tenant migrations triggered by the previous day's incident.
16:55 - A plan is developed to reattach the lost IPs to the affected devices. This action only reattaches IPs to the correct device; previous configuration customizations, backups, and alerting will be lost with the reconsolidation of the devices.
17:07 - The Auvik engineering team kicks off a systematic reattachment of deleted IPs.
03/16/2024
01:18 - The engineering team finishes reattaching the removed IPs.
02:20 - The incident is declared closed.