Collectors disconnected in us3

Incident Report for Auvik Networks Inc.

Postmortem

Service Disruption - Collectors offline for clients on the US3 cluster

Root Cause Analysis

Duration of incident

Discovered: Jul 25, 2025 14:45 - UTC
Resolved: Jul 25, 2025 17:54 - UTC

Cause

Backend nodes were removed from the US3 cluster during a routine cleanup effort intended to improve efficiency. The removal unintentionally included the backend hosting the root tenant, which caused collectors in the US3 cluster to disconnect.

Effect

Collectors in the US3 cluster lost connectivity with customer sites, resulting in disruptions to data collection and monitoring services. This caused temporary gaps in visibility across affected environments.

Action taken

All times are in UTC

07/25/2025

14:45 – Engineering notices that collector connections are beginning to fail.

18:28 – The team observes that tenants are not loading.

18:35 – Outage reports increase.

18:40 – SEV declared, and the root cause investigation begins.

18:48 – Backends re-added to balance load.

19:00 – Alternate issues ruled out.

19:23 – Root tenant backend identified as missing.

19:25 – Cluster restart initiated.

20:18 – Services begin recovery.

21:07 – Incident resolved.

Future consideration(s)

  • Strengthen the backend removal process to confirm the root tenant is excluded.
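
One way such a safeguard could look is a pre-removal check that refuses to touch any backend hosting the root tenant. The following is a minimal sketch only, not Auvik's actual tooling: the `Backend` type, the `plan_removal` function, the `ROOT_TENANT` identifier, and the backend and tenant names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical identifier for the root tenant; the real name is not given in the report.
ROOT_TENANT = "root"


@dataclass(frozen=True)
class Backend:
    """Illustrative model of a backend node and the tenants it hosts."""
    name: str
    tenants: frozenset = field(default_factory=frozenset)


def plan_removal(candidates, protected_tenant=ROOT_TENANT):
    """Split cleanup candidates into (removable, blocked).

    A backend is blocked when it hosts the protected tenant, so it is
    never passed on to the actual removal step.
    """
    removable, blocked = [], []
    for backend in candidates:
        if protected_tenant in backend.tenants:
            blocked.append(backend)
        else:
            removable.append(backend)
    return removable, blocked


if __name__ == "__main__":
    candidates = [
        Backend("backend-a", frozenset({"tenant-1"})),
        Backend("backend-b", frozenset({"root", "tenant-2"})),
    ]
    removable, blocked = plan_removal(candidates)
    print("safe to remove:", [b.name for b in removable])
    print("blocked (hosts root tenant):", [b.name for b in blocked])
```

Returning the blocked backends separately, rather than silently skipping them, lets the operator see why a node was excluded before the cleanup proceeds.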
Posted Jul 30, 2025 - 12:30 EDT

Resolved

The incident has been resolved.
Posted Jul 25, 2025 - 17:13 EDT

Monitoring

We have restarted the affected systems, and collectors are beginning to reconnect. We are monitoring the recovery closely.
Posted Jul 25, 2025 - 16:27 EDT

Investigating

Collectors have been disconnected in us3 since 14:35 ET (18:35 UTC). We are investigating the issue.
Posted Jul 25, 2025 - 15:24 EDT
This incident affected: Network Mgmt (us3.my.auvik.com).