Discovered: Jul 25, 2025 14:45 - UTC
Resolved: Jul 25, 2025 17:54 - UTC
Backend nodes were removed from the US3 cluster during a routine cleanup intended to improve efficiency. The removal unintentionally included the backend hosting the root tenant, which disconnected collectors in our US3 cluster.
Collectors in the US3 cluster lost connectivity with customer sites, resulting in disruptions to data collection and monitoring services. This caused temporary gaps in visibility across affected environments.
All times are in UTC
07/25/2025
14:45 – Engineering notices that collector connections are beginning to fail.
18:28 – The team observes that tenants are not loading.
18:35 – Outage reports increase.
18:40 – SEV declared, and the root cause investigation begins.
18:48 – Backends re-added to balance load.
19:00 – Alternate issues ruled out.
19:23 – Root tenant backend identified as missing.
19:25 – Cluster restart initiated.
20:18 – Services begin recovery.
21:07 – Incident resolved.