Discovered: July 26, 2025 14:00 UTC
Resolved: July 31, 2025 12:45 UTC
Following a core system upgrade on July 26, 2025, multiple clusters experienced degraded performance due to unexpected issues during service initialization. This led to corruption in certain hierarchical data structures, which in turn impacted various user experiences.
A core infrastructure failure caused an overload in internal system processes, preventing certain backend services from initializing properly. As a result, critical hierarchical user-role data and related settings failed to load or loaded incorrectly across several environments.
The disruption impacted customers across multiple regions. Effects included:
All times are in UTC
07/26/2025
11:00 – Core upgrade initiated.
13:27 – Initial service failures observed. Engineering begins recovery processes.
14:19 – Backend query failures reported. Engineering continues to recover and stabilize services.
16:24 – Incident response team mobilized.
07/27/2025
(Services restored throughout the day)
Hierarchy services replayed and clusters rebooted to stabilize services.
Alerting system functionality restored.
7/28/2025
(Services restored throughout the day)
Shared monitoring agents reassociated.
Back-end service migrations initiated for affected tenants.
7/28/2025
(Services restored throughout the day)
Repair scripts run to reset affected data and restore processing pipelines.
7/30/2025-07/31/2025
(Services restored throughout the day)
Corrupted tenants identified and corrected.
All affected clusters verified for service integrity.
07/31/2025
12:45 - Incident considered resolved.