Legacy alerts for sites on the US1 cluster delayed

Incident Report for Auvik Networks Inc.

Postmortem

Service Disruption - Legacy alerts for sites on the US1 cluster are delayed

Root Cause Analysis

Duration of incident

Discovered: Jun 11, 2025 00:00 UTC
Resolved: Jun 11, 2025 12:00 UTC

Cause

The job manager that processes legacy alerts restarted due to memory issues and did not recover properly.

Effect

This created a backlog of legacy alerts waiting to be posted for customers on the US1 cluster.

Action taken

All times are in UTC
Jun 11, 2025

00:00 The processing job for legacy alerts fails on the US1 cluster.

10:00 Auvik Support receives a ticket reporting that alerts have been missing since the previous evening. Engineering begins the investigation.

10:00-11:00 Engineering determines why the alerts are not being processed. The legacy alerting service is restarted, and additional memory is allocated to the processing job. Alerts begin to flow.

12:00 The delayed legacy alerts are fully processed, and the system is reported up to date.

Future consideration(s)

  • Auvik will enhance its internal monitoring to detect and respond to this failure more effectively.
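
As an illustration only, one common way to catch this kind of failure is a staleness check that flags a job when alerts are queued but none has been posted for some time. The sketch below is not Auvik's monitoring; `JobStatus`, `is_stalled`, the 15-minute threshold, and the sample queue depth are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class JobStatus:
    """Hypothetical snapshot of an alert-processing job (not an Auvik data model)."""
    last_processed_at: datetime  # when the job last posted an alert
    pending_alerts: int          # alerts still waiting in the queue


def is_stalled(status: JobStatus, now: datetime,
               max_idle: timedelta = timedelta(minutes=15)) -> bool:
    """Return True when alerts are queued but none was posted within max_idle."""
    idle_for = now - status.last_processed_at
    return status.pending_alerts > 0 and idle_for > max_idle


if __name__ == "__main__":
    # Illustrative values: a job that stopped at 00:00 UTC, checked one hour later.
    # The queue depth of 120 is made up for the example.
    now = datetime(2025, 6, 11, 1, 0, tzinfo=timezone.utc)
    status = JobStatus(
        last_processed_at=datetime(2025, 6, 11, 0, 0, tzinfo=timezone.utc),
        pending_alerts=120,
    )
    print(is_stalled(status, now))  # prints True
```

Requiring both a non-empty queue and an idle period avoids flagging a job that is simply quiet because no alerts are being generated.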
Posted Jun 11, 2025 - 15:52 EDT

Resolved

The processing of legacy alerts for clients on the US1 cluster was delayed on Jun 11, 2025, from 00:00 to 12:00 UTC. The service has been restored, and the delayed alerts have been processed with the correct timestamps. No other services or clusters were affected.

Auvik apologizes for the delay in alerts and will post an RCA after conducting an internal analysis.
Posted Jun 11, 2025 - 08:00 EDT