Alerting

Anomalies across related time series are grouped into a single alert, which can span several related nodes. Grouping gives each anomaly more context and reduces the number of alerts sent to the user. Time series on the same node are always considered related, so their anomalies are always alerted together. To capture relationships between different nodes, nodes are grouped based on the correlations among them, and time series from nodes in the same group are also alerted together.
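The grouping described above can be sketched with a union-find structure: nodes that share a correlation group are merged, and every anomaly is keyed by its node, so all time series on one node land in the same alert. This is a minimal sketch, not the product's implementation; the input shapes ((node, metric) pairs and sets of correlated node names) and the names UnionFind and group_into_alerts are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge related nodes (illustrative)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_into_alerts(anomalies, correlation_groups):
    """Group (node, metric) anomalies into alerts (hypothetical helper).

    anomalies: iterable of (node, metric) pairs that currently deviate.
    correlation_groups: iterable of sets of node names considered related.
    Returns a list of alerts, each a sorted list of (node, metric) pairs.
    """
    uf = UnionFind()
    # Nodes in the same correlation group are merged into one component.
    for group in correlation_groups:
        nodes = list(group)
        for other in nodes[1:]:
            uf.union(nodes[0], other)
    # Keying by node means all series of one node always share an alert.
    alerts = defaultdict(list)
    for node, metric in anomalies:
        alerts[uf.find(node)].append((node, metric))
    return [sorted(items) for items in alerts.values()]
```

For example, with anomalies on db-1 (cpu, latency), web-1 (errors) and cache-1 (hits), and a single correlation group {db-1, web-1}, the sketch returns two alerts: one for db-1 and web-1 together, and one for cache-1 alone.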

Alerting is enabled once your environment's metrics have been onboarded to the ML pipeline. Onboarding takes at least 7 days, because the pipeline needs that much data to compute baselines and correlations. As more data is collected, the baselines and correlations become more accurate, so alert noise decreases over the first few weeks.

Each alert includes a field for the severity of the alert as a whole and a field for the severity of each individual deviation. You can use both severities to configure notifications and automated actions.

Severity of the deviations on single metrics

The severity of a deviation in a single metric indicates how likely it is to be an anomaly, based on multiple baselines:

  • Low: Low probability of being an anomaly. The metric mostly stays within the main or secondary corridor.

  • Medium: Medium probability of being an anomaly. The metric occasionally goes outside all baselines, but not consistently.

  • Severe: High probability of being an anomaly. The metric mostly stays outside all baselines.
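As an illustration, the three levels can be approximated by the share of recent points that fall outside every baseline. This is a sketch under stated assumptions: the documentation gives no numeric cut-offs, so the values of MEDIUM_FRACTION and SEVERE_FRACTION below, the (lower, upper) band representation of a baseline, and the function names are all hypothetical.

```python
from typing import Sequence, Tuple

Band = Tuple[float, float]  # hypothetical (lower, upper) baseline corridor

# Hypothetical cut-offs; the documentation does not give exact numbers.
MEDIUM_FRACTION = 0.1
SEVERE_FRACTION = 0.5


def outside_all(value: float, baselines: Sequence[Band]) -> bool:
    """True if the value lies outside every baseline band."""
    return all(not (lo <= value <= hi) for lo, hi in baselines)


def deviation_severity(values: Sequence[float], baselines: Sequence[Band]) -> str:
    """Classify a window of metric values as 'low', 'medium' or 'severe'."""
    if not values:
        return "low"
    outside = sum(outside_all(v, baselines) for v in values)
    fraction = outside / len(values)
    if fraction > SEVERE_FRACTION:
        return "severe"  # mostly outside all baselines
    if fraction >= MEDIUM_FRACTION:
        return "medium"  # occasionally outside, but not consistently
    return "low"  # mostly within the main or secondary corridor
```

With a main corridor of (10, 20) and a secondary corridor of (5, 25), a window where 6 of 10 points are at 30 is classified as severe, while a window where only 2 of 10 points are at 30 is classified as medium.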

Alerts are created for deviations of any severity and are updated whenever a metric's severity changes. You can configure customized actions for alerts that contain deviations of a given severity, or for when a specific metric reaches a given severity.
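A customized action on deviation severities can be thought of as a rule that is re-evaluated each time the alert is updated. The sketch below assumes that a rule fires when a deviation is at or above the configured severity, and that a rule targets either one named metric or any metric in the alert; DeviationRule and matching_rules are hypothetical names, not part of any product API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SEVERITY_RANK = {"low": 0, "medium": 1, "severe": 2}


@dataclass
class DeviationRule:
    """Hypothetical notification rule on deviation severities."""
    name: str
    min_severity: str  # fires at this severity or above (assumption)
    metric: Optional[str] = None  # None means "any metric in the alert"


def matching_rules(deviations: Dict[str, str], rules: List[DeviationRule]) -> List[str]:
    """Return the names of the rules matched by an alert's current deviations.

    deviations maps metric name -> current severity of its deviation.
    Call again whenever the alert is updated with a new severity.
    """
    matched = []
    for rule in rules:
        threshold = SEVERITY_RANK[rule.min_severity]
        if rule.metric is None:
            candidates = list(deviations.values())
        elif rule.metric in deviations:
            candidates = [deviations[rule.metric]]
        else:
            candidates = []  # the targeted metric is not part of this alert
        if any(SEVERITY_RANK[s] >= threshold for s in candidates):
            matched.append(rule.name)
    return matched
```

For instance, with a rule that fires on any severe deviation and a rule that watches latency at medium or above, an alert whose deviations are {cpu: low, latency: medium} matches only the latency rule; once cpu changes to severe and the alert is updated, both rules match.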

Severity of the alert

The severity of an alert depends on both the severity of its deviations and how those deviations spread across correlated metrics and nodes:

  • Low: The alert does not contain any severe deviations.

  • Medium: The alert contains at least one severe deviation, but only one node is impacted, and less than 75% of that node's metrics are severely deviated.

  • Severe: The severe deviations are widespread. If only one node is involved, more than 75% of its metrics are severely deviated; a node with only one metric therefore triggers a severe alert as soon as that metric is severely deviated. If more than one node has severe deviations, the alert is always severe.
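The rules above can be expressed as a short function. This is a sketch, not the product's code: it reads "impacted" in the Medium rule as "has at least one severe deviation", counts the 75% share against all metrics monitored on the node (passed in as metric_counts), and treats a share of exactly 75%, which the rules do not cover, as medium. The input shapes and the name alert_severity are hypothetical.

```python
from typing import Dict

SEVERE_SHARE = 0.75  # share of a node's metrics, as stated in the rules above


def alert_severity(deviations: Dict[str, Dict[str, str]],
                   metric_counts: Dict[str, int]) -> str:
    """Compute an alert's severity from per-node deviation severities.

    deviations: node -> {metric -> deviation severity} for metrics in the alert.
    metric_counts: node -> total number of metrics monitored on that node.
    """
    severe_per_node = {
        node: sum(1 for s in metrics.values() if s == "severe")
        for node, metrics in deviations.items()
    }
    severe_nodes = [n for n, count in severe_per_node.items() if count > 0]
    if not severe_nodes:
        return "low"  # no severe deviation at all
    if len(severe_nodes) > 1:
        return "severe"  # severe deviations on more than one node
    node = severe_nodes[0]
    share = severe_per_node[node] / metric_counts[node]
    # Exactly 75% is not covered by the rules; treated as medium (assumption).
    return "severe" if share > SEVERE_SHARE else "medium"
```

A node with one monitored metric and a severe deviation on it gives a share of 100%, so the sketch returns severe, which matches the single-metric rule without a special case.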

This severity can be used to define customized actions.
