Anomalies detected on individual time series are grouped into an alert, which can span several related nodes. This provides more context for each anomaly and reduces the number of alerts sent to the user. Time series within the same node are already treated as related, so they are always alerted together. To capture relationships between different nodes, we use groups based on correlations among nodes. Time series from nodes in the same group are also alerted together.
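The grouping rule above can be sketched as follows. This is a minimal illustration and not the product's implementation: the `Anomaly` type, the `node_to_group` mapping and the function name are assumptions made for the example, and a node without a correlation group is treated as a group of its own.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    node: str    # node that owns the time series
    metric: str  # name of the time series


def group_into_alerts(anomalies, node_to_group):
    """Bucket anomalies so that each bucket corresponds to one alert.

    node_to_group is a hypothetical mapping from node name to the
    identifier of its correlation group. A node missing from the
    mapping forms a group of its own.
    """
    alerts = defaultdict(list)
    for anomaly in anomalies:
        # Same node -> same key; same correlation group -> same key.
        key = node_to_group.get(anomaly.node, f"node:{anomaly.node}")
        alerts[key].append(anomaly)
    return dict(alerts)


anomalies = [
    Anomaly("web-1", "cpu"),
    Anomaly("web-1", "latency"),
    Anomaly("db-1", "connections"),
    Anomaly("cache-1", "evictions"),
]
node_to_group = {"web-1": "g1", "db-1": "g1"}  # web-1 and db-1 are correlated
alerts = group_into_alerts(anomalies, node_to_group)
```

In this example the three anomalies on `web-1` and `db-1` end up in one alert, while the anomaly on `cache-1` forms an alert of its own.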
Alerting is enabled once your environment's metrics are onboarded to the ML pipeline, which takes at least 7 days to gather enough data for baselines and correlations (see Onboarding, preprocessing and filtering of the data). As more data is collected, these baselines and correlations improve, reducing the noise in alerts over the first few weeks.
Each alert includes a field for the alert's severity and a field for the severity of each deviation (see Alerts - structure and data explained). You can use both severities to set up notifications and automated actions.
Severity of the deviations on single metrics
The severity of a deviation on a single metric indicates how likely it is that the deviation is an anomaly. It is determined using the multiple baselines.
Low: Low probability of being an anomaly. The metric has spent most of the time in the main or in the secondary corridor.
Medium: Medium probability of being an anomaly. The metric has spent some time outside all the baselines, but not the majority.
Severe: High probability of being an anomaly. The metric has spent most of the time outside all the baselines.
Alerts are created with deviations of any severity and are updated every time a metric changes severity. Customised actions can be set for when an alert contains at least one deviation of a certain severity or when a certain metric reaches a certain severity.
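The two trigger conditions could be evaluated as in the sketch below. The alert layout (a `deviations` list whose entries carry `metric` and `severity`) and the function names are hypothetical and chosen only for illustration; the actual field names are described in Alerts - structure and data explained. The sketch also assumes that a rule matches one exact severity rather than "this severity or higher".

```python
def has_deviation_with_severity(alert, severity):
    """True if the alert contains at least one deviation of the given severity."""
    return any(d["severity"] == severity for d in alert["deviations"])


def metric_has_severity(alert, metric, severity):
    """True if the named metric has the given severity in the alert."""
    return any(
        d["metric"] == metric and d["severity"] == severity
        for d in alert["deviations"]
    )


# Hypothetical alert payload, for illustration only.
alert = {
    "deviations": [
        {"metric": "cpu", "severity": "low"},
        {"metric": "latency", "severity": "severe"},
    ]
}

notify = has_deviation_with_severity(alert, "severe")    # any severe deviation
page_on_cpu = metric_has_severity(alert, "cpu", "severe")  # cpu is only low here
```

Because alerts are updated whenever a metric changes severity, a check like this would need to be re-run on every update rather than only when the alert is created.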
Severity of the alert
The severity of the alert is based not only on the severity of the metrics included in the alert but also on how the deviations propagate across correlated metrics and nodes.
Low: The alert does not contain any severe deviation.
Medium: The alert contains at least one severe deviation, but only one node is impacted, and less than 75% of the metrics of that node have a severe deviation.
Severe: The alert contains several metrics that are in a severe state. If only one node is involved, more than 75% of its metrics are in a severe state. Note that a node with only one metric automatically triggers a severe alert if that metric is affected by a severe deviation. An alert that includes more than one node with severe deviations is always in a severe state.
The alert severity can also be used to define customised actions.
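The three rules above can be expressed as a small function. This is a sketch under stated assumptions, not the product's implementation: the input layout (node name mapped to the severity of each of that node's metrics, with "none" for metrics without a deviation) and the function name are hypothetical. The text does not say how a share of exactly 75% is treated, so the sketch assumes it counts as severe.

```python
SEVERE_SHARE_THRESHOLD = 0.75


def alert_severity(nodes):
    """Derive the alert severity from per-node metric severities.

    nodes maps a node name to {metric name: deviation severity} for every
    metric of that node; metrics without a deviation use "none".
    """
    # Share of severe metrics, kept only for nodes with a severe deviation.
    severe_share = {}
    for node, metrics in nodes.items():
        severe = sum(1 for s in metrics.values() if s == "severe")
        if severe:
            severe_share[node] = severe / len(metrics)

    if not severe_share:
        return "low"      # no severe deviation at all
    if len(severe_share) > 1:
        return "severe"   # severe deviations on more than one node
    (share,) = severe_share.values()
    # Assumption: exactly 75% is treated as severe.
    return "severe" if share >= SEVERE_SHARE_THRESHOLD else "medium"


one_of_four = {"web-1": {"cpu": "severe", "mem": "low", "disk": "none", "net": "none"}}
single_metric = {"db-1": {"connections": "severe"}}
two_nodes = {
    "web-1": {"cpu": "severe", "mem": "none", "disk": "none", "net": "none"},
    "db-1": {"connections": "none", "locks": "severe", "io": "none"},
}
```

Here `one_of_four` yields a medium alert (25% of the node's metrics are severe), `single_metric` yields a severe alert because the node's only metric is severe, and `two_nodes` yields a severe alert because two nodes carry severe deviations.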