Different anomalies in single time series are grouped into an alert, which can include several related nodes. This provides more context for each anomaly and reduces the number of alerts sent to the user. Time series within the same node are already seen as related, so they are always alerted together. To capture relationships between different nodes, we use groups based on correlations among nodes. Time series from nodes in the same group are also alerted together.
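The grouping rule above can be illustrated with a minimal sketch. It assumes the correlation groups are already available as a mapping from node name to group id. The names `Anomaly`, `node_to_group`, and `group_into_alerts` are hypothetical and are not part of the product's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    node: str    # node the time series belongs to
    metric: str  # name of the anomalous time series


def group_into_alerts(anomalies, node_to_group):
    """Bucket anomalies so that related time series share one alert.

    node_to_group (hypothetical input) maps a node name to the id of its
    correlation group. How the groups are computed is not shown here.
    A node without a group forms an alert on its own.
    """
    alerts = defaultdict(list)
    for anomaly in anomalies:
        if anomaly.node in node_to_group:
            # Nodes in the same correlation group are alerted together.
            key = ("group", node_to_group[anomaly.node])
        else:
            # Time series on the same node are always alerted together.
            key = ("node", anomaly.node)
        alerts[key].append(anomaly)
    return dict(alerts)
```

With `web-1` and `db-1` in the same group and `cache-1` in none, anomalies on `web-1` and `db-1` end up in one alert and the anomaly on `cache-1` in a second one.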

Alerting is enabled once your environment's metrics are onboarded to the ML pipeline, which takes at least 7 days to gather enough data for baselines and correlations (Onboarding, preprocessing and filtering of the data). As more data is collected, these baselines and correlations improve, reducing the noise in alerts over the first few weeks.

Each alert includes a field for the alert's severity and a field for the severity of each deviation included in the alert (Alerts - structure and data explained). You can use both severities to set up notifications and automated actions.

Severity of the deviations on single metrics

The criticality of deviations in single metrics indicates how likely they are to be anomalies, based on multiple baselines:

  • Low: Low probability of being an anomaly. The metric mostly stays within the main or secondary corridor.

  • Medium: Medium probability of being an anomaly. The metric occasionally goes outside all baselines, but not consistently.

  • Severe: High probability of being an anomaly. The metric mostly stays outside all baselines.

Alerts are created with deviations of any severity and are updated whenever a metric changes severity. You can set customized actions for alerts that contain deviations of a certain severity or when a specific metric reaches a certain severity.
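As an illustration of such a rule, the sketch below checks whether an alert should trigger a customized action. The shape of `deviations` (a mapping from metric name to severity) and the function `should_trigger` are hypothetical and do not reflect the actual alert payload. It also assumes that "a certain severity" means "that severity or higher".

```python
# Ordering of deviation severities, lowest to highest.
SEVERITY_ORDER = {"low": 0, "medium": 1, "severe": 2}


def should_trigger(deviations, min_severity, metric=None):
    """Decide whether a customized action applies to an alert.

    deviations: dict of metric name -> severity (hypothetical shape).
    min_severity: lowest severity that should trigger the action
                  (assumption: higher severities also trigger it).
    metric: if given, only this metric is checked.
    """
    threshold = SEVERITY_ORDER[min_severity]
    if metric is not None:
        # Rule bound to one specific metric.
        severity = deviations.get(metric)
        return severity is not None and SEVERITY_ORDER[severity] >= threshold
    # Rule on the alert as a whole: at least one deviation qualifies.
    return any(SEVERITY_ORDER[s] >= threshold for s in deviations.values())
```

Because alerts are updated whenever a metric changes severity, a rule like this would be re-evaluated on each update.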

Severity of the alert

The severity of an alert is determined by both the severity of the included metrics and how deviations spread across correlated metrics and nodes:

  • Low: The alert does not contain any severe deviations.

  • Medium: The alert contains at least one severe deviation, but only one node is impacted, and less than 75% of that node's metrics are severely deviated.

  • Severe: The alert contains multiple severe deviations. If only one node is involved, more than 75% of its metrics are in a severe state. Nodes with only one metric will trigger a severe alert if that metric has a severe deviation. If more than one node has severe deviations, the alert is always severe.

This severity can be used to define customized actions.
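The alert severity rules can be sketched as a small function. This is an illustration under stated assumptions, not the product's implementation. The input shape (`node_metrics`, mapping each node in the alert to the severity of each of its metrics) and the name `alert_severity` are hypothetical. The text does not say what happens at exactly 75%, so the sketch assumes that case is medium. For a single node it follows the 75% rule, even when several metrics are severe.

```python
SEVERE_RATIO = 0.75  # share of a node's metrics, taken from the rules above


def alert_severity(node_metrics):
    """Return "low", "medium" or "severe" for an alert.

    node_metrics: dict of node -> dict of metric -> severity
    (hypothetical shape; must list all metrics of each node, not only
    the deviating ones, so the severe share can be computed).
    """
    severe_per_node = {
        node: sum(1 for s in metrics.values() if s == "severe")
        for node, metrics in node_metrics.items()
    }
    impacted = [node for node, count in severe_per_node.items() if count > 0]

    if not impacted:
        return "low"      # no severe deviation at all
    if len(impacted) > 1:
        return "severe"   # severe deviations on more than one node

    node = impacted[0]
    ratio = severe_per_node[node] / len(node_metrics[node])
    # A node with a single severe metric has ratio 1.0, hence "severe".
    # Assumption: exactly 75% is treated as "medium".
    return "severe" if ratio > SEVERE_RATIO else "medium"
```

For example, one node with a single severe metric out of four yields a medium alert, while the same deviation on a node that has only that one metric yields a severe alert.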