This part of the pipeline is dedicated to detecting anomalies in single time series. It consists of a training part (baseline creation), which periodically (every 24 hours) determines the normal behaviour of the time series, and a detection part (live anomaly detection), which determines in close to real time whether behaviour different from that observed in the past (an anomaly) is ongoing.

Baseline creation

Data selection

...

Among all the data available, we select those that are useful for describing the current situation: we give the highest weight to the most recent data and to data showing recurring behaviour (daily, weekly, monthly).
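As an illustration of the weighting idea above, the following is a minimal sketch and not the pipeline's actual selection logic. It assumes that recency is modelled as an exponential decay and that recurring behaviour is rewarded with a fixed bonus when a sample's age is close to a whole number of days, weeks or (approximate, 30-day) months. The function name `sample_weight` and the parameters `half_life`, `tolerance` and `bonus` are hypothetical.

```python
HOUR = 3600
DAY = 24 * HOUR
WEEK = 7 * DAY
MONTH = 30 * DAY  # assumption: a month is approximated as 30 days

PERIODS = (DAY, WEEK, MONTH)


def sample_weight(age_seconds, half_life=3 * DAY, tolerance=HOUR, bonus=1.0):
    """Return an illustrative weight for a sample that is `age_seconds` old.

    All parameter values are hypothetical defaults chosen for the example.
    """
    # Recency term: halves every `half_life` seconds.
    recency = 0.5 ** (age_seconds / half_life)

    # Recurrence term: the sample lies within `tolerance` of a positive
    # whole multiple of one of the periods (same time yesterday, last week...).
    periodic = False
    for period in PERIODS:
        multiple = round(age_seconds / period)
        if multiple >= 1 and abs(age_seconds - multiple * period) <= tolerance:
            periodic = True
            break

    return recency + (bonus if periodic else 0.0)
```

With these assumed defaults, a sample from exactly one week ago receives a higher weight than one from six and a half days ago, even though the latter is more recent.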

...

In this step, the baselines used in the anomaly detection are created. We use a different model for each of the three groups of time series classified in the pre-processing step.

High Frequency High Activity

...

For each aggregated data point, a different weight is assigned depending on whether the point is in the main baseline, in a secondary baseline or outside all the baselines. In the case of HFHA, a point can also count as belonging to the main baseline when it lies outside it but diverges at a reasonable trend and not too far, as judged by the autoregressive model. An anomaly score is built incrementally by averaging these weights over time for as long as the data points are mostly outside the main baseline. This results in a measurement (the score) that, even if not rigorous, gives an “at a glance” description of the anomaly and can be averaged across nodes and systems. The score is then summarised into levels of criticality, classifying the data points as red (high probability of anomaly), orange (medium-high probability of anomaly) or yellow (medium-low probability of anomaly). A yellow anomaly is less likely to be a real anomaly than a red one, and it can also correspond to an anomaly that is escalating or resolving. The anomalies classified as red include the most severe deviations from behaviour seen in the past, but they do not include all the anomalies. Selecting a different level of this score as the cut-off therefore gives different values of recall and precision.
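The scoring described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline's implementation: the weight values in `WEIGHTS`, the thresholds in `LEVELS`, and the interpretation of "mostly outside the main baseline" as "more than half of the points in the current episode are outside it" are all hypothetical.

```python
# Hypothetical weights per point category (not taken from the pipeline).
WEIGHTS = {"main": 0.0, "secondary": 0.5, "outside": 1.0}

# Hypothetical score thresholds, checked from most to least critical.
LEVELS = [(0.8, "red"), (0.6, "orange"), (0.4, "yellow")]


def criticality(score):
    """Map a score to a criticality level, or None if below all thresholds."""
    for threshold, name in LEVELS:
        if score >= threshold:
            return name
    return None


def score_points(labels):
    """Return a list of (score, level) pairs, one per labelled data point.

    The score is the running average of the weights in the current episode.
    The episode is reset as soon as the points in it are no longer mostly
    (more than half) outside the main baseline.
    """
    results = []
    total = 0.0      # sum of weights in the current episode
    count = 0        # number of points in the current episode
    off_main = 0     # points in the episode that are not in the main baseline
    for label in labels:
        total += WEIGHTS[label]
        count += 1
        if label != "main":
            off_main += 1
        if off_main / count > 0.5:
            score = total / count
        else:
            # Back to normal behaviour: close the episode.
            total, count, off_main = 0.0, 0, 0
            score = 0.0
        results.append((score, criticality(score)))
    return results


labels = ["main", "secondary", "outside", "outside", "main", "main", "main"]
levels = [level for _, level in score_points(labels)]
```

For this example sequence, `levels` is `[None, 'yellow', 'orange', 'red', 'orange', 'yellow', None]`, which shows how yellow and orange can mark an anomaly that is escalating towards red or resolving after it.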

...