This part of the pipeline is dedicated to detecting anomalies on single time series. It consists of a training part (baseline creation), which periodically (every 24 hours) determines the normal behaviour of the time series, and a detection part (live anomaly detection), which determines close to real time whether a behaviour different from the one observed in the past (an anomaly) is ongoing.

Baseline creation

Data selection

An example of our algorithm learning a cyclical behaviour

Among all the available data we select those that are useful to describe the current situation: we give the highest weight to the most recent data and to data showing recurring behaviour (daily, weekly, monthly).

The data often contain cycles and seasonal behaviour, so some historical data carry more weight than others when baselining. For example, in many cases the behaviour 24 hours ago is more representative of the current situation than the behaviour 13 hours ago (daily cycle), and the behaviour one week ago is more significant than the one 5 weeks ago. The current version uses a static selection, identical for all time series and with the same weights. This is very lightweight, as we do not have to learn the weights for each metric, and it makes the model easily scalable. The selection follows common sense: recent data matter more, and we use daily, weekly and monthly cycles. This makes the model roughly equivalent to a hybrid between clustering and a static ARIMA.
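As an illustration of what a static selection could look like, the sketch below keeps only the historical points that fall close to a fixed set of lags before the current time. The lag values, the window width and the function name are assumptions made for this example; the article does not state the actual offsets or weights.

```python
from datetime import datetime, timedelta

# Hypothetical static lags: the article names recent, daily, weekly and
# monthly cycles but does not give the exact offsets or window width.
LAGS = [timedelta(hours=1), timedelta(days=1), timedelta(days=7), timedelta(days=28)]
HALF_WINDOW = timedelta(minutes=30)  # assumed tolerance around each lag


def select_history(points, now):
    """Return the (timestamp, value) pairs that fall near any static lag before `now`."""
    selected = []
    for ts, value in points:
        if any(abs((now - lag) - ts) <= HALF_WINDOW for lag in LAGS):
            selected.append((ts, value))
    return selected


now = datetime(2024, 3, 29, 12, 0)
history = [
    (now - timedelta(hours=13), 5.0),  # not on any cycle: dropped
    (now - timedelta(days=1), 7.0),    # daily cycle: kept
    (now - timedelta(days=7), 6.5),    # weekly cycle: kept
    (now - timedelta(days=35), 9.0),   # five weeks ago: dropped
]
kept = select_history(history, now)
```

Because the lags are the same for every metric, nothing has to be learned per time series, which is the scalability argument made above.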

Clustering for baseline creation

This is the part that groups the data observed in the past into distinct behaviours. The historical data are grouped into up to three clusters: one describing the most frequent behaviour and up to two describing secondary behaviours, which include less frequent behaviours or past anomalies. The approach is therefore multi-baseline: one baseline corresponds to the main behaviour, and the others describe secondary behaviours that cannot be guaranteed to be anomaly-free. To do this we use univariate unsupervised clustering methods.
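The article does not name the clustering method, so the following is only a stand-in: a one-dimensional single-linkage clustering that splits the sorted values at their widest gaps, producing at most three groups. The `min_gap` threshold and both function names are assumptions for illustration.

```python
def cluster_by_gaps(values, max_clusters=3, min_gap=5.0):
    """Split sorted values at the largest gaps (1-D single-linkage clustering)."""
    ordered = sorted(values)
    # Gap between each pair of neighbours, paired with the index where a split would go.
    gaps = [(ordered[i + 1] - ordered[i], i + 1) for i in range(len(ordered) - 1)]
    # Keep at most max_clusters - 1 splits, and only where the gap is wide enough.
    widest = sorted(gaps, reverse=True)[: max_clusters - 1]
    cuts = sorted(i for gap, i in widest if gap >= min_gap)
    bounds = [0] + cuts + [len(ordered)]
    return [ordered[a:b] for a, b in zip(bounds, bounds[1:])]


def split_behaviours(values):
    """The most populated cluster is the main behaviour; the rest are secondary."""
    clusters = sorted(cluster_by_gaps(values), key=len, reverse=True)
    return clusters[0], clusters[1:]


values = [10, 11, 9, 10, 12, 10, 11, 50, 52, 90]
main, secondary = split_behaviours(values)
```

Here the seven values around 10 form the main behaviour, while the values near 50 and the single value at 90 end up in two secondary groups that may or may not be past anomalies.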

Creation of the baselines

In this step, the baselines that will be used in anomaly detection are created. We use different models for the three groups of time series identified in the pre-processing step.

High Frequency High Activity

These are the metrics that are richest in information, so we can make a more complete analysis. Before the corridor is formed, the data are re-aggregated according to their frequency, to handle missing data. When the data are aggregated as a total, the analysis uses data smoothed by re-aggregating over 5 minutes. This absorbs oscillations and distinguishes a metric that is consistently zero for a long time from one that only occasionally drops to zero.

One baseline describing the most frequent behaviour is always formed; secondary baselines are created if there are enough data that do not fit the main behaviour. The baselines are built by fitting the historical data per hour and considering 3 standard deviations. An autoregressive model is also learned to predict the next data point from the last data point that came in. It is used in the anomaly detection phase to confirm whether a trend of data points diverging from the main behaviour should be considered anomalous.

For these metrics we use a confirmation window of 15 minutes, and an anomaly is detected only if the value of the metric deviated from the main corridor for more than 8 minutes within that window. These metrics are prone to frequent oscillations, so using a confirmation window reduces alert fatigue. It also means that anomalies shorter than 8 minutes are not detectable.
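The per-hour corridor and the confirmation window can be sketched as follows. This is a minimal illustration that assumes the corridor is the hourly mean plus or minus 3 standard deviations and that the window is a plain count of out-of-corridor minutes; the function names and the sample data are hypothetical, and the autoregressive check is left out.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_corridor(samples, n_sigma=3.0):
    """samples: iterable of (hour_of_day, value). Returns {hour: (low, high)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    corridor = {}
    for hour, values in by_hour.items():
        centre, spread = mean(values), pstdev(values)
        corridor[hour] = (centre - n_sigma * spread, centre + n_sigma * spread)
    return corridor


def confirmed(outside_flags, window=15, min_outside=8):
    """True if more than `min_outside` of the last `window` minutes were outside."""
    return sum(outside_flags[-window:]) > min_outside


# Historical values observed at 09:xx, then the last 15 one-minute values.
history = [(9, v) for v in (100, 102, 98, 101, 99)]
low, high = hourly_corridor(history)[9]
minutes = [103] * 6 + [150] * 9
flags = [not (low <= v <= high) for v in minutes]
is_anomaly = confirmed(flags)
```

With nine of the last fifteen minutes outside the corridor, the anomaly is confirmed; with eight or fewer it would not be, which is how short oscillations are filtered out.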

An example of a time series (the line in blue) compared to the baselines. The main baseline is in light blue, while the secondary baselines are in shades of grey.

High Frequency Low Activity

When the data are aggregated as a total, the analysis uses data smoothed by re-aggregating over 5 minutes. This absorbs oscillations and distinguishes a metric that is consistently zero for a long time from one that only occasionally drops to zero. A single baseline describing the most frequent behaviour is formed. No autoregressive model is used, because these metrics have far less structure than the HFHA ones.
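To make the 5-minute smoothing of totals concrete, here is a small sketch that sums one-minute values into 5-minute buckets. The function name and the sample data are assumptions for the example.

```python
from datetime import datetime, timedelta


def reaggregate_total(points, bucket_minutes=5):
    """Sum (timestamp, value) pairs into fixed buckets; returns sorted (bucket_start, total)."""
    totals = {}
    for ts, value in points:
        # Round the timestamp down to the start of its bucket.
        start = ts.replace(minute=ts.minute - ts.minute % bucket_minutes,
                           second=0, microsecond=0)
        totals[start] = totals.get(start, 0) + value
    return sorted(totals.items())


start = datetime(2024, 3, 29, 10, 0)
per_minute = [3, 0, 2, 0, 4, 0, 0, 0, 0, 0]
points = [(start + timedelta(minutes=i), v) for i, v in enumerate(per_minute)]
buckets = reaggregate_total(points)
```

The first bucket sums to 9 despite two zero minutes inside it, while the second bucket stays at 0: occasional drops to zero are absorbed, and a metric that is consistently zero remains visible.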

Low Frequency

The data are aggregated hourly, because this type of metric has only hourly anomaly detection enabled. A single baseline describing the most frequent behaviour is formed. No autoregressive model is used.

For all three cases (HFHA, HFLA, LF), the details of the algorithms vary slightly according to the aggregation of the metric (average, total or maximum).

Live anomaly detection

Anomaly detection works by re-aggregating the data that come in over a certain span of time. For HFHA and HFLA metrics, anomaly detection updates every minute; for LF metrics, every hour. For each aggregated data point, we evaluate whether an anomaly is ongoing. Single-point anomalies that are close in time are then merged into a single anomaly, so that the detection gives information on how a situation develops over time.
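Merging single-point anomalies that are close in time can be sketched as an interval merge. The `max_gap` tolerance and the function name are assumptions; the article does not say how close two points must be to belong to the same anomaly.

```python
def merge_anomalous_points(minutes, max_gap=2):
    """Group sorted anomalous minute indices into (start, end) intervals."""
    intervals = []
    for m in minutes:
        if intervals and m - intervals[-1][1] <= max_gap:
            # Close enough to the previous anomaly: extend it.
            intervals[-1] = (intervals[-1][0], m)
        else:
            # Too far from the previous anomaly: start a new one.
            intervals.append((m, m))
    return intervals


merged = merge_anomalous_points([3, 4, 5, 7, 20, 21])
```

In this example the points at minutes 3 to 7 become one anomaly and those at minutes 20 and 21 another, so each anomaly carries a start and an end rather than being a list of isolated points.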

Criticality of the anomaly

For each aggregated data point, a different weight is assigned depending on whether the point is in the main baseline, in a secondary baseline, or outside all the baselines. For HFHA metrics, a point can also count as belonging to the main baseline when it lies outside it but not too far and is diverging at a reasonable trend, as judged by the autoregressive model. An anomaly score is built incrementally by averaging these weights over time for as long as the data points are mostly outside the main baseline. The resulting score, even if not rigorous, gives an “at a glance” description of the anomaly and can be averaged across nodes and systems. The score is then summarised into levels of criticality, classifying the data points as severe (high probability of anomaly), medium (medium probability of anomaly) or low (low probability of anomaly). A yellow anomaly has a lower likelihood of being a real anomaly than a red one, and can also correspond to an anomaly that is escalating or resolving. The anomalies classified as red include the most severe deviations from the behaviour seen in the past, even though they do not include all the anomalies.
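A minimal sketch of the scoring idea follows. The weights and the thresholds between levels are invented for illustration, since the article does not publish them; only the structure (a weight per point, an average over time, a mapping to levels) follows the text.

```python
# Hypothetical weights and thresholds: the article does not publish them.
WEIGHTS = {"main": 0.0, "secondary": 0.5, "outside": 1.0}


def anomaly_score(labels):
    """Average the per-point weights over the points of one anomaly."""
    return sum(WEIGHTS[label] for label in labels) / len(labels)


def criticality(score, medium_at=0.4, severe_at=0.75):
    """Map a score to one of the three levels named in the text."""
    if score >= severe_at:
        return "severe"
    if score >= medium_at:
        return "medium"
    return "low"


labels = ["outside", "outside", "secondary", "outside"]
score = anomaly_score(labels)
level = criticality(score)
```

Because the score is a plain average, scores from several nodes or systems can themselves be averaged, which is the property the text relies on.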

An example of our algorithm classifying anomalies (the main baseline is green, being the normal behaviour)
