...
Our approach is to first evaluate our algorithm against the original labelling, then relabel the data and measure its performance again. We dedicate a section to explaining how the relabelling was done. In the future we will repeat this exercise with a wider range of data from the SMD, and we will also use other datasets, both open source and created by us.
Methodology
Before presenting the results, we have to introduce how we define true positives, false positives, and false negatives. Defining these might seem trivial, but in reality an anomaly is very often not a single data point but a series of data points. In IT operations, anomalies are often deviations from the usual behaviour that persist for some time. When labelling our data manually, we might miss the actual start and end points of such deviations, and sometimes it is impossible to define a precise start and end point at all. For example, does an anomaly start when a deviation is well established, or should it also include the oscillations that preceded it? Does an anomaly end when the value of a metric is back to normal, or while it is still on its way back to normal?
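To make the boundary problem concrete, here is a minimal sketch contrasting two common ways of counting hits on range anomalies. It is an illustration under stated assumptions, not the definition used in this article. Anomalies are assumed to be inclusive index intervals. `pointwise_counts` scores every time step independently. `eventwise_counts` scores whole events and optionally widens each labelled event by a `tolerance` window to absorb imprecise boundaries. All function names and the tolerance rule are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: an anomaly is an inclusive (start, end) index range.
Interval = Tuple[int, int]


def pointwise_counts(true: List[Interval], pred: List[Interval], n: int) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) treating every time step as an independent sample."""
    def mask(intervals: List[Interval]) -> List[bool]:
        m = [False] * n
        for s, e in intervals:
            for i in range(s, e + 1):
                m[i] = True
        return m

    t, p = mask(true), mask(pred)
    tp = sum(a and b for a, b in zip(t, p))      # labelled and predicted
    fp = sum(b and not a for a, b in zip(t, p))  # predicted but not labelled
    fn = sum(a and not b for a, b in zip(t, p))  # labelled but not predicted
    return tp, fp, fn


def eventwise_counts(true: List[Interval], pred: List[Interval], tolerance: int = 0) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) per anomaly event rather than per time step.

    A labelled event is detected if any predicted interval overlaps it after
    widening the event by `tolerance` steps on each side (an assumed rule).
    A predicted interval that overlaps no widened event is a false positive.
    """
    def overlaps(a: Interval, b: Interval) -> bool:
        return a[0] <= b[1] and b[0] <= a[1]

    widened = [(s - tolerance, e + tolerance) for s, e in true]
    tp = sum(any(overlaps(w, p) for p in pred) for w in widened)
    fn = len(true) - tp
    fp = sum(not any(overlaps(w, p) for w in widened) for p in pred)
    return tp, fp, fn


if __name__ == "__main__":
    # A labelled anomaly at steps 40..59; the detector fires at 45..64.
    print(pointwise_counts([(40, 59)], [(45, 64)], n=100))      # (15, 5, 5)
    print(eventwise_counts([(40, 59)], [(45, 64)]))             # (1, 0, 0)
    # A detection just after the labelled end: missed unless we allow tolerance.
    print(eventwise_counts([(40, 59)], [(62, 70)]))             # (0, 1, 1)
    print(eventwise_counts([(40, 59)], [(62, 70)], tolerance=5))  # (1, 0, 0)
```

Under point-wise counting, a five-step shift in where the labeller placed the boundaries turns into false positives and false negatives. Event-wise counting with a small tolerance treats the same detection as a single correct hit. This is why the choice of definition matters when the labelled start and end points are themselves uncertain.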
...