Correlation and grouping of time series
To understand how different resources work together, we calculate correlations between them and group the resources based on those correlations. This method is meant to complement, not replace, informed ways of grouping such as user-defined groups or transaction tracing.
Correlation
Correlations among different nodes are calculated to measure how different resources work together. The calculation uses only the time series data that are already available, without other inputs (such as transaction tracing or a user-defined topology).
Pearson correlations are calculated pairwise among the time series of two nodes, and the maximum of the absolute values of these pairwise correlations is taken as the correlation between the two nodes (a bit of an overestimate). Currently, only simultaneous correlations within 1 and 15 minutes are considered.
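The node-level measure described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names `pearson` and `node_correlation`, the dictionary layout of the metrics, and the sample values are all assumptions, and the series are assumed to be already aligned on the same timestamps.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    var_x = sum(d * d for d in dx)
    var_y = sum(d * d for d in dy)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(var_x * var_y)


def node_correlation(metrics_a, metrics_b):
    """Maximum absolute Pearson correlation over all metric pairs of two nodes."""
    best = 0.0
    for series_a, series_b in product(metrics_a.values(), metrics_b.values()):
        r = pearson(series_a, series_b)
        if r is not None:
            best = max(best, abs(r))
    return best


# Hypothetical sample data: metric name -> aligned values.
node_a = {"cpu": [1, 2, 3, 4, 5], "mem": [5, 5, 6, 5, 5]}
node_b = {"latency": [10, 8, 6, 4, 2], "errors": [0, 1, 0, 1, 0]}

print(node_correlation(node_a, node_b))
```

Because the absolute value is taken, a strong anticorrelation (here between the hypothetical `cpu` and `latency` series) counts as much as a strong positive correlation, and a single well-correlated pair of metrics is enough to give two nodes a high score.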
Qualification for correlation
To be considered by the correlation algorithm, a node has to contain at least one metric which:
produces at least 5 data points per week
has at least 4 changes of value per week
had data in the last three days
changed value in the last three days
Flat-liners, or metrics which are mostly constant, are not considered for correlation.
If a node has no metrics suitable for correlation but had such metrics in the past, it will have correlations of fading strength.
Correlations among nodes will be returned by the correlation API only if the two nodes have at least one pair of metrics with a correlation of at least 0.5 in absolute value.
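The four qualification criteria can be expressed as a simple check. The sketch below rests on assumptions that the text does not confirm: "per week" is read as the most recent seven days, a "change" is a data point whose value differs from the previous point, and the names `metric_qualifies` and `node_qualifies` are hypothetical.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)    # assumed meaning of "per week"
RECENT = timedelta(days=3)  # "the last three days"


def metric_qualifies(points, now):
    """points: list of (timestamp, value) tuples sorted by timestamp."""
    week = [(t, v) for t, v in points if now - WEEK <= t <= now]
    if len(week) < 5:
        return False  # fewer than 5 data points in the week
    # Timestamps at which the value differs from the previous data point.
    change_times = [t2 for (_, v1), (t2, v2) in zip(week, week[1:]) if v1 != v2]
    if len(change_times) < 4:
        return False  # fewer than 4 changes of value: a flat-liner
    has_recent_data = week[-1][0] >= now - RECENT
    has_recent_change = change_times[-1] >= now - RECENT
    return has_recent_data and has_recent_change


def node_qualifies(metrics, now):
    """A node qualifies if at least one of its metrics qualifies."""
    return any(metric_qualifies(points, now) for points in metrics.values())


# Hypothetical sample data: one point per day for the past week.
now = datetime(2024, 1, 8)
days = [now - timedelta(days=d) for d in range(6, -1, -1)]
active = [(t, i % 2) for i, t in enumerate(days)]  # alternates 0, 1, 0, ...
flat = [(t, 42) for t in days]                     # never changes

print(node_qualifies({"active": active, "flat": flat}, now))
print(node_qualifies({"flat": flat}, now))
```

In this sample, the node with the alternating metric qualifies, while a node that has only the constant metric does not.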
Grouping using correlation
Groups of nodes are formed by using correlation as a measure, which provides fully automated information on topology. The correlation is used to define a distance measure, which is then used for hierarchical clustering. This assigns each node to exactly one group. Each group contains intercorrelated nodes that have a correlation of at least 0.5. Because the clustering is hierarchical, nodes that have a correlation above 0.5 might not end up in the same group. For example, if nodes A and B are correlated, but node A has a stronger correlation with node C, and B and C are not correlated, then A will be in the same correlation group as C, and B will not be included in this group.
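The A, B, C example can be reproduced with a small agglomerative clustering sketch. The text does not state the distance definition or the linkage method, so this sketch assumes a distance of 1 - |r| and complete linkage (two groups merge only if every cross pair has a correlation of at least 0.5), which is one choice that yields the behaviour described. The function name `correlation_groups` and the correlation values are hypothetical.

```python
def correlation_groups(nodes, corr, min_corr=0.5):
    """Group nodes by agglomerative clustering on distance 1 - |r|.

    corr maps frozenset({a, b}) to the correlation between nodes a and b;
    missing pairs are treated as uncorrelated. Complete linkage is an
    assumption of this sketch, not something the documentation states.
    """
    def distance(a, b):
        return 1.0 - abs(corr.get(frozenset((a, b)), 0.0))

    def linkage(g1, g2):
        # Complete linkage: the distance of the least correlated cross pair.
        return max(distance(a, b) for a in g1 for b in g2)

    groups = [[n] for n in nodes]
    cutoff = 1.0 - min_corr
    while len(groups) > 1:
        # Find the closest pair of groups.
        d, i, j = min(
            (linkage(groups[i], groups[j]), i, j)
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
        )
        if d > cutoff:
            break  # no remaining pair of groups is correlated enough
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Hypothetical correlations matching the example in the text.
corr = {
    frozenset(("A", "B")): 0.6,  # A and B are correlated
    frozenset(("A", "C")): 0.9,  # A is more strongly correlated with C
    frozenset(("B", "C")): 0.1,  # B and C are not correlated
}
groups = correlation_groups(["A", "B", "C"], corr)
print(groups)  # [['A', 'C'], ['B']]
```

A and C merge first because they are the closest pair. B then stays on its own: although its correlation with A is above 0.5, its correlation with C is not, so the merged group {A, C} is too far from B under complete linkage.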