Alerts - structure and data explained
This article will explain the data structure and data fields for the Eyer JSON alert, and go into more details on the most important data. All alerts can also be analyzed metric by metric in https://antteam.atlassian.net/wiki/pages/createpage.action?spaceKey=EKB&title=Grafana%20-%20setup%20Eyer%20integration&linkCreation=true&fromPageId=69369863
An alert can have three different statuses:
new
updated (an alert can have multiple updates)
closed
An update of an alert can happen due to changed severity, or that more systems / nodes / metrics are included in the alert (other metrics can be affected by the initial anomalous behavior, so the alert is updated).
For more information on the anomaly alert timeline and how alerts are updating, please see Anomaly alert timeline explained
The section of the alert immediately following either new [], updated [] or closed [], contains data about the overall alert. In the screenshot above we see that the alert:
is an update
has an overall “medium” severity. The alert severity is a reflection of the overall impact of all the metrics participating in the alert.
has a start date (when the alert was initially triggered).
has “null” as ended. This means that the alert is still ongoing.
has the “updated” timestamp populated. This means that there was an update of the initial alert at this timestamp.
has a unique Id. This Id will be the same throughout the alerts lifecycle until closed.
The “items” section contains details about which systems, nodes and metrics that are part of the alert (for Boomi, see Boomi Atoms - data collector metrics & structure .
a node is a grouping of metrics. In the example over, the metric is in the node “Operating System”.
a system. In the example above, the system is an Atom. The name is derived from where the Atom is hosted, plus its ContainerId. A system can have multiple nodes.
metrics. A node can host multiple metrics, but in the example above, only a single metric on that node has an ongoing anomaly (“Process CPU Load”).
severity. The severity shown per metric does not have to be the same severity as for the alert.
started timestamp. When an anomaly on that metric was first detected.
updated timestamp. When a change in severity last occured.
JSON | Description |
---|---|
{ |
|
"new": [], | Section for new alerts (none in example) |
"updated": [ | Section for updated alerts |
{ |
|
"severity": "medium", | Total alert severity |
"started": "2024-05-15T13:21:00Z", | Start date for the alert |
"ended": null, | End date for the alert (none in example, still ongoing) |
"updated": "2024-05-15T14:23:00Z", | Updated date for the alert |
"id": "6644b718eb5838c9d4ca9041", | Unique Id for the alert. Same throughout the alert lifecycle (new, updates, closed) |
"items": [ | Section containing details about which systems, nodes and metrics that participates in the alert. |
{ |
|
"node": { | Section that contains node information. Repeats per node. |
"id": 101, | Unique node Id |
"name": "Operating System. http://10.0.1.161:8778/jolokia", | Readable node name |
"system": { | Section that contains system information for the system the node is connected to |
"id": 1, | Unique system Id |
"name": "http://10.0.1.161:8778/jolokia" | Readable system name |
} |
|
}, |
|
"metrics": [ | Section that contains metrics connected to the node that participates in the alert. Repeats per metric |
{ |
|
"id": "a59df24a-e9ec-4c4c-a087-ea1375d4b9c7", | Unique node Id |
"name": "Process CPU Load", | Readable metric name |
"metric_type": "double", | Metric type (double, int) |
"aggregation": "avg", | Type of aggregation (avg, max, count) |
"severity": "severe", | Anomaly severity for the metric |
"started": "2024-05-15T13:20:00Z", | Start date for the anomaly on the metric |
"updated": "2024-05-15T14:23:00Z" | Updated date for the anomaly on the metric |
} |
|
] |
|
} |
|
] |
|
} |
|
], |
|
"closed": [] | Section for closed alerts (none in example) |
} |
|