Map your custom data source - Example 3 with LLM / ChatGPT assistance


An LLM (ChatGPT, Gemini, Mistral, etc.) can give you great assistance with the mapping. It comes down to asking the right question and giving the LLM enough information.

Before you ask the LLM for a mapping suggestion, make sure you pick the right example mapping and its output, so the LLM can “understand” how the mapping should be done.

In the example below, we provide four pieces of information to the LLM:

  1. An example input file (JSON). In this case, the input file from mapping example 1

  2. An example mapping (JSON). Again, from mapping example 1

  3. The output file that this mapping produces from that input (from example 1)

  4. A question asking how the LLM would map a new input file, based on the above
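The four pieces above can also be put together by a small script instead of by hand. The following is a minimal sketch in Python, not part of the product: the function name build_prompt is hypothetical, the connecting sentences are copied from the prompt used on this page, and the example data is heavily trimmed for readability.

```python
import json


def build_prompt(example_input, example_mapping, example_output, new_input):
    """Join the four pieces into one prompt, in the order listed above."""
    parts = [
        "Learn how to do a json mapping. You have the file below",
        json.dumps(example_input),
        "then you have the JSON mapping",
        json.dumps(example_mapping),
        "you get the following output",
        json.dumps(example_output),
        "Question: how would the mapping for the following input look?",
        json.dumps(new_input),
    ]
    return "\n\n".join(parts)


# Heavily trimmed stand-ins for the example 1 files; use the full files in practice.
example_input = {"metrics": [{"fields": {"waits_total_mean": 7.3},
                              "tags": {"host": "testmachine1", "instance": "Database1"},
                              "timestamp": 1740047760}]}
example_mapping = {"root": "metrics", "system": "tags.host", "id_selector": "tags.instance"}
example_output = [{"Node": {"Name": "Database1", "SystemName": "testmachine1"}}]
new_input = {"metrics": [{"fields": {"ifInOctets": 2534174524, "ifOutOctets": 3408455650},
                          "tags": {"host": "MacBookPro.bambam", "ifName": "0/3"},
                          "timestamp": 1741006560}]}

prompt = build_prompt(example_input, example_mapping, example_output, new_input)
print(prompt)
```

The resulting text can be pasted into any LLM chat. Keeping the example input, mapping and output complete gives the LLM more to learn from than the trimmed stand-ins used here.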

 

The new input in the example below is a single record from Telegraf (InfluxData) containing SNMP in/out octets. We know that records like this will repeat and may come from multiple routers and switches.

 

LLM prompt

Learn how to do a json mapping. You have the file below

{ "metrics": [ { "fields": { "disk_reads_mean": 1.0121461361787716, "disk_writes_per_sec_mean": 0.8571885607181682, "last_Run_mean": 1740047577489, "log_bytes_flushed_mean": 3.9596305549082618, "logical_reads_mean": 914.9966524003087, "total_sessions_mean": 14, "waits_total": 0, "waits_total_mean": 7.3 }, "name": "dbwatch", "tags": { "host": "testmachine1", "instance": "Database1", "url": "http://localhost:8080/dashboard/dbWatch%2001/data/HkRmUaLz0Z.telegraf.json" }, "timestamp": 1740047760 } ] }

then you have the JSON mapping

{ "selector": [ "tags.host", "tags.instance", "timestamp" ], "type": "telegraf.waitstats", "id_selector": "tags.instance", "root": "metrics", "system": "tags.host", "mappings": [ { "type": "telegraf.waitstats.waits", "guid": "16e41dba-e6a4-4427-b753-cf6eee41e384", "name": "dbWatch waits", "timestamp": "timestamp", "fields": [ { "name": "fields.disk_reads_mean", "display": "Disk reads", "type": "float", "aggregation": "avg", "guid": "9ed41a78-3094-4e41-98b5-b1f98f63dc3a" }, { "name": "fields.disk_writes_per_sec_mean", "display": "Disk writes per second", "type": "float", "aggregation": "avg", "guid": "1203a428-0b80-4240-8c1b-f7fa2665ec22" }, { "name": "fields.log_bytes_flushed_mean", "display": "Log bytes flushed", "type": "float", "aggregation": "avg", "guid": "2add293a-446a-468e-91c3-8f164fe697c8" }, { "name": "fields.logical_reads_mean", "display": "Logical reads", "type": "float", "aggregation": "avg", "guid": "7acb184d-c4ce-4287-ab4a-f63f15976bac" }, { "name": "fields.total_sessions_mean", "display": "Total sessions", "type": "int", "aggregation": "avg", "guid": "30c71236-48ad-4fa8-b282-f587ffe8b4b9" }, { "name": "fields.waits_total_mean", "display": "Waits total", "type": "float", "aggregation": "avg", "guid": "1f2a5a97-338b-4a92-8846-f27cd9f6b71f" } ], "name_map": "tags.instance" } ], "skip_merge": true }

you get the following output

[ { "Node": { "Name": "Database1", "SystemName": "testmachine1", "AgentId": "telegraf.waitstats", "AtomId": "Database1" }, "NodeType": { "Id": "16e41dba-e6a4-4427-b753-cf6eee41e384", "Name": "telegraf.waitstats.waits", "DisplayName": "Database waits" }, "Stats": [ { "Name": "fields.disk_reads_mean", "Type": "float", "Guid": "9ed41a78-3094-4e41-98b5-b1f98f63dc3a", "Aggregation": "avg", "Display": "Disk reads", "Value": 1.0121461361787716, "Timestamp": 1740047760 }, { "Name": "fields.disk_writes_per_sec_mean", "Type": "float", "Guid": "1203a428-0b80-4240-8c1b-f7fa2665ec22", "Aggregation": "avg", "Display": "Disk writes per second", "Value": 0.8571885607181682, "Timestamp": 1740047760 }, { "Name": "fields.log_bytes_flushed_mean", "Type": "float", "Guid": "2add293a-446a-468e-91c3-8f164fe697c8", "Aggregation": "avg", "Display": "Log bytes flushed", "Value": 3.9596305549082618, "Timestamp": 1740047760 }, { "Name": "fields.logical_reads_mean", "Type": "float", "Guid": "7acb184d-c4ce-4287-ab4a-f63f15976bac", "Aggregation": "avg", "Display": "Logical reads", "Value": 914.9966524003087, "Timestamp": 1740047760 }, { "Name": "fields.total_sessions_mean", "Type": "int", "Guid": "30c71236-48ad-4fa8-b282-f587ffe8b4b9", "Aggregation": "avg", "Display": "Total sessions", "Value": 14, "Timestamp": 1740047760 }, { "Name": "fields.waits_total_mean", "Type": "float", "Guid": "1f2a5a97-338b-4a92-8846-f27cd9f6b71f", "Aggregation": "avg", "Display": "Waits total", "Value": 7.3, "Timestamp": 1740047760 } ] } ]

Question: how would the mapping for the following input look?

{ "metrics": [ { "fields": { "ifInOctets": 2534174524, "ifOutOctets": 3408455650 }, "name": "Octets_field", "tags": { "agent_host": "192.168.88.54", "host": "MacBookPro.bambam", "hostname": "US-8-150W", "ifName": "0/3" }, "timestamp": 1741006560 } ] }

 

The LLM then produced the following mapping:

{ "selector": [ "tags.host", "tags.ifName", "timestamp" ], "type": "telegraf.ifstats", "id_selector": "tags.ifName", "root": "metrics", "system": "tags.host", "mappings": [ { "type": "telegraf.ifstats.octets", "guid": "b5d4d7c9-8f52-4df5-8101-ec5c63a8a123", "name": "Interface Octets", "timestamp": "timestamp", "fields": [ { "name": "fields.ifInOctets", "display": "Inbound Octets", "type": "int", "aggregation": "avg", "guid": "8f1c9de3-91ef-4a34-bc76-e2d05d2a6e51" }, { "name": "fields.ifOutOctets", "display": "Outbound Octets", "type": "int", "aggregation": "avg", "guid": "d10f4182-fc7c-4d65-a8c5-b72a2e6ec351" } ], "name_map": "tags.ifName" } ], "skip_merge": true }
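An LLM can invent field names, so it may be worth a quick local check before testing the mapping for real. The sketch below is a hypothetical helper, not part of the product. It assumes that values such as "tags.ifName" are dotted paths into each record under "root", which is how the example 1 mapping and output appear to relate. It reports every path in the mapping that does not exist in the first input record.

```python
def resolve(record, dotted_path):
    """Follow a dotted path such as 'tags.ifName' through nested dicts."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(dotted_path)
        current = current[key]
    return current


def missing_paths(mapping, document):
    """Return the mapping paths that do not resolve in the first input record."""
    record = document[mapping["root"]][0]
    paths = list(mapping["selector"]) + [mapping["id_selector"], mapping["system"]]
    for entry in mapping["mappings"]:
        paths += [entry["timestamp"], entry["name_map"]]
        paths += [field["name"] for field in entry["fields"]]
    missing = []
    for path in dict.fromkeys(paths):  # drop duplicates, keep order
        try:
            resolve(record, path)
        except KeyError:
            missing.append(path)
    return missing


# The new input file and the LLM-produced mapping from this page.
document = {"metrics": [{"fields": {"ifInOctets": 2534174524, "ifOutOctets": 3408455650},
                         "name": "Octets_field",
                         "tags": {"agent_host": "192.168.88.54", "host": "MacBookPro.bambam",
                                  "hostname": "US-8-150W", "ifName": "0/3"},
                         "timestamp": 1741006560}]}
mapping = {
    "selector": ["tags.host", "tags.ifName", "timestamp"],
    "type": "telegraf.ifstats",
    "id_selector": "tags.ifName",
    "root": "metrics",
    "system": "tags.host",
    "mappings": [{
        "type": "telegraf.ifstats.octets",
        "guid": "b5d4d7c9-8f52-4df5-8101-ec5c63a8a123",
        "name": "Interface Octets",
        "timestamp": "timestamp",
        "fields": [
            {"name": "fields.ifInOctets", "display": "Inbound Octets", "type": "int",
             "aggregation": "avg", "guid": "8f1c9de3-91ef-4a34-bc76-e2d05d2a6e51"},
            {"name": "fields.ifOutOctets", "display": "Outbound Octets", "type": "int",
             "aggregation": "avg", "guid": "d10f4182-fc7c-4d65-a8c5-b72a2e6ec351"},
        ],
        "name_map": "tags.ifName",
    }],
    "skip_merge": True,
}

print(missing_paths(mapping, document))
```

An empty list means every path in the mapping points at something in the input. It does not prove that the mapping is what you want, only that it refers to data that exists.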

 

We then validate the mapping by testing it, together with the input we want to map, against our API:

[ { "Node": { "Name": "3/26", "SystemName": "DESKTOP-S01F7CP", "AgentId": "telegraf.ifstats", "AtomId": "3/26" }, "NodeType": { "Id": "b5d4d7c9-8f52-4df5-8101-ec5c63a8a123", "Name": "telegraf.ifstats.octets", "DisplayName": "Interface Octets" }, "Stats": [ { "Name": "fields.ifInOctets", "Type": "int", "Guid": "8f1c9de3-91ef-4a34-bc76-e2d05d2a6e51", "Aggregation": "avg", "Display": "Inbound Octets", "Timestamp": 1741167360 }, { "Name": "fields.ifOutOctets", "Type": "int", "Guid": "d10f4182-fc7c-4d65-a8c5-b72a2e6ec351", "Aggregation": "avg", "Display": "Outbound Octets", "Timestamp": 1741167360 } ] }, { "Node": { "Name": "3/11", "SystemName": "DESKTOP-S01F7CP", "AgentId": "telegraf.ifstats", "AtomId": "3/11" }, "NodeType": { "Id": "b5d4d7c9-8f52-4df5-8101-ec5c63a8a123", "Name": "telegraf.ifstats.octets", "DisplayName": "Interface Octets" }, "Stats": [ { "Name": "fields.ifInOctets", "Type": "int", "Guid": "8f1c9de3-91ef-4a34-bc76-e2d05d2a6e51", "Aggregation": "avg", "Display": "Inbound Octets", "Timestamp": 1741167360 }, { "Name": "fields.ifOutOctets", "Type": "int", "Guid": "d10f4182-fc7c-4d65-a8c5-b72a2e6ec351", "Aggregation": "avg", "Display": "Outbound Octets", "Timestamp": 1741167360 } ] }, { "Node": { "Name": "3/19", "SystemName": "DESKTOP-S01F7CP", "AgentId": "telegraf.ifstats", "AtomId": "3/19" }, "NodeType": { "Id": "b5d4d7c9-8f52-4df5-8101-ec5c63a8a123", "Name": "telegraf.ifstats.octets", "DisplayName": "Interface Octets" }, "Stats": [ { "Name": "fields.ifInOctets", "Type": "int", "Guid": "8f1c9de3-91ef-4a34-bc76-e2d05d2a6e51", "Aggregation": "avg", "Display": "Inbound Octets", "Timestamp": 1741167360 }, { "Name": "fields.ifOutOctets", "Type": "int", "Guid": "d10f4182-fc7c-4d65-a8c5-b72a2e6ec351", "Aggregation": "avg", "Display": "Outbound Octets", "Timestamp": 1741167360 } ] } ]

If you compare this result with mapping example 2, you will notice that this mapping creates a separate node for every switch port, each with its own metrics, instead of a single node holding all metrics. This is because we asked the LLM to base its answer on the mapping from example 1.
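The one-node-per-port effect can be previewed locally. The sketch below is a rough approximation inferred from the outputs on this page, not the product's actual logic: it assumes the node name is read from the path in "name_map", and it uses hypothetical, tags-only input records whose host and port names are taken from the API output above.

```python
def lookup(record, dotted_path):
    """Follow a dotted path such as 'tags.ifName' through nested dicts."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


def node_names(document, mapping):
    """List the distinct name_map values in the input, in first-seen order."""
    name_path = mapping["mappings"][0]["name_map"]
    names = []
    for record in document[mapping["root"]]:
        name = lookup(record, name_path)
        if name not in names:
            names.append(name)
    return names


# Only the parts of the mapping that this sketch reads.
mapping = {"root": "metrics", "mappings": [{"name_map": "tags.ifName"}]}

# Hypothetical records: one per switch port, all from the same host.
document = {"metrics": [
    {"tags": {"host": "DESKTOP-S01F7CP", "ifName": "3/26"}},
    {"tags": {"host": "DESKTOP-S01F7CP", "ifName": "3/11"}},
    {"tags": {"host": "DESKTOP-S01F7CP", "ifName": "3/19"}},
]}

print(node_names(document, mapping))  # ['3/26', '3/11', '3/19']
```

Three distinct port names give three node names here, which matches the three nodes in the API output. If you want a different node layout, give the LLM an example mapping that already has that layout.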