Datadog Events and Metrics Dashboard Integrations
In addition to the integrations described below, Zebrium also provides a custom Datadog Dashboard Widget. Select Integrations in your Datadog user interface and search for Zebrium for more details. For more information, contact Zebrium at support@zebrium.com.
Features
- You can configure Zebrium to automatically add Root Cause Analysis (RCA) reports as Events in Datadog. This allows you to see root cause details on any Datadog dashboard.
- This integration automatically adds Log Count metrics in Datadog.
- Each Zebrium RCA report includes a summary, a word cloud, and a set of log events showing symptoms and root cause, plus a link to the full report in the Zebrium user interface.
- This means faster Mean Time to Resolution (MTTR) and less time manually hunting for root cause.
How It Works
The recommended mode of operation for observability dashboard integrations is to use the Zebrium Auto-Detect mode as an accurate mechanism for explaining the reason something went wrong. In this mode, you continue to use your existing rules, alerts and metrics as the primary source of problem detection. You can then review Zebrium RCA report findings directly in your Datadog Dashboards, alongside other metrics to explain the reason behind problems you were alerted on.
The Zebrium Augment mode is useful when you have monitors defined in Datadog and you want a Root Cause report automatically generated at the time of the alert. In this mode, Zebrium uses a Datadog webhook as a notification channel, and it updates your Dashboard with Root Cause reports that coincide with the triggering monitor so the reports are immediately visible to you as you work the issue.
The two modes of operation are independent. You can configure Auto-Detect and/or Augment modes depending on your operational use case.
Auto-Detect (recommended): Send Root Cause Detections to your Datadog Dashboards
- Zebrium continuously monitors all application logs and uses unsupervised machine learning to find anomalous log patterns that indicate a problem. These are automatically turned into Root Cause reports that highlight details of any problems, with over 95% accuracy.
- Root Cause report summaries are sent to Datadog using the event API, and Root Cause details are visible on your Datadog Dashboards.
- With a single click on your Dashboard, you can drill down further into the Zebrium user interface to look at correlated logs across your entire application.
- Log metrics are also sent to Datadog via the series API for visualization on your Datadog Dashboards.
For details, see Sending Root Cause Detections to your Datadog Dashboards.
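To make the data flow above more concrete, the following sketch builds request bodies in the shape accepted by the Datadog v1 event and series endpoints. It is an illustration only, not Zebrium's actual implementation: the function names, the example title and text, and the choice of tags are assumptions, and the URLs shown are for the US1 Datadog site (other sites use a different host).

```python
import time

# Datadog v1 endpoints on the US1 site; other Datadog sites use a different host.
EVENTS_URL = "https://api.datadoghq.com/api/v1/events"
SERIES_URL = "https://api.datadoghq.com/api/v1/series"


def build_tags(service_group, deployment, significance=None):
    """Build tag strings using the tag names listed in this document."""
    tags = [
        f"ze_service_group:{service_group}",
        f"ze_deployment:{deployment}",
    ]
    if significance is not None:
        tags.append(f"ze_significance:{significance}")
    return tags


def build_event(title, text, service_group, deployment, significance):
    """Hypothetical event body for a Root Cause report summary."""
    return {
        "title": title,
        "text": text,
        "tags": build_tags(service_group, deployment, significance),
    }


def build_series(metric, count, service_group, deployment, timestamp=None):
    """Hypothetical series body for a one-minute log count."""
    ts = int(timestamp if timestamp is not None else time.time())
    return {
        "series": [
            {
                "metric": metric,
                "type": "count",
                "interval": 60,  # counts cover a one-minute window
                "points": [[ts, count]],
                "tags": build_tags(service_group, deployment),
            }
        ]
    }


event = build_event("Root Cause report", "Summary text", "prod", "web", "high")
series = build_series("zebrium.logs.errors.count", 12, "prod", "web", timestamp=1700000000)
```

Each body would be sent as JSON in a POST request that carries the API key in the `DD-API-KEY` header.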
Augment (advanced users): Receive Signals from Datadog Triggered Monitors
- Any Datadog Monitor can trigger a webhook request for Root Cause Analysis from Zebrium.
- Zebrium finds anomalous log patterns from your application that coincide with the event and creates a Root Cause report.
- Root Cause report summaries are sent to Datadog using the event API and Root Cause details are visible on your Datadog Dashboards.
- With a single click on your Dashboard, you can drill down further into the Zebrium user interface to look at correlated logs across your entire application.
For details, see Receiving Signals from Datadog Triggered Monitors.
Sending Root Cause Detections to your Datadog Dashboards
STEP 1: Create an API Key in Datadog
- From the Main Navigation panel in Datadog, hover over your Datadog Login Name and select Organization Settings.
- Click API Keys.
- Click the button.
- Enter a Name for the API Key and click Create Key.
- Copy and save the Key for use in STEP 2, below.
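Before pasting the key into Zebrium, you may want to confirm that it is valid. The sketch below prepares a request to Datadog's key validation endpoint using only the Python standard library. The function name and the `site` parameter are assumptions made for illustration; adjust `site` to match your Datadog region.

```python
import urllib.request


def build_validate_request(api_key, site="datadoghq.com"):
    """Prepare (but do not send) a request that checks a Datadog API key."""
    return urllib.request.Request(
        f"https://api.{site}/api/v1/validate",
        headers={"DD-API-KEY": api_key},
        method="GET",
    )


# Placeholder key for illustration only; substitute the key you copied.
request = build_validate_request("0123456789abcdef")
```

Passing `request` to `urllib.request.urlopen` sends the check; a successful response indicates that Datadog accepts the key.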
STEP 2: Create a Datadog Integration in Zebrium to Send Suggestions to Datadog
- In the Zebrium user interface, go to the Integrations & Collectors page (Settings > Integrations & Collectors).
- In the Observability Dashboards section, click the button.
- Click . The Create Datadog Dashboard dialog appears.
- On the tab, enter an Integration Name for this integration.
- In the Deployment drop-down, select a deployment for the integration.
- In the Service Group(s) drop-down, select a service group for the integration.
- On the tab, click .
- In the API Key field, enter the API key you created in STEP 1, above.
- Click .
STEP 3: Add Zebrium Root Cause Report Suggestions and Log Count Metrics to Your Datadog Dashboards
Zebrium sends events and metrics to Datadog as follows:
- Events are sent each time a Zebrium Root Cause report suggestion occurs.
- Metrics are sent for counts of all log events, error log events, and anomaly log events.
Visualizing Zebrium Data in Datadog
The following image displays a sample chart visualization showing:
- A Root Cause Finder panel that displays a vertical bar whenever a Zebrium detection occurs. This allows you to easily see detections that are aligned with other metrics on your dashboards.
- A Root Cause Reports Summary panel that lists summary information for each Zebrium detection.
The following image displays the definition of the Root Cause Finder panel:
The following image displays the definition of the Root Cause Reports Summary panel:
Important Metric Names
Metric Name | Description |
---|---|
zebrium.logs.all.count | Count of all log events received in a one-minute duration (per service_group and deployment). |
zebrium.logs.anomalies.count | Count of anomaly log events received in a one-minute duration (per service_group and deployment). |
zebrium.logs.errors.count | Count of error log events received in a one-minute duration (per service_group and deployment). |
ze_service_group | Zebrium service group name for the corresponding metric or event. |
ze_deployment | Zebrium deployment name for the corresponding metric or event. |
ze_significance | Significance of the Root Cause Report (low, medium or high). |
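When charting these metrics, a Datadog metric query can be scoped with the `ze_service_group` and `ze_deployment` tags. The helper below is a minimal sketch that assembles such a query string; the function name, the default `sum` aggregator, and the use of `.as_count()` are illustrative choices, not settings prescribed by Zebrium.

```python
ZEBRIUM_METRICS = (
    "zebrium.logs.all.count",
    "zebrium.logs.anomalies.count",
    "zebrium.logs.errors.count",
)


def build_query(metric, service_group=None, deployment=None, aggregator="sum"):
    """Assemble a Datadog metric query string scoped by Zebrium tags."""
    if metric not in ZEBRIUM_METRICS:
        raise ValueError(f"unknown Zebrium metric: {metric}")
    filters = []
    if service_group:
        filters.append(f"ze_service_group:{service_group}")
    if deployment:
        filters.append(f"ze_deployment:{deployment}")
    # "*" means no tag filter in a Datadog query scope.
    scope = ",".join(filters) if filters else "*"
    return f"{aggregator}:{metric}{{{scope}}}.as_count()"


query = build_query("zebrium.logs.errors.count", service_group="prod", deployment="web")
```

The resulting string can be pasted into the query editor of a timeseries widget.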
Receiving Signals from Datadog Triggered Monitors
Integration Overview
- Create an API Key in Datadog.
- Create a Datadog integration in Zebrium using the information from step 1.
- Create a webhook integration in Datadog using the information from step 2.
- Add webhook notifications to your Triggered Monitors in Datadog.
- Add Zebrium Root Cause reports to your Datadog Dashboard.
Integration Details
STEP 1: Create an API Key in Datadog
- From the Main Navigation panel in Datadog, hover over your Datadog Login Name and select Organization Settings.
- Click API Keys.
- Click the button.
- Enter a Name for the API Key and click Create Key.
- Copy and save the Key for use in STEP 2, below.
STEP 2: Create a Datadog Integration in Zebrium to Receive Signals from Datadog
- In the Zebrium user interface, go to the Integrations & Collectors page (Settings > Integrations & Collectors).
- In the Observability Dashboards section, click the button.
- Click . The Create Datadog Dashboard dialog appears.
- On the tab, enter an Integration Name for this integration.
- In the Deployment drop-down, select a deployment for the integration.
- In the Service Group(s) drop-down, select a service group for the integration.
- Go to the tab.
- In the API Key field, enter the API key you created in STEP 1, above.
- Click . The Datadog Dashboard Integrations dialog appears.
- Click the Edit button for the integration you just created. The Edit Datadog Dashboard dialog appears.
- On the tab, click the button.
- Make sure that the value in the API Key field on this tab matches the key created in STEP 1, above.
- Click in the URL field to copy the webhook URL and save it for use in STEP 3, below. Click .
- Click .
STEP 3: Create a Webhook Integration in Datadog
- In the Datadog user interface, go to the Main Navigation panel and navigate to Integrations > Integrations.
- Locate the Webhooks integration card and click .
- Click the button located in the Webhooks section.
- Enter a Name and the webhook URL that you saved in STEP 2.
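- In the Payload section, add the entry "alert_transition": "$ALERT_TRANSITION" after the existing entry "event_type": "$EVENT_TYPE", and make sure the payload remains valid JSON (every entry except the last ends with a comma).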
- Click .
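The payload change in this step can be sketched as follows. The starting payload is an assumption: it mirrors the kind of template Datadog pre-fills for a new webhook, and your own payload may differ. The helper name is hypothetical. The only change this document requires is the `alert_transition` entry placed after `event_type`.

```python
import json

# Assumed starting payload; check the Payload field of your own webhook.
DEFAULT_PAYLOAD = {
    "body": "$EVENT_MSG",
    "last_updated": "$LAST_UPDATED",
    "event_type": "$EVENT_TYPE",
    "title": "$EVENT_TITLE",
    "date": "$DATE",
    "org": {"id": "$ORG_ID", "name": "$ORG_NAME"},
    "id": "$ID",
}


def add_alert_transition(payload):
    """Return a copy of payload with alert_transition placed after event_type."""
    result = {}
    for key, value in payload.items():
        if key == "alert_transition":
            continue  # re-added below so it is never duplicated
        result[key] = value
        if key == "event_type":
            result["alert_transition"] = "$ALERT_TRANSITION"
    # If event_type was absent, append the entry at the end instead.
    result.setdefault("alert_transition", "$ALERT_TRANSITION")
    return result


updated = add_alert_transition(DEFAULT_PAYLOAD)
payload_text = json.dumps(updated, indent=4)
```

Serializing with `json.dumps` ensures that the text pasted into the Payload section is valid JSON.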
STEP 4: Add Webhook notifications to your Triggered Monitors in Datadog
- In the Datadog user interface, go to the Main Navigation panel and navigate to Monitors > Manage Monitors.
- Click the Monitor that you want to trigger Root Cause reports.
- Choose Edit from the gear icon on the Monitor page.
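- Add the webhook you created in STEP 3 (which uses the webhook URL from STEP 2) to the "Notify your team" list. In Datadog, a webhook is referenced in notifications as @webhook-<name>.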
- Click .
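In Datadog, a monitor notifies a webhook when its notification message mentions `@webhook-<name>`, where `<name>` is the name given to the webhook. If you manage monitor messages programmatically, a helper along these lines can add the mention without duplicating it. The function name and the example webhook name `zebrium` are hypothetical.

```python
def add_webhook_mention(message, webhook_name):
    """Append an @webhook-<name> mention to a monitor message if it is missing."""
    mention = f"@webhook-{webhook_name}"
    if mention in message.split():
        return message  # already notifies this webhook
    return f"{message.rstrip()} {mention}".strip()


message = add_webhook_mention("CPU is high on {{host.name}}", "zebrium")
```

Calling the helper a second time on the result leaves the message unchanged.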
STEP 5: Add Zebrium Root Cause Report Suggestions to your Datadog Dashboards
Zebrium sends events to Datadog each time a Zebrium Root Cause report suggestion occurs.
For more information, see Visualizing Zebrium Data in Datadog.
Important Metric Names
Metric Name | Description |
---|---|
zebrium.logs.all.count |
Count of all log events received in a one-minute duration (per service_group and deployment). |
zebrium.logs.anomalies.count |
Count of anomaly log events received in a one-minute duration (per service_group and deployment). |
zebrium.logs.errors.count |
Count of error log events received in a one-minute duration (per service_group and deployment). |
ze_service_group |
Zebrium service group name for the corresponding metric or event. |
ze_deployment |
Zebrium deployment name for the corresponding metric or event. |
ze_significance |
Significance of the Root Cause Report (low, medium or high). |