Datadog Events and Metrics Dashboard Integrations

In addition to the integrations described below, Zebrium also provides a custom Datadog Dashboard Widget. Select Integrations in your Datadog user interface and search for Zebrium for more details. For more information, contact Zebrium at support@zebrium.com.

Features

  • You can configure Zebrium to automatically add Root Cause Analysis (RCA) reports as Events in Datadog. This allows you to see root cause details on any Datadog dashboard.
  • This integration automatically adds Log Count metrics in Datadog.
  • Each Zebrium RCA report includes a summary, a word cloud, and a set of log events showing symptoms and root cause, plus a link to the full report in the Zebrium user interface.
  • Together, these features shorten Mean Time to Resolution (MTTR) and reduce the time spent manually hunting for root cause.

How It Works

The recommended mode of operation for observability dashboard integrations is Zebrium Auto-Detect mode, which explains why something went wrong. In this mode, you continue to use your existing rules, alerts, and metrics as the primary source of problem detection. You can then review Zebrium RCA report findings directly in your Datadog Dashboards, alongside your other metrics, to understand the cause of the problems you were alerted on.

The Zebrium Augment mode is useful when you have monitors defined in Datadog and you want a Root Cause report automatically generated at the time of the alert. In this mode, Zebrium uses a Datadog webhook as a notification channel, and it updates your Dashboard with Root Cause reports that coincide with the triggering monitor so the reports are immediately visible to you as you work the issue.

The two modes of operation are independent. You can configure Auto-Detect, Augment, or both, depending on your operational use case.

Auto-Detect (recommended): Send Root Cause Detections to your Datadog Dashboards

  1. Zebrium continuously monitors all application logs and uses unsupervised machine learning to find anomalous log patterns that indicate a problem. These are automatically turned into Root Cause reports that highlight details of any problems, with over 95% accuracy.
  2. Root Cause report summaries are sent to Datadog using the event API, and Root Cause details are visible on your Datadog Dashboards.
  3. With a single click on your Dashboard, you can drill down further into the Zebrium user interface to look at correlated logs across your entire application.
  4. Log metrics are also sent to Datadog via the series API for visualization on your Datadog Dashboards.
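
As a rough illustration of steps 2 and 4 above, the following Python sketch builds payloads in the shape accepted by Datadog's v1 events and series endpoints. The endpoint choice, the helper names, the event title, and the idea that ze_service_group, ze_deployment, and ze_significance travel as tags are assumptions made for illustration; the exact payloads that Zebrium sends are not documented here.

```python
import json
import time

# Assumed endpoints (Datadog v1 API, US site). Zebrium's actual calls may differ.
EVENTS_URL = "https://api.datadoghq.com/api/v1/events"
SERIES_URL = "https://api.datadoghq.com/api/v1/series"


def build_event(summary, report_url, service_group, deployment, significance):
    """Return a dict shaped like a Datadog v1 event for one Root Cause report."""
    return {
        "title": "Zebrium Root Cause report",  # hypothetical title
        "text": f"{summary}\n{report_url}",
        "tags": [
            f"ze_service_group:{service_group}",
            f"ze_deployment:{deployment}",
            f"ze_significance:{significance}",
        ],
    }


def build_series(metric, count, service_group, deployment, timestamp=None):
    """Return a dict shaped like a Datadog v1 series submission for one data point."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return {
        "series": [
            {
                "metric": metric,
                "type": "count",
                "interval": 60,  # counts cover a one-minute window
                "points": [[ts, count]],
                "tags": [
                    f"ze_service_group:{service_group}",
                    f"ze_deployment:{deployment}",
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Print the payloads instead of sending them; no network access is needed.
    print(json.dumps(build_event("Disk full on db-1", "https://example.invalid/report/1",
                                 "default", "production", "high"), indent=2))
    print(json.dumps(build_series("zebrium.logs.errors.count", 12,
                                  "default", "production"), indent=2))
```

The sketch only prints the payloads. Sending them would require an HTTP POST to the corresponding endpoint with the API key in the DD-API-KEY header.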

For details, see Sending Root Cause Detections to your Datadog Dashboards.

Augment (advanced users): Receive Signals from Datadog Triggered Monitors

  1. Any Datadog Monitor can trigger a webhook request for Root Cause Analysis from Zebrium.
  2. Zebrium finds anomalous log patterns from your application that coincide with the event and creates a Root Cause report.
  3. Root Cause report summaries are sent to Datadog using the event API and Root Cause details are visible on your Datadog Dashboards.
  4. With a single click on your Dashboard, you can drill down further into the Zebrium user interface to look at correlated logs across your entire application.

For details, see Receiving Signals from Datadog Triggered Monitors.

Sending Root Cause Detections to your Datadog Dashboards

STEP 1: Create an API Key in Datadog

  1. From the Main Navigation panel in Datadog, hover over your Datadog Login Name and select Organization Settings.
  2. Click API Keys.
  3. Click the + New Key button.
  4. Enter a Name for the API Key and click Create Key.
  5. Copy and save the Key for use in STEP 2, below.
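
Before pasting the key into Zebrium, you may want to confirm that it was copied correctly. The following Python sketch uses Datadog's key validation endpoint. The US site host and the helper names are assumptions for illustration; use the API host for your own Datadog site.

```python
import json
import urllib.error
import urllib.request

# Assumption: US Datadog site. Other sites use a different host (for example, datadoghq.eu).
VALIDATE_URL = "https://api.datadoghq.com/api/v1/validate"


def build_validate_request(api_key, url=VALIDATE_URL):
    """Build (but do not send) a GET request that asks Datadog to validate an API key."""
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key}, method="GET")


def is_key_valid(api_key):
    """Send the request and report whether Datadog accepted the key."""
    try:
        with urllib.request.urlopen(build_validate_request(api_key), timeout=10) as resp:
            # A successful response is JSON containing a boolean "valid" field.
            return bool(json.load(resp).get("valid"))
    except urllib.error.HTTPError:
        # A rejected key surfaces as an HTTP error status.
        return False
```

For example, calling is_key_valid with the key you saved in this step should return True if the key was copied intact.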

STEP 2: Create a Datadog Integration in Zebrium to Send Suggestions to Datadog

  1. In the Zebrium user interface, go to the Integrations & Collectors page (Settings > Integrations & Collectors).
  2. In the Observability Dashboards section, click the Datadog Events and Metrics button.
  3. Click Create a New Integration. The Create Datadog Dashboard dialog appears.
  4. On the General tab, enter an Integration Name for this integration.
  5. In the Deployment drop-down, select a deployment for the integration.
  6. In the Service Group(s) drop-down, select a service group for the integration.
  7. On the Send Detections tab, click Enabled.
  8. In the API Key field, enter the API key you created in STEP 1, above.
  9. Click Save.

STEP 3: Add Zebrium Root Cause Report Suggestions and Log Count Metrics to Your Datadog Dashboards

Zebrium sends events and metrics to Datadog as follows:

  1. Events are sent each time a Zebrium Root Cause report suggestion occurs.
  2. Metrics are sent for counts of all log events, error log events, and anomaly log events.

Visualizing Zebrium Data in Datadog

The following image displays a sample chart visualization showing:

  1. A Root Cause Finder panel that displays a vertical bar whenever a Zebrium detection occurs. This allows you to easily see detections that are aligned with other metrics on your dashboards.

  2. A Root Cause Reports Summary panel that lists summary information for each Zebrium detection.

Image of a Datadog dashboard with Root Cause Finder and Root Cause Reports panels

The following image displays the definition of the Root Cause Finder panel:

Image of a Datadog dashboard with Root Cause Finder panel details

The following image displays the definition of the Root Cause Reports Summary panel:

Image of a Datadog dashboard with Root Cause Reports Summary panel details

Important Metric Names

Metric Name                     Description
zebrium.logs.all.count          Count of all log events received in a one-minute duration (per service_group and deployment).
zebrium.logs.anomalies.count    Count of anomaly log events received in a one-minute duration (per service_group and deployment).
zebrium.logs.errors.count       Count of error log events received in a one-minute duration (per service_group and deployment).
ze_service_group                Zebrium service group name for the corresponding metric or event.
ze_deployment                   Zebrium deployment name for the corresponding metric or event.
ze_significance                 Significance of the Root Cause Report (low, medium or high).
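
When you chart these metrics in Datadog, the names above typically end up in a metric query string. The following Python sketch composes such a query. The helper name, the example service group and deployment values, and the assumption that ze_service_group and ze_deployment are available as tags for filtering are illustrative only.

```python
def metric_query(metric, service_group=None, deployment=None, aggregator="sum"):
    """Compose a Datadog metric query string, optionally scoped by Zebrium tags."""
    filters = []
    if service_group:
        filters.append(f"ze_service_group:{service_group}")
    if deployment:
        filters.append(f"ze_deployment:{deployment}")
    # "*" means "no tag filter" in a Datadog query scope.
    scope = ",".join(filters) if filters else "*"
    return f"{aggregator}:{metric}{{{scope}}}.as_count()"


if __name__ == "__main__":
    # "checkout" and "production" are hypothetical example values.
    print(metric_query("zebrium.logs.errors.count", "checkout", "production"))
    # prints: sum:zebrium.logs.errors.count{ze_service_group:checkout,ze_deployment:production}.as_count()
```

You could paste the resulting string into the query editor of a timeseries widget to plot error log counts for a single service group and deployment.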

Receiving Signals from Datadog Triggered Monitors

Integration Overview

  1. Create an API Key in Datadog.
  2. Create a Datadog integration in Zebrium using the information from step 1.
  3. Create a webhook integration in Datadog using the information from step 2.
  4. Add webhook notifications to your Triggered Monitors in Datadog.
  5. Add Zebrium Root Cause reports to your Datadog Dashboard.

Integration Details

STEP 1: Create an API Key in Datadog

  1. From the Main Navigation panel in Datadog, hover over your Datadog Login Name and select Organization Settings.
  2. Click API Keys.
  3. Click the + New Key button.
  4. Enter a Name for the API Key and click Create Key.
  5. Copy and save the Key for use in STEP 2, below.

STEP 2: Create a Datadog Integration in Zebrium to Receive Signals from Datadog

  1. In the Zebrium user interface, go to the Integrations & Collectors page (Settings > Integrations & Collectors).
  2. In the Observability Dashboards section, click the Datadog Events and Metrics button.
  3. Click Create a New Integration. The Create Datadog Dashboard dialog appears.
  4. On the General tab, enter an Integration Name for this integration.
  5. In the Deployment drop-down, select a deployment for the integration.
  6. In the Service Group(s) drop-down, select a service group for the integration.
  7. Go to the Send Detections tab.
  8. In the API Key field, enter the API key you created in STEP 1, above.
  9. Click Save. The Datadog Dashboard Integrations dialog appears.
  10. Click the Edit button for the integration you just created. The Edit Datadog Dashboard dialog appears.
  11. On the Receive Signals tab, click the Enabled button.
  12. Make sure that the value in the API Key field on this tab matches the key created in STEP 1, above.
  13. Click in the URL field to copy the webhook URL and save it for use in STEP 3, below. Click OK.
  14. Click Save.

STEP 3: Create a Webhook Integration in Datadog

  1. In the Datadog user interface, go to the Main Navigation panel and navigate to Integrations > Integrations.
  2. Locate the Webhooks integration card and click Configure.
  3. Click the New button located in the Webhooks section.
  4. Enter a Name and the webhook URL that you saved in STEP 2.
  5. In the Payload section, add the line "alert_transition": "$ALERT_TRANSITION", immediately after the existing line "event_type": "$EVENT_TYPE", and keep the trailing comma if other fields follow.
  6. Click Save.
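
To make the payload change in step 5 concrete, the following Python sketch inserts the alert_transition field directly after event_type and prints the resulting JSON. The DEFAULT_PAYLOAD shown is an assumption based on Datadog's usual default webhook template; the template in your account may differ, so treat the output as a reference rather than something to paste blindly.

```python
import json

# Assumption: this mirrors Datadog's default webhook payload template.
DEFAULT_PAYLOAD = {
    "body": "$EVENT_MSG",
    "last_updated": "$LAST_UPDATED",
    "event_type": "$EVENT_TYPE",
    "title": "$EVENT_TITLE",
    "date": "$DATE",
    "org": {"id": "$ORG_ID", "name": "$ORG_NAME"},
    "id": "$ID",
}


def add_alert_transition(payload):
    """Return a copy of payload with alert_transition placed right after event_type.

    If the payload has no event_type key, it is returned unchanged (as a copy).
    """
    result = {}
    for key, value in payload.items():
        result[key] = value
        if key == "event_type":
            result["alert_transition"] = "$ALERT_TRANSITION"
    return result


if __name__ == "__main__":
    print(json.dumps(add_alert_transition(DEFAULT_PAYLOAD), indent=4))
```

The $-prefixed values are Datadog template variables; Datadog substitutes them when the webhook fires, so they are left as literal strings here.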

STEP 4: Add Webhook notifications to your Triggered Monitors in Datadog

  1. In the Datadog user interface, go to the Main Navigation panel and navigate to Monitors > Manage Monitors.
  2. Click the Monitor that you want to trigger Root Cause reports.
  3. Choose Edit from the gear icon on the Monitor page.
  4. In the "Notify your team" list, add the webhook that you created in STEP 3 (listed as @webhook-<name>), which points to the webhook URL from STEP 2.
  5. Click Save.
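
In Datadog, a webhook is normally referenced from a monitor's notification message with an @webhook-<name> mention. If you manage monitor messages in scripts, a helper like the following Python sketch can add the mention without duplicating it. The function name and the webhook name "zebrium" are hypothetical; use the Name you entered in STEP 3.

```python
def with_webhook_mention(message, webhook_name):
    """Append an @webhook-<name> mention to a monitor message if it is not already there."""
    mention = f"@webhook-{webhook_name}"
    if mention in message.split():
        return message  # already present as a standalone token; leave unchanged
    return f"{message.rstrip()} {mention}".strip()


if __name__ == "__main__":
    # "zebrium" is a hypothetical webhook name.
    print(with_webhook_mention("CPU usage is above 90%", "zebrium"))
    # prints: CPU usage is above 90% @webhook-zebrium
```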

STEP 5: Add Zebrium Root Cause Report Suggestions to your Datadog Dashboards

Zebrium sends events to Datadog each time a Zebrium Root Cause report suggestion occurs.

For more information, see Visualizing Zebrium Data in Datadog.

Important Metric Names

Metric Name                     Description
zebrium.logs.all.count          Count of all log events received in a one-minute duration (per service_group and deployment).
zebrium.logs.anomalies.count    Count of anomaly log events received in a one-minute duration (per service_group and deployment).
zebrium.logs.errors.count       Count of error log events received in a one-minute duration (per service_group and deployment).
ze_service_group                Zebrium service group name for the corresponding metric or event.
ze_deployment                   Zebrium deployment name for the corresponding metric or event.
ze_significance                 Significance of the Root Cause Report (low, medium or high).