PagerDuty Event Management Integrations


Features

  • You can configure Zebrium to automatically add Root Cause Analysis (RCA) reports to events in PagerDuty. This lets you see the root cause details and route the event to the appropriate team.
  • Each Zebrium RCA report includes a summary, a word cloud, and a set of log events showing symptoms and root cause, plus a link to the full report in the Zebrium user interface.
  • This lowers Mean Time to Repair (MTTR) and reduces the time spent manually hunting for the root cause.

How it Works

The recommended mode of operation for event management integrations is Zebrium Augment mode, which gives you an accurate explanation of why something went wrong. In this mode, you keep using your existing rules as the primary source of problem detection and event creation, and you review the Zebrium RCA report findings directly in the PagerDuty event to understand the reason behind it.

The Zebrium Auto-Detect mode is useful when you want to direct all Root Cause reports to PagerDuty for routing and dispositioning. You can also use Auto-Detect mode when you want to send only specific Root Cause reports to PagerDuty after first reviewing them in the Zebrium user interface.

The two modes of operation are independent. You can configure Augment mode, Auto-Detect mode, or both, depending on your operational use case.

Augment: Receive Signals from PagerDuty Events

  1. Any PagerDuty event can trigger a webhook request for Root Cause Analysis from Zebrium.
  2. Zebrium finds anomalous log patterns from your application that coincide with the event and creates a Root Cause report.
  3. Root Cause report summaries are sent to PagerDuty using the notes API, and Root Cause details are visible in your PagerDuty Event.
  4. With a single click on your event, you can drill down further into the Zebrium user interface to look at correlated logs across your entire application.
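To make step 1 more concrete, here is a minimal Python sketch of what a receiver of a PagerDuty Generic Webhooks (v3) delivery might pull out of the request body. It assumes the standard v3 envelope (a top-level `event` object with `event_type`, `occurred_at`, and `data`), in which a newly triggered incident has the event type `incident.triggered`. The function name `extract_trigger` and the 10-minute look-back window are hypothetical and do not describe how Zebrium actually correlates logs with an event.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical look-back window; Zebrium's real correlation logic is not described here.
LOOKBACK = timedelta(minutes=10)


def extract_trigger(body: str):
    """Return (incident_id, service_id, window_start, window_end) for an
    incident.triggered delivery, or None for any other event type."""
    event = json.loads(body)["event"]
    if event.get("event_type") != "incident.triggered":
        return None  # in this sketch, only newly triggered incidents request an RCA
    data = event["data"]
    # occurred_at is ISO 8601 with a trailing "Z"; convert it to an explicit UTC offset
    occurred = datetime.fromisoformat(event["occurred_at"].replace("Z", "+00:00"))
    return data["id"], data["service"]["id"], occurred - LOOKBACK, occurred
```

The returned time window is simply a convenient way to express "log patterns that coincide with the event"; a real implementation may choose its window very differently.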

For details, see Receiving Signals from PagerDuty.

Auto-Detect: Send Root Cause Detections to PagerDuty as Events

  1. The Zebrium AI/ML engine continuously monitors all application logs and uses unsupervised machine learning to find anomalous log patterns that indicate a problem. These are automatically turned into Root Cause reports highlighting details of any problems with over 95% accuracy.
  2. Root Cause report summaries are sent to PagerDuty using the webhook interface, and the Root Cause details are visible as events in PagerDuty.
  3. With a single click on your event, you can drill down further into the Zebrium user interface to look at correlated logs across your entire application.

For details, see Sending Root Cause Detections to PagerDuty as Events.

Receiving Signals from PagerDuty

STEP 1: Configure API Access for Zebrium in PagerDuty

  1. In the PagerDuty user interface, go to the Integrations menu and select API Access.
  2. Click the Create New API Key button.
  3. Enter a description, such as "Zebrium Event Detection".
  4. Make sure that the Read-only API Key option is not selected.
  5. Click Create Key.
  6. Copy the API Key and save it for STEP 2. The key will not be visible in PagerDuty again.

STEP 2: Create a PagerDuty Integration in Zebrium to Receive Signals from PagerDuty

  1. In the Zebrium user interface, go to the Integrations & Collectors page (Settings > Integrations & Collectors).
  2. In the Event Management section, click the PagerDuty button.
  3. Click the Create a New Integration button. The Create PagerDuty Event Management dialog appears.
  4. On the General tab, enter an Integration Name for this integration.
  5. In the Deployment drop-down, select a deployment for the integration.
  6. In the Service Group(s) drop-down, select a service group for the integration.
  7. On the Receive Signals tab, click Enabled.
  8. Enter the Username for your PagerDuty portal.
  9. Enter the API Key that you created in STEP 1, above.
  10. Click Save. The Your URL dialog appears.
  11. Copy the Webhook URL and save it for use in STEP 3, below.
  12. Click OK.
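The Username and API Key entered above are what allow RCA summaries to be written back to PagerDuty as notes. As a hedged illustration, the sketch below only builds (and does not send) such a request with Python's standard library. It assumes PagerDuty's REST API v2 incident-notes endpoint and its `Token token=` authorization scheme; the mapping of the Zebrium Username field to the `From` header (which PagerDuty expects to be the email address of a valid user for this kind of write) is our assumption, and `build_note_request` is a hypothetical helper name.

```python
import json
import urllib.request

PD_API = "https://api.pagerduty.com"


def build_note_request(api_key: str, user_email: str, incident_id: str, summary: str) -> urllib.request.Request:
    """Build (but do not send) a POST that adds a note to a PagerDuty incident."""
    body = json.dumps({"note": {"content": summary}}).encode("utf-8")
    return urllib.request.Request(
        f"{PD_API}/incidents/{incident_id}/notes",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token token={api_key}",  # full-access key from STEP 1
            "From": user_email,  # assumed to correspond to the Zebrium Username field
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`. This is also why the key must not be read-only: adding a note is a write operation.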

STEP 3: Add the Zebrium Webhook to PagerDuty

  1. In the PagerDuty user interface, go to the Integrations menu and select Generic Webhooks (v3).
  2. Click the + Add New Webhook button.
  3. In the WEBHOOK URL field, paste the Zebrium Webhook URL that you copied in STEP 2.
  4. In the SCOPE TYPE drop-down, select Service.
  5. In the SCOPE drop-down, select the desired service to which you want to add the Zebrium webhook.
  6. Enter a DESCRIPTION, such as "Zebrium Signal".
  7. In the EVENT SUBSCRIPTION field, select incident.triggered. Clear all other checkboxes.
  8. Click the Add Webhook button.
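If you manage PagerDuty configuration as code, the subscription from STEP 3 could instead be described as the body of a REST API v2 `POST /webhook_subscriptions` request. The sketch assumes that endpoint's shape (an `http_delivery_method` pointing at the Zebrium URL, an `events` list, and a `service_reference` filter) and that PagerDuty's v3 event type for a newly triggered incident is `incident.triggered`. The helper name, example URL, and service ID are hypothetical, and the user-interface steps remain the documented procedure.

```python
import json


def webhook_subscription_body(zebrium_url: str, service_id: str, description: str = "Zebrium Signal") -> dict:
    """Request body mirroring STEP 3: one service scope, triggered incidents only."""
    return {
        "webhook_subscription": {
            "type": "webhook_subscription",
            "delivery_method": {"type": "http_delivery_method", "url": zebrium_url},
            "description": description,
            "events": ["incident.triggered"],  # every other event type left out
            "filter": {"type": "service_reference", "id": service_id},  # SCOPE TYPE = Service
        }
    }


# Hypothetical values, for illustration only.
print(json.dumps(webhook_subscription_body("https://zebrium.example/webhook", "PSVC1"), indent=2))
```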

How to Uninstall

Disable API Access in PagerDuty

  1. In the PagerDuty user interface, go to the Integrations menu and select API Access.
  2. Click Disable or Remove on the API Access Key you want to delete.
  3. Click the Save button after confirming you wish to proceed.

Delete the Zebrium Integration

  1. In the Zebrium user interface, go to the Integrations & Collectors page (Settings > Integrations & Collectors).
  2. In the Event Management section, click the PagerDuty button.
  3. Click the delete icon next to the Zebrium integration that you want to delete.
  4. Click OK after confirming you wish to proceed.

Sending Root Cause Detections to PagerDuty as Events

This integration automatically sends Root Cause Analysis (RCA) reports to PagerDuty so that the appropriate team is notified when the Zebrium AI/ML engine auto-detects an event.

STEP 1: Create an Integration Key in PagerDuty

  1. In the PagerDuty user interface, go to the Automation menu and open an existing Event Orchestration or Event Rule, or create a new one.
  2. Under the Integrations associated with the Event Orchestration or Event Rule, copy the Integration Key and save it for STEP 2, below.
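An Integration Key of this kind is what PagerDuty's Events API v2 calls a routing key. As a hedged illustration of the sort of event an Auto-Detect notification could produce, the sketch below builds (without sending) a `trigger` event for `https://events.pagerduty.com/v2/enqueue`. The helper name, summary text, `source` value, severity, and report link are hypothetical placeholders, not Zebrium's actual payload.

```python
import json
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger(routing_key: str, summary: str, report_url: str) -> urllib.request.Request:
    """Build (but do not send) an Events API v2 trigger for one Root Cause report."""
    event = {
        "routing_key": routing_key,  # the Integration Key copied in STEP 1
        "event_action": "trigger",
        "payload": {
            "summary": summary[:1024],  # Events API v2 caps summary at 1024 characters
            "source": "zebrium",  # hypothetical source label
            "severity": "error",  # must be one of critical, error, warning, info
        },
        "links": [{"href": report_url, "text": "Zebrium Root Cause report"}],
    }
    # Events API v2 authenticates through routing_key, so no Authorization header is needed.
    return urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Unlike the REST API calls used for Augment mode, this path needs only the Integration Key, which is why no API Key or Username is required for sending detections.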

STEP 2: Create a PagerDuty Integration in Zebrium

  1. In the Zebrium user interface, go to the Integrations & Collectors page (Settings > Integrations & Collectors).
  2. In the Incident Management section, click the PagerDuty button.
  3. Click the Create a New Integration button. The Create PagerDuty Event Management dialog appears.
  4. On the General tab, enter an Integration Name for this integration.
  5. In the Deployment drop-down, select a deployment for the integration.
  6. In the Service Group(s) drop-down, select a service group for the integration.
  7. On the Send Detections tab, click Enabled. You might need to complete the Receive Signals tab before you can go to the next step. For more information, see Create a PagerDuty Integration in Zebrium to Receive Signals from PagerDuty, above.
  8. In the Integration Key field, paste the Integration Key that you saved from STEP 1, above.
  9. You can choose to send notifications the first time the AI/ML engine detects a new type of proactive Root Cause report. We recommend setting the Send on 1st occurrence toggle to Yes for proactive notification of potential new problems. If you want to be notified on subsequent occurrences, do this from the relevant Root Cause report.
  10. After you update this tab, you can click Create Sample Alert to test your settings. If your settings are correct, a sample alert appears on the Alerts page.
  11. Click Save.