Opsgenie Incident Management Integrations

Features

  • You can configure Zebrium to automatically add Root Cause (RCA) reports to incidents in Opsgenie. This lets you see details of the root cause and direct the incident to the appropriate team.
  • Each Zebrium RCA report includes a summary, a word cloud, and a set of log events showing symptoms and root cause, plus a link to the full report in the Zebrium user interface.
  • This leads to faster Mean Time to Repair (MTTR) and less time spent manually hunting for the root cause.

How it Works

The recommended mode of operation for incident management integrations is to use the Zebrium Augment mode as an accurate mechanism for explaining the reason something went wrong. In this mode, you continue to use your existing rules as the primary source of problem detection and incident creation. You can then review Zebrium RCA report findings directly in the incident that was created by Opsgenie to explain the reason behind the incident.

The Zebrium Auto-Detect mode is useful when you want to direct all Root Cause reports to Opsgenie for routing and dispositioning. You can also use Auto-Detect mode when you want to send only specific Root Cause reports to Opsgenie after first reviewing them in the Zebrium user interface.

The two modes of operation are independent. You can configure Augment and/or Auto-Detect modes depending on your operational use case.

Augment: Receive Signals from Opsgenie Incidents

  1. Any Opsgenie incident can trigger a webhook request for Root Cause Analysis from Zebrium.
  2. Zebrium finds anomalous log patterns from your application that coincide with the incident and creates a Root Cause report.
  3. Root Cause report summaries are sent to Opsgenie using the notes API, and Root Cause details are visible in your Opsgenie incident.
  4. With a single click on your incident, you can drill down further into the Zebrium user interface to look at correlated logs across your entire application.
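
The note-posting step above can be illustrated with a minimal sketch. It assumes the "add note to incident" endpoint of the public Opsgenie Incident API (POST /v1/incidents/{identifier}/notes with a GenieKey authorization header). How Zebrium itself calls the notes API is not described here, and the note layout, incident ID, API key, and report URL below are all hypothetical placeholders. The function only builds the request; it does not send it.

```python
import json
import urllib.request

OPSGENIE_API = "https://api.opsgenie.com"


def build_add_note_request(incident_id, api_key, summary, report_url):
    """Build (but do not send) a request that adds a note to an Opsgenie incident."""
    # Hypothetical note layout: a one-line summary plus a link to the full report.
    note = f"Zebrium RCA summary: {summary}\nFull report: {report_url}"
    body = json.dumps({"note": note}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OPSGENIE_API}/v1/incidents/{incident_id}/notes",
        data=body,
        method="POST",
        headers={
            "Authorization": f"GenieKey {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
example_request = build_add_note_request(
    incident_id="example-incident-id",
    api_key="example-api-key",
    summary="Database connection pool exhausted",
    report_url="https://zebrium.example/report/123",
)
```

Passing the returned object to urllib.request.urlopen would send the note. Keeping construction separate from sending makes it possible to inspect the URL, headers, and body first.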

For details, see Receiving Signals from Opsgenie.

Auto-Detect: Send Root Cause Detections to Opsgenie as Incidents

  1. The Zebrium AI/ML engine continuously monitors all application logs and uses unsupervised machine learning to find anomalous log patterns that indicate a problem. These are automatically turned into Root Cause reports highlighting details of any problems with over 95% accuracy.
  2. Root Cause report summaries are sent to Opsgenie using the webhook interface, and the Root Cause details are visible as incidents in Opsgenie.
  3. With a single click on your incident, you can drill down further into the Zebrium user interface to look at correlated logs across your entire application.

For details, see Sending Root Cause Detections to Opsgenie as Incidents.

Sending Root Cause Detections to Opsgenie as Incidents

This incident management integration automatically sends a Root Cause (RCA) report to Opsgenie so that the appropriate team is notified when the Zebrium AI/ML engine auto-detects an incident.

STEP 1: Add the Zebrium Integration to your Opsgenie Team

  1. In the Opsgenie user interface, click the Teams tab to access your Team dashboard.
  2. Click the desired Team for the integration.
  3. In the left-hand navigation pane, click Integrations.
  4. Click the Add integration button.
  5. Click the Add button under the Zebrium integration icon.
  6. Make a note of the Webhook URL in the Zebrium section of the Integration Setup page. You will use this in STEP 2, below.
  7. In the Settings section, update the Name as desired.
  8. Make sure that the Enabled checkbox is selected.
  9. Click Save Integration.
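
Because the Webhook URL noted in this step is pasted into Zebrium later, a quick sanity check can catch copy-and-paste mistakes. The following is a minimal sketch, not part of either product: the function name is hypothetical, and it checks only generic properties (no stray whitespace, an https scheme, a host name) because the exact shape of the Opsgenie Webhook URL is not specified here.

```python
from urllib.parse import urlsplit


def check_webhook_url(url):
    """Return a list of problems found in a pasted webhook URL (empty if none)."""
    problems = []
    # Stray spaces or newlines are a common copy-and-paste artifact.
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    parts = urlsplit(url.strip())
    if parts.scheme != "https":
        problems.append("scheme is not https")
    if not parts.hostname:
        problems.append("missing host name")
    return problems
```

An empty list means the URL passed these basic checks. It does not confirm that Opsgenie will accept requests sent to it.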

STEP 2: Create an Opsgenie Integration in Zebrium to Send Root Cause Detections to Opsgenie as Incidents

  1. In the Zebrium user interface, go to the Integrations & Collectors page (Settings > Integrations & Collectors).
  2. In the Incident Management section, click the Opsgenie button.
  3. Click the Create a New Integration button. The Create Opsgenie Incident Management dialog appears.
  4. On the General tab, enter an Integration Name for this integration.
  5. In the Deployment drop-down, select a deployment for the integration.
  6. In the Service Group(s) drop-down, select a service group for the integration.
  7. On the Send Detections tab, click Enabled.
  8. Enter the Opsgenie Webhook URL that you noted in STEP 1, above.
  9. You can choose to send a notification the first time the AI/ML engine detects a new type of Root Cause report. We recommend setting the Send on 1st occurrence toggle to Yes so that you are notified proactively of potential new problems. To be notified of subsequent occurrences, enable this from the relevant Root Cause report.
  10. Click Save.
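
The effect of the Send on 1st occurrence setting can be pictured with a small conceptual sketch. This is an illustrative model only, not Zebrium's implementation: the class name, the idea of a string "report type" key, and the sample reports are all assumptions made for the example.

```python
class FirstOccurrenceNotifier:
    """Deliver a report only the first time its type is seen (hypothetical model)."""

    def __init__(self, send):
        self._send = send          # callable that delivers a report
        self._seen_types = set()   # report types that have already been sent

    def handle(self, report_type, report):
        """Return True if the report was sent, False if it was a repeat."""
        if report_type in self._seen_types:
            return False
        self._seen_types.add(report_type)
        self._send(report)
        return True


# Collect "sent" reports in a list instead of calling a real webhook.
sent = []
notifier = FirstOccurrenceNotifier(sent.append)
notifier.handle("db-pool-exhausted", {"id": 1})   # first of its type: sent
notifier.handle("db-pool-exhausted", {"id": 2})   # repeat: not sent
notifier.handle("disk-full", {"id": 3})           # new type: sent
```

In this model, repeats of a known type are dropped. This mirrors the guidance above that notifications for subsequent occurrences are configured separately, from the relevant Root Cause report.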