Elastic Stack Dashboard Integrations


Features

  • You can configure Zebrium to automatically add Root Cause Analysis (RCA) reports to Elastic as detection metrics, so you can see root cause details on any Kibana dashboard.
  • The integration also automatically adds log metrics to Elastic.
  • Each Zebrium RCA report includes a summary, a word cloud, and a set of log events showing symptoms and root cause, plus a link to the full report in the Zebrium user interface.
  • The result is a faster Mean Time to Resolution (MTTR) and less time spent manually hunting for root cause.

How It Works

The recommended mode of operation for observability dashboard integrations is Zebrium Auto-Detect mode, which accurately explains why something went wrong. In this mode, you continue to use your existing rules, alerts, and metrics as the primary source of problem detection. You then review Zebrium RCA report findings directly in your Kibana dashboards, alongside other metrics, to understand the cause of the problems you were alerted on.

Auto-Detect: Send Root Cause Detections to your Kibana Dashboards

  1. Zebrium continuously monitors all application logs and uses unsupervised machine learning to find anomalous log patterns that indicate a problem. These are automatically turned into Root Cause reports highlighting details of any problems with over 95% accuracy.
  2. The Zebeat log shipper sends Root Cause report summaries to Elastic as metrics, making Root Cause details visible on your Kibana dashboards.
  3. With a single click from your Dashboard, you can drill down further to look at correlated logs across your entire application.
  4. Log metrics are also sent to Elastic using the Zebeat log shipper for visualization on your Kibana Dashboards.

Sending Root Cause Detections to Your Kibana Dashboards

Integration Overview

  1. Create a secure access token in Zebrium for the Zebeat collector.
  2. Create a Zebeat override file and deploy Zebeat in your Kubernetes environment using Helm.
  3. Create visualizations in your Kibana dashboard using the Root Cause report and log data provided by Zebeat.

[Image: Zebrium integration with Kibana dashboards]

Integration Details

STEP 1: Create a Secure Access Token in Zebrium

  1. In the Zebrium user interface, go to the Access Tokens page (Settings > Access Tokens).
  2. Click the + Add Access Token button.
  3. Enter a Name for the token.
  4. Select Viewer for the Role.
  5. Select the Deployment for the access token.
  6. Click Add.
  7. Copy the Access Token that was just created and save it for use in STEP 2.

STEP 2: Create Zebeat Override File and Deploy in your Kubernetes Environment

Create the Zebeat Override File:

  1. Go to the Zebeat GitHub repository at https://github.com/zebrium/helm-charts/tree/main/charts/zebeat.
  2. Navigate to the examples directory.
  3. Zebeat can send Root Cause report data either to Logstash or directly to Elasticsearch. Choose the logstash or elasticsearch .yaml file that matches your setup as the template for the override.yaml file you will use when deploying the Zebeat chart.
  4. Copy the contents of the template .yaml file to your local disk as override.yaml so you can customize it for your environment.
  5. Edit your local copy of the override.yaml file and make the following updates:
  • In the host parameter of the metricbeat.modules section, add the fully qualified host name (FQHN) of the Zebrium instance where you generated the access token in STEP 1. For Zebrium SaaS, this is typically https://cloud.zebrium.com.
  • In the access_tokens.yaml parameter of the accessTokens section, add the FQHN for your Zebrium instance and the Access Token generated in STEP 1.
  • In the output.elasticsearch or output.logstash section, add the appropriate host for your Elastic deployment and any necessary credentials.
  • Save the override.yaml file.
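Before deploying, it can help to sanity-check the values you are about to enter in override.yaml. The following is a minimal sketch, not part of Zebeat: the function `check_override_values` and its parameters are hypothetical and do not correspond to keys in the Zebeat chart. It assumes the metricbeat.modules host is given as a full URL (as in https://cloud.zebrium.com), accepts either a URL or a bare host name for the accessTokens entry, and relies on the fact that a Beat ships to a single output, so only one of output.elasticsearch or output.logstash should be set.

```python
from urllib.parse import urlparse


def _hostname(value):
    # Accept either a bare host name or a full URL.
    return urlparse(value if "://" in value else "//" + value).hostname


def check_override_values(module_host, token_host, access_token, outputs):
    """Return a list of problems with the values planned for override.yaml.

    Hypothetical pre-flight check, not part of Zebeat. `outputs` is the set
    of output sections you intend to configure, e.g. {"logstash"}.
    """
    problems = []
    parsed = urlparse(module_host)
    # The module host is expected as a full URL such as https://cloud.zebrium.com
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        problems.append("metricbeat.modules host is not a full http(s) URL")
    # Both sections should name the same Zebrium instance from STEP 1
    if _hostname(token_host) != _hostname(module_host):
        problems.append("accessTokens host differs from metricbeat.modules host")
    if not access_token.strip():
        problems.append("access token from STEP 1 is missing")
    # A Beat ships to a single output at a time
    chosen = [name for name in ("elasticsearch", "logstash") if name in outputs]
    if len(chosen) != 1:
        problems.append("set exactly one of output.elasticsearch or output.logstash")
    return problems


print(check_override_values("https://cloud.zebrium.com",
                            "https://cloud.zebrium.com",
                            "<token from STEP 1>", {"logstash"}))  # []
```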

Deploy Zebeat in your Kubernetes Environment

To install the chart with the release name zebeat in the zebrium namespace, run the following commands:

helm repo add zebrium http://charts.zebrium.com

helm upgrade -i zebeat zebrium/zebeat --namespace zebrium --create-namespace -f override.yaml
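The helm upgrade -i form installs the release if it does not exist yet and upgrades it otherwise, so the same command can be rerun after you edit override.yaml. As a sketch, the command line can be assembled in Python to keep the release name (zebeat) visibly separate from the namespace (zebrium); the helper name is hypothetical and its defaults simply mirror the command above.

```python
import shlex


def helm_upgrade_cmd(release="zebeat", namespace="zebrium",
                     chart="zebrium/zebeat", values_file="override.yaml"):
    """Build the argv for installing or upgrading Zebeat with Helm.

    Hypothetical helper; the defaults mirror the documented command.
    """
    return [
        "helm", "upgrade", "-i", release, chart,   # -i: install if absent
        "--namespace", namespace, "--create-namespace",
        "-f", values_file,                         # your customized overrides
    ]


print(shlex.join(helm_upgrade_cmd()))
# helm upgrade -i zebeat zebrium/zebeat --namespace zebrium --create-namespace -f override.yaml
```

To run it, pass the list to subprocess.run(cmd, check=True) on a machine where helm and access to the target cluster are already configured.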

STEP 3: Create Visualizations in your Dashboard

Zebeat provides two metricsets for visualizing Zebrium data in Elastic:

  1. Detections provides Root Cause report data.
  2. Logs provides metrics on Log Event counts.

Visualizing in Kibana

The following image displays a sample chart visualization showing:

  1. The sum of Detections from the detections metricset (detections.alwaysone.count), plotted as a bar chart with its Y-axis on the right-hand side.
  2. The sum of Anomalies from the logs metricset (logs.anomalies.count), plotted as a line chart with its Y-axis on the left-hand side.

[Image: Kibana dashboard with a Root Cause Finder panel]
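Each series in the chart above is a sum aggregation over a date histogram. The sketch below builds equivalent Elasticsearch search bodies as plain Python dicts, which you could send with any client to an endpoint such as POST metricbeat-*/_search. The index pattern is an assumption based on the sample _index values later on this page, the function name is hypothetical, and the term query assumes metricset.name is mapped as a keyword field, as it normally is in the Metricbeat index template.

```python
import json


def sum_per_interval_query(metricset, field, interval="1m"):
    """Build a search body that sums `field` per time bucket for documents
    of one Zebeat metricset ("detections" or "logs")."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {"term": {"metricset.name": metricset}},
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                "aggs": {"total": {"sum": {"field": field}}},
            }
        },
    }


# Bar series: one per Root Cause detection
detections_body = sum_per_interval_query("detections", "detections.alwaysone.count")
# Line series: anomalous log events per one-minute bucket
anomalies_body = sum_per_interval_query("logs", "logs.anomalies.count")
print(json.dumps(detections_body, indent=2))
```

A one-minute interval is used by default because the logs counts are reported per one-minute period; widen it for longer time ranges.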

Below is a sample Search visualization showing the following Root Cause report details:

  1. detections.title: NLP summary of the report.
  2. detections.word_cloud.w: List of word cloud strings.
  3. detections.report_url: Link to the full Root Cause report in the Zebrium portal.
  4. detections.significance: Significance of the root cause, as determined by Zebrium ML (low, medium, or high).
  5. zebrium.service_group: Service group in which the Root Cause detection was found.

[Image: Kibana dashboard with Detection Details]
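The saved search above displays fields read from each detections document. A minimal sketch of the same extraction from one _source object follows. The function name is hypothetical, and the service group is taken from zebrium.service_group, which is where the sample Detections payload below stores it; the sample input is a trimmed copy of that payload.

```python
def detection_row(source):
    """Collect the Root Cause report details shown in the saved search
    from the `_source` of one detections-metricset document."""
    detections = source.get("detections", {})
    return {
        "title": detections.get("title"),
        # word_cloud is a list of objects; "w" holds the word itself
        "words": [entry["w"] for entry in detections.get("word_cloud", [])],
        "report_url": detections.get("report_url"),
        "significance": detections.get("significance"),
        # the sample payload stores the service group under zebrium.*
        "service_group": source.get("zebrium", {}).get("service_group"),
    }


sample = {
    "zebrium": {"service_group": "shop"},
    "detections": {
        "title": "The kubelet was unable to create the order due to timeout from one of the services.",
        "word_cloud": [{"w": "mongodb", "b": 7, "s": 8}, {"w": "carts", "b": 7, "s": 7}],
        "significance": "medium",
    },
}
print(detection_row(sample))
```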

Important Metric Names

Metric Name                  Description
logs.all.count               Count of all log events received in a one-minute interval (per service_group)
logs.anomalies.count         Count of anomalous log events received in a one-minute interval (per service_group)
logs.errors.count            Count of error log events received in a one-minute interval (per service_group)
detections.alwaysone.count   Set to 1 for each Zebrium Root Cause report detection
detections.title             Title of the Root Cause report (usually an NLP summary)
detections.word_cloud.w      List of words in the word cloud of the Root Cause report (per service_group)
detections.report_url        URL of the Root Cause report
detections.significance      Significance of the Root Cause report (low, medium, or high)
zebrium.service_group        Zebrium service group name for the corresponding metric or detection
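Because the logs.* counts in the table above are reported per one-minute interval and per service group, totals over a longer window are sums across documents grouped by zebrium.service_group. The sketch below shows this for a list of logs-metricset _source objects; the function name is hypothetical. Whether Zebeat ever writes more than one document for the same minute and service group is not stated here, so the sketch defensively keeps only the last document per (service group, @timestamp) pair; that deduplication is an assumption, not documented behavior.

```python
from collections import defaultdict

KINDS = ("all", "errors", "anomalies")


def totals_by_service_group(sources):
    """Sum logs.* counts per zebrium.service_group across logs-metricset
    `_source` documents, counting each (group, minute) pair only once."""
    latest = {}
    for src in sources:
        key = (src["zebrium"]["service_group"], src["@timestamp"])
        latest[key] = src["logs"]  # a later duplicate replaces an earlier one
    totals = defaultdict(lambda: dict.fromkeys(KINDS, 0))
    for (group, _minute), logs in latest.items():
        for kind in KINDS:
            totals[group][kind] += logs[kind]["count"]
    return dict(totals)


# Counts taken from the sample Logs Metricset Payload on this page
one_minute = {
    "@timestamp": "2022-04-12T00:52:00.000Z",
    "zebrium": {"service_group": "default"},
    "logs": {"all": {"count": 27}, "errors": {"count": 0}, "anomalies": {"count": 0}},
}
print(totals_by_service_group([one_minute]))
```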

Sample Payloads for Detections and Logs Metricsets

Detections Metricset Payload

{
  "_index": ".ds-metricbeat-8.3.0-2022.04.07-000001",
  "_id": "u-aUGYABqSxIAr_l5fTX",
  "_version": 1,
  "_score": 1,
  "_source": {
    "@timestamp": "2022-04-11T16:56:53.000Z",
    "event": {
      "module": "zebrium",
      "duration": 292227850,
      "dataset": "detections"
    },
    "metricset": {
      "name": "detections",
      "period": 10000
    },
    "ecs": {
      "version": "8.0.0"
    },
    "host": {
      "name": "zebeat-67d8d6457b-8rblk"
    },
    "agent": {
      "type": "metricbeat",
      "version": "8.3.0",
      "ephemeral_id": "5c5a0778-b163-4187-916e-5fc1b730fbde",
      "id": "6c216ce2-16cc-4313-802d-2203a604159c",
      "name": "zebeat-67d8d6457b-8rblk"
    },
    "service": {
      "address": "https://cloud.zebrium.com",
      "type": "zebrium"
    },
    "zebrium": {
      "customer": "xyz16",
      "deployment": "trial",
      "service_group": "shop"
    },
    "detections": {
      "report_url": "https://cloud.zebrium.com:443/root-cause/report?deployment_id=xyz16_trial&itype_id=0ba3b7a6-5bfb-561a-591b-5324d08b86bd&inci_id=00062545-dd50-0000-0000-51900000f40e&ievt_level=2",
      "occurrence": {
        "count": 1
      },
      "word_cloud": [
        {
          "w": "mongodb",
          "b": 7,
          "s": 8
        },
        {
          "b": 8,
          "s": 7,
          "w": "sock-chaos-runner"
        },
        {
          "w": "carts",
          "b": 7,
          "s": 7
        },
        {
          "s": 6,
          "w": "exception",
          "b": 6
        },
        {
          "b": 6,
          "s": 3,
          "w": "sock-shop"
        },
        {
          "s": 6,
          "w": "org",
          "b": 5
        },
        {
          "b": 5,
          "s": 5,
          "w": "socket"
        },
        {
          "s": 5,
          "w": "dispatcherservlet",
          "b": 2
        }
      ],
      "alwaysone": {
        "count": 1
      },
      "includes_default": true,
      "title": "The kubelet was unable to create the order due to timeout from one of the services.",
      "significance": "medium"
    }
  },
  "fields": {
    "zebrium.service_group": [
      "shop"
    ],
    "detections.includes_default": [
      true
    ],
    "zebrium.deployment": [
      "trial"
    ],
    "zebrium.customer": [
      "xyz16"
    ],
    "service.type": [
      "zebrium"
    ],
    "agent.type": [
      "metricbeat"
    ],
    "detections.occurrence.count": [
      1
    ],
    "logstash_stats.timestamp": [
      "2022-04-11T16:56:53.000Z"
    ],
    "event.module": [
      "zebrium"
    ],
    "detections.word_cloud.b": [
      7,
      8,
      7,
      6,
      6,
      5,
      5,
      2
    ],
    "agent.name": [
      "zebeat-67d8d6457b-8rblk"
    ],
    "host.name": [
      "zebeat-67d8d6457b-8rblk"
    ],
    "beats_state.timestamp": [
      "2022-04-11T16:56:53.000Z"
    ],
    "beats_state.state.host.name": [
      "zebeat-67d8d6457b-8rblk"
    ],
    "timestamp": [
      "2022-04-11T16:56:53.000Z"
    ],
    "detections.report_url": [
      "https://cloud.zebrium.com:443/root-cause/report?deployment_id=xyz16_trial&itype_id=0ba3b7a6-5bfb-561a-591b-5324d08b86bd&inci_id=00062545-dd50-0000-0000-51900000f40e&ievt_level=2"
    ],
    "detections.word_cloud.w": [
      "mongodb",
      "sock-chaos-runner",
      "carts",
      "exception",
      "sock-shop",
      "org",
      "socket",
      "dispatcherservlet"
    ],
    "detections.title": [
      "The kubelet was unable to create the order due to timeout from one of the services."
    ],
    "kibana_stats.timestamp": [
      "2022-04-11T16:56:53.000Z"
    ],
    "detections.alwaysone.count": [
      1
    ],
    "metricset.period": [
      10000
    ],
    "detections.word_cloud.s": [
      8,
      7,
      7,
      6,
      3,
      6,
      5,
      5
    ],
    "agent.hostname": [
      "zebeat-67d8d6457b-8rblk"
    ],
    "metricset.name": [
      "detections"
    ],
    "event.duration": [
      292227850
    ],
    "@timestamp": [
      "2022-04-11T16:56:53.000Z"
    ],
    "agent.id": [
      "6c216ce2-16cc-4313-802d-2203a604159c"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "service.address": [
      "https://cloud.zebrium.com"
    ],
    "agent.ephemeral_id": [
      "5c5a0778-b163-4187-916e-5fc1b730fbde"
    ],
    "agent.version": [
      "8.3.0"
    ],
    "event.dataset": [
      "detections"
    ],
    "detections.significance": [
      "medium"
    ]
  }
}
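The fields section of this hit repeats the _source values under dotted names, with every value wrapped in a list and arrays of objects such as word_cloud merged field by field (detections.word_cloud.w, detections.word_cloud.b, detections.word_cloud.s). The sketch below approximates that flattening in Python for working with _source directly; the function name is hypothetical. It is an illustration, not how Elasticsearch computes fields: Elasticsearch builds that section from the index mapping, which is likely why entries such as timestamp, kibana_stats.timestamp, and agent.hostname appear there but not in _source (they appear to be field aliases defined by the Metricbeat index template).

```python
def flatten(source):
    """Approximate the `fields` view of a `_source` dict: dotted names,
    leaf values collected into lists, lists of objects merged per field,
    and null values skipped."""
    out = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:  # each array element contributes under the same path
                walk(item, path)
        elif value is not None:
            out.setdefault(path, []).append(value)

    walk(source, "")
    return out


example = {
    "zebrium": {"service_group": "shop"},
    "detections": {"word_cloud": [{"w": "mongodb", "b": 7, "s": 8},
                                  {"b": 8, "s": 7, "w": "sock-chaos-runner"}]},
}
print(flatten(example))
```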

Logs Metricset Payload

{
  "_index": ".ds-metricbeat-8.3.0-2022.04.07-000001",
  "_id": "Xi5MG4ABTsyT1lUpY2dd",
  "_version": 1,
  "_score": 1,
  "_source": {
    "@timestamp": "2022-04-12T00:52:00.000Z",
    "event": {
      "dataset": "logs",
      "module": "zebrium",
      "duration": 144691043
    },
    "metricset": {
      "name": "logs",
      "period": 10000
    },
    "service": {
      "address": "https://cloud.zebrium.com",
      "type": "zebrium"
    },
    "zebrium": {
      "service_group": "default",
      "customer": "xyz16",
      "deployment": "trial"
    },
    "logs": {
      "errors": {
        "count": 0
      },
      "anomalies": {
        "count": 0
      },
      "all": {
        "count": 27
      }
    },
    "ecs": {
      "version": "8.0.0"
    },
    "host": {
      "name": "zebeat-67d8d6457b-8rblk"
    },
    "agent": {
      "version": "8.3.0",
      "ephemeral_id": "5c5a0778-b163-4187-916e-5fc1b730fbde",
      "id": "6c216ce2-16cc-4313-802d-2203a604159c",
      "name": "zebeat-67d8d6457b-8rblk",
      "type": "metricbeat"
    }
  },
  "fields": {
    "zebrium.service_group": [
      "default"
    ],
    "zebrium.deployment": [
      "trial"
    ],
    "zebrium.customer": [
      "xyz16"
    ],
    "service.type": [
      "zebrium"
    ],
    "agent.type": [
      "metricbeat"
    ],
    "logstash_stats.timestamp": [
      "2022-04-12T00:52:00.000Z"
    ],
    "event.module": [
      "zebrium"
    ],
    "agent.name": [
      "zebeat-67d8d6457b-8rblk"
    ],
    "host.name": [
      "zebeat-67d8d6457b-8rblk"
    ],
    "beats_state.timestamp": [
      "2022-04-12T00:52:00.000Z"
    ],
    "logs.anomalies.count": [
      0
    ],
    "beats_state.state.host.name": [
      "zebeat-67d8d6457b-8rblk"
    ],
    "timestamp": [
      "2022-04-12T00:52:00.000Z"
    ],
    "kibana_stats.timestamp": [
      "2022-04-12T00:52:00.000Z"
    ],
    "metricset.period": [
      10000
    ],
    "agent.hostname": [
      "zebeat-67d8d6457b-8rblk"
    ],
    "logs.errors.count": [
      0
    ],
    "metricset.name": [
      "logs"
    ],
    "event.duration": [
      144691043
    ],
    "@timestamp": [
      "2022-04-12T00:52:00.000Z"
    ],
    "agent.id": [
      "6c216ce2-16cc-4313-802d-2203a604159c"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "service.address": [
      "https://cloud.zebrium.com"
    ],
    "agent.ephemeral_id": [
      "5c5a0778-b163-4187-916e-5fc1b730fbde"
    ],
    "agent.version": [
      "8.3.0"
    ],
    "event.dataset": [
      "logs"
    ],
    "logs.all.count": [
      27
    ]
  }
}
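Because the logs metricset reports raw counts, ratios such as the share of error or anomalous events in a minute have to be derived. The sketch below parses one hit shaped like the payload above and computes those two ratios from its _source counts. The ratios are illustrative derived values, not metrics that Zebeat emits, and the function name is hypothetical.

```python
import json


def log_rates(hit_json):
    """Return (error_rate, anomaly_rate) for one logs-metricset hit.

    Rates are fractions of logs.all.count; both are 0.0 when no events
    arrived in that minute, to avoid dividing by zero.
    """
    logs = json.loads(hit_json)["_source"]["logs"]
    total = logs["all"]["count"]
    if total == 0:
        return 0.0, 0.0
    return logs["errors"]["count"] / total, logs["anomalies"]["count"] / total


# Counts copied from the sample payload above
hit = '{"_source": {"logs": {"all": {"count": 27}, "errors": {"count": 0}, "anomalies": {"count": 0}}}}'
print(log_rates(hit))  # (0.0, 0.0)
```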