Getting Started with Skylar Automated RCA

This chapter provides an overview of how Skylar Automated RCA works, and how to get started using Skylar Automated RCA.

Before you can start watching for suggestions and reviewing Root Cause reports, you will need to configure a method for gathering log data to send to Skylar Automated RCA. For more information, see Log Collectors and File Uploads.

How Skylar Automated RCA Works

The following steps describe the basic workflow for Skylar Automated RCA:

  1. Data Ingestion. Skylar Automated RCA continuously collects log data from across the IT environment, aggregating messages from applications, infrastructure components, and other sources.
  2. Machine Learning Analysis. The platform uses unsupervised machine learning to detect patterns and anomalies in log data, correlating anomalous events across different log streams that indicate underlying issues. It identifies clusters of problematic events related to the same incident, evaluating them by rarity and severity, such as the number of warnings or errors. Next, the platform assigns a unique "fingerprint" to each cluster, categorizing them as distinct issues.
  3. Root Cause Identification. The Skylar AI engine then examines these patterns to determine the root cause of incidents. Skylar uses historical data and real-time analysis to detect abnormalities that might signal problems, and flags events exceeding a specified threshold.
  4. Automated Recommendations. After an issue is identified, Skylar AI generates alerts and provides a summary of the findings and suggested remediation actions. Logs not included in alerts are discarded after a few hours, ensuring efficient log management.
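
The fingerprinting idea in step 2 can be illustrated with a short Python sketch. This is an illustration only and does not reflect how the Skylar AI actually computes fingerprints: the `fingerprint` helper, the choice of SHA-256, and the sample event types are all assumptions made for this example.

```python
import hashlib


def fingerprint(event_types):
    """Return a stable ID for a cluster, derived from its set of event types.

    Hypothetical helper: order and duplicates are ignored, so a recurring
    incident with the same event types maps to the same fingerprint.
    """
    canonical = "\n".join(sorted(set(event_types)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Sample clusters (invented for illustration).
cluster_a = ["db connection refused", "retry limit exceeded", "db connection refused"]
cluster_b = ["retry limit exceeded", "db connection refused"]
cluster_c = ["disk usage above threshold"]

same_incident_type = fingerprint(cluster_a) == fingerprint(cluster_b)
different_incident_type = fingerprint(cluster_a) != fingerprint(cluster_c)
```

In this sketch, `cluster_a` and `cluster_b` share a fingerprint because they contain the same event types, while `cluster_c` is treated as a distinct issue.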

Skylar Automated RCA Suggestions

When the Skylar AI detects an "abnormal" cluster of problematic events, it generates a suggestion, which appears on the Alerts page (the home page) of the Skylar Automated RCA user interface along with the existing alerts.

On the Alerts page, the summary report for a suggestion or an alert contains the following main elements:

  • AI-generated title. Displayed at the top of the summary pane, this title is generated by GPT Services, which use generative AI models. You can enable or disable GPT Services for a specific deployment of Skylar Automated RCA by using the GPT Services column on the Deployments page (Settings > Deployments).
  • Word Cloud. A set of relevant words chosen by the Skylar AI from the log lines contained in the alert. On the RCA report page, you can click a word in the cloud to highlight that word in the list of logs.
  • Significance icon. Since not all suggestions that the Skylar AI generates will relate to problems that actually impact users, the engine attempts to reason over the data and assess whether a problem actually requires attention. Hover over this icon at the top of the list of logs to view the confidence level of the Skylar AI for this suggestion:
    • A red icon means "High" confidence.
    • A yellow icon means "Medium" confidence.
    • A blue icon means "Low" confidence.
  • AI Assessment. Since not all suggestions that the Skylar AI generates will relate to problems that actually impact users, the Skylar AI attempts to reason over the data and assess whether a problem actually requires attention. Depending on the quality of the data, some suggestions might not include an AI Assessment. When present, the AI Assessment displays one of the following values in the Skylar Automated RCA user interface:
    • "Your Attention Needed" for content that the Skylar AI believes should be looked into.
    • "No Attention Needed" for content that the Skylar AI assesses as unlikely to require immediate attention.
  • Root Cause (RCA) Report Summary. The report contains the actual cluster of anomalous log lines that was identified by the Skylar AI. Up to eight of these log lines are shown in the summary view. You can click anywhere in the summary to view the full Root Cause report.
  • Alert Key. One or two log lines, denoted with a key icon, that are used to identify the suggestion if this type of suggestion occurs again. The alert keys make up an alert rule.
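
As a rough illustration of how alert keys could act as an alert rule, the following Python sketch treats a rule as matched when every alert key appears in a newly detected cluster. The matching logic, the `rule_matches` helper, and the sample event types are assumptions for this example; the source does not describe how Skylar Automated RCA evaluates alert rules internally.

```python
def rule_matches(alert_keys, cluster_event_types):
    """Return True when every alert key appears among the cluster's event types.

    Hypothetical matching logic; an empty rule never matches.
    """
    keys = set(alert_keys)
    return bool(keys) and keys.issubset(cluster_event_types)


# Sample data (invented for illustration).
alert_keys = ["connection to db refused", "retry limit exceeded"]
recurring_cluster = {"connection to db refused", "retry limit exceeded", "request failed"}
unrelated_cluster = {"retry limit exceeded", "disk usage above threshold"}

recurs = rule_matches(alert_keys, recurring_cluster)
unrelated = rule_matches(alert_keys, unrelated_cluster)
```

Here `recurs` is true because both alert keys are present in the new cluster, while `unrelated` is false because one key is missing.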

You can click anywhere in the summary report for a suggestion or an alert to view a more detailed Root Cause Report page for that suggestion or alert. For more information, see Root Cause Reports.

Suggestions are generated when the Skylar AI finds a cluster of correlated anomalies in your logs that resembles a problem. However, not every suggestion relates to an important problem. This is especially true during the first few days of using Skylar Automated RCA, as the Skylar AI learns the normal patterns in your logs.

When you start getting suggestions on the Alerts page, you can review the word clouds and event logs that display in the summary views for the Root Cause reports for the suggestions. As a best practice, identify a specific time frame when a possible problem occurred, and then start looking at the reports that have the most interesting or relevant information related to the possible root cause of the problem.

You can choose to "accept" or "reject" a suggestion. For more information, see Assessing Suggestions.

You can also decide on the action to take if the same type of alert occurs again, such as sending a notification through Slack, email, or another notification channel. For more information, see Notification Channels.

If you currently use SL1 from ScienceLogic, you can configure an integration that lets you view Skylar Automated RCA suggestions in SL1 dashboards as well as on the SL1 Events page. For more information, see ScienceLogic Integrations.

Consuming Root Cause Reports

You can consume the Skylar AI-generated Root Cause reports in one of the following ways:

  1. Recommended. Connect Skylar Automated RCA to a ScienceLogic integration, such as the SL1 Enhanced (12.x) integration on the Integrations & Collectors page (Settings > Integrations & Collectors). After you configure the integration, data from the Root Cause reports from Skylar Automated RCA will display in SL1 and you can correlate the reports with any spikes or alerts occurring at the same time. For more information, see ScienceLogic Integrations.

    For more details, or to take action on one of these reports, click the URL to go directly to the detailed Root Cause report in the Skylar Automated RCA user interface. For more information, see Working with Suggestions and Root Cause Reports.

  2. Connect Skylar Automated RCA to your incident management tool, such as Opsgenie, PagerDuty, or Slack. After you configure the incident management tool, an RCA report is automatically created and sent back to the incident management tool.

  3. Evaluate the feed of auto-detected incident Root Cause reports on the Alerts page in the Skylar Automated RCA user interface, particularly around times when you know things went wrong. You can also force the Skylar AI to do a deep scan and create a report on demand by clicking the Scan for RC button on the Settings menu. Any Root Cause reports generated by that scan include a lightning bolt icon and the text "Result of RC Scan". For more information, see Working with Suggestions and Root Cause Reports.

Customizing Your Skylar Automated RCA Results

You can customize your Skylar Automated RCA results on the Alerts page (the Skylar Automated RCA home page) by selecting one or more filters at the top of the page. You can use these filters to manage the number of suggestions and alerts that display on the Alerts page.

For example, by default only the First occurrence of each incident type is visible on dashboards and alert channels, unless you create filters that specify that the incident deserves an alert or suggestion.

You can also filter the list of suggestions by Significance: the Skylar AI assigns a value of Low, Medium, or High to each alert. Significance is a cumulative score for each suggestion, based on the rareness and "badness" (log severity level) of the log events within that alert. If you have a high Significance setting, the Root Cause events will have to be more rare and more "bad" to show up in the list of suggestions.

By default, only suggestions with a significance of Medium or High are shown on the Alerts page, so if you want to also see alerts with Low significance, select Low or greater for this filter. You can edit the default Significance setting by editing the Root Cause Significance setting on the Report Settings page (Settings > Root Cause Settings).
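
The effect of an "or greater" Significance filter can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the `Significance` enum, the `visible` function, and the sample suggestions are hypothetical names and data chosen for this example.

```python
from enum import IntEnum


class Significance(IntEnum):
    """Ordered significance levels, so that comparisons such as >= work."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def visible(suggestions, minimum=Significance.MEDIUM):
    """Keep suggestions at or above the minimum significance.

    The default mirrors the documented default of Medium or greater.
    """
    return [s for s in suggestions if s["significance"] >= minimum]


# Sample suggestions (invented for illustration).
suggestions = [
    {"title": "Database connection pool exhausted", "significance": Significance.HIGH},
    {"title": "Cache node restarted", "significance": Significance.MEDIUM},
    {"title": "Verbose debug logging enabled", "significance": Significance.LOW},
]

default_view = [s["title"] for s in visible(suggestions)]
low_or_greater = [s["title"] for s in visible(suggestions, Significance.LOW)]
```

With the default filter, `default_view` contains only the High and Medium suggestions. Selecting Low or greater returns all three.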

These filters appear on the Selected Filter dialog, which displays when you click the Filtering button on the Alerts page.

There is also a Search bar at the top of the Alerts page that you can use for text or regular expression (regex) searches, and a toggle for Core Events and All Events.
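
To show the difference between a text search and a regex search, the following Python sketch uses the standard `re` module. The `search_logs` function, the sample log lines, and the case-insensitive matching are assumptions for this example; the regex dialect and matching behavior of the Search bar itself might differ.

```python
import re


def search_logs(lines, pattern, use_regex=False):
    """Return the lines that contain `pattern`, as plain text or as a regex.

    Hypothetical helper; case-insensitive matching is an assumption.
    """
    if not use_regex:
        # Escape special characters so the pattern is matched literally.
        pattern = re.escape(pattern)
    compiled = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if compiled.search(line)]


# Sample log lines (invented for illustration).
log_lines = [
    "2024-05-01 12:00:01 ERROR payment-service timeout after 30s",
    "2024-05-01 12:00:02 INFO  checkout completed for order 1182",
    "2024-05-01 12:00:03 WARN  payment-service retry 2 of 5",
]

text_hits = search_logs(log_lines, "payment-service")
regex_hits = search_logs(log_lines, r"retry \d+ of \d+", use_regex=True)
```

In this sketch, the text search returns both `payment-service` lines, while the regex search returns only the retry line.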

For more information about filtering, see Using the Filters on the Alerts Page in Skylar Automated RCA.

What Does Skylar Automated RCA Do with Your Logs?

As logs are received by Skylar Automated RCA, the Skylar AI automatically structures and categorizes each type of log event. This allows the Skylar AI to identify anomalous log events. Many factors are used for anomaly detection, but the two most important are the rareness and the severity of each log line.
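
The following Python sketch is a simplified illustration of these two ideas: grouping log lines into event types, and scoring each line by rareness and severity. It is not the Skylar AI algorithm. The masking rules, the `SEVERITY_WEIGHT` values, and the additive score are assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter

# Hypothetical weights: more severe log levels add more to the score.
SEVERITY_WEIGHT = {"DEBUG": 0.0, "INFO": 0.0, "WARN": 1.0, "ERROR": 2.0, "FATAL": 3.0}


def event_type(message):
    """Mask variable parts (hex IDs, numbers) so similar lines share one type."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)
    return re.sub(r"\d+", "<NUM>", message)


def score_lines(lines):
    """Score each (severity, message) pair: rarer and more severe scores higher."""
    counts = Counter(event_type(msg) for _, msg in lines)
    total = len(lines)
    scored = []
    for severity, msg in lines:
        # Rareness is the negative log of the event type's relative frequency.
        rareness = -math.log(counts[event_type(msg)] / total)
        scored.append((rareness + SEVERITY_WEIGHT.get(severity, 0.0), severity, msg))
    return scored


# Sample log lines (invented for illustration).
lines = [
    ("INFO", "request 101 served in 12 ms"),
    ("INFO", "request 102 served in 9 ms"),
    ("INFO", "request 103 served in 15 ms"),
    ("ERROR", "connection to db 7 refused"),
]

most_anomalous = max(score_lines(lines), key=lambda item: item[0])
```

In this sample, the three request lines collapse into one common event type, so the single ERROR line is both rare and severe and receives the highest score.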

The Skylar AI then looks for abnormal clusters of correlated anomalies across all the logs within a Service Group, also known as a failure domain. These clusters usually occur because of an actual problem.
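
One simple way to picture a cluster of correlated anomalies is to group anomalies from the same Service Group that occur close together in time and that span more than one log source. The Python sketch below does exactly that. It is an illustration under stated assumptions, not the Skylar AI's method: the `cluster_anomalies` function, the 60-second gap, and the two-source minimum are hypothetical.

```python
from collections import defaultdict


def cluster_anomalies(anomalies, max_gap_seconds=60, min_sources=2):
    """Group anomalies per service group when they occur close together in time.

    anomalies: iterable of (timestamp_seconds, service_group, log_source, message).
    A cluster is kept only if it spans at least `min_sources` distinct log sources.
    """
    by_group = defaultdict(list)
    for anomaly in anomalies:
        by_group[anomaly[1]].append(anomaly)

    clusters = []
    for group_anomalies in by_group.values():
        group_anomalies.sort(key=lambda a: a[0])
        current = [group_anomalies[0]]
        for anomaly in group_anomalies[1:]:
            if anomaly[0] - current[-1][0] <= max_gap_seconds:
                current.append(anomaly)
            else:
                # Gap too large: close the current cluster and start a new one.
                clusters.append(current)
                current = [anomaly]
        clusters.append(current)

    return [c for c in clusters if len({a[2] for a in c}) >= min_sources]


# Sample anomalies (invented for illustration).
anomalies = [
    (1000, "prod", "nginx", "upstream timed out"),
    (1020, "prod", "postgres", "too many connections"),
    (1045, "prod", "app", "request failed"),
    (9000, "prod", "nginx", "upstream timed out"),
    (1010, "staging", "app", "request failed"),
]

clusters = cluster_anomalies(anomalies)
```

In this sample, the three `prod` anomalies between timestamps 1000 and 1045 form one cluster across three log sources. The isolated anomalies are dropped because each involves only one source.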

If the Skylar AI finds one of these clusters, it generates a suggestion. The suggestion contains a payload that includes the cluster of log lines.

Other than the log events that are contained in alerts, all other log data is discarded after a few hours.