Skylar Analytics: Anomaly Detection

The Anomaly Detection component of Skylar Analytics uses Skylar AI to identify unusual patterns that do not conform to expected behavior. Anomaly Detection provides always-on, unsupervised, machine-learning-based monitoring that automatically identifies unusual patterns in the real-time performance metrics and resource data that it observes.

You can view a list of all devices that have metrics being monitored for anomalies on the corresponding Device Investigator or Service Investigator pages.

Unlike the Data Visualization and Exploration and Predictive Alerting components of Skylar Analytics, this release of Anomaly Detection with Skylar Analytics works on all Dynamic Applications in SL1. Data Visualization and Exploration as well as Predictive Alerting currently monitor server and network metrics, with more metrics planned for future SL1 releases.

What is Anomaly Detection?

Anomaly detection is a technique that uses machine learning to identify unusual patterns that do not conform to expected behavior. Anomaly detection provides always-on, unsupervised machine learning-based monitoring that automatically identifies unusual patterns in the real-time performance metrics and resource data that it observes.

Anomalies do not necessarily represent problems or events to be concerned about; rather, they represent unexpected behavior that might require further investigation.

Anomaly detection is calculated and displayed in the SL1 user interface for all Performance Dynamic Applications. This detection is enabled by default and cannot be disabled. You can control which device data gets sent to Skylar for analysis based on the organization aligned with the device or devices. All devices in the selected organization will get anomaly detection analysis.

For more information, see Enabling Skylar Analytics for One or More SL1 Organizations.

Enabling Anomaly Detection Events for Specific Metrics

You can set up anomaly detection events for specific metrics for devices and business services so that event policies are triggered when an anomaly is detected for that metric.

Enabling Anomaly Detection Events for a Metric on the Device Investigator Page

To enable anomaly detection events for a metric on the Device Investigator page: 

  1. On the Devices page, click the Device Name for the device on which you want to enable anomaly detection events. The Anomaly Detection tab for the Device Investigator displays.

    If the Anomaly Detection tab does not already appear on the Device Investigator, click the More drop-down menu and select it from the list of tab options.

  2. On the Anomaly Detection tab, click the Actions icon for any of the listed metrics and select Enable. The Select Available Metrics modal appears.

  3. In the Select Metric drop-down, use the Search field to search for a specific metric or click one of the category names, such as "Dynamic Apps" or "Collection Labels", to view a list of available metrics for that metric category.

  4. Click the name of the metric on which you want to enable anomaly detection events for the device.

  5. For some metrics, a second drop-down field might display that enables you to specify the device directory. If this field appears, click the name of the directory on which you want to enable anomaly detection.

  6. Click Enable. That metric is enabled for events for that device.

To disable anomaly detection events for a metric, click the Actions icon for that metric and select Disable.

Enabling Anomaly Detection Events for a Metric on the Service Investigator Page

On the Anomaly Detection tab of a Service Investigator page, you can enable anomaly detection events for additional metrics or disable them for metrics on which they are currently enabled.

The Anomaly Detection tab appears only if you have at least one device in the selected service that has anomaly detection enabled.

To enable anomaly detection events for a metric on the Service Investigator page:

  1. On the Business Services page, select a service from the list of business, IT, and device services by clicking its name. The Service Investigator displays.
  2. On the Service Investigator page, click the Anomaly Detection tab.
  3. Click the Actions icon for any of the listed metrics and select Enable. The Select Available Metrics modal appears.
  4. In the Select Metric drop-down, use the Search field to search for a specific metric or click one of the category names, such as "Dynamic Apps" or "Collection Labels", to view a list of available metrics for that metric category.
  5. Click the name of the metric on which you want to enable anomaly detection events for the device.
  6. For some metrics, a second drop-down field might display that enables you to specify the device directory. If this field appears, click the name of the directory on which you want to enable anomaly detection.
  7. Click Enable.

To disable anomaly detection for a metric, click the Actions icon for that metric and select Disable. The metric is removed from the Anomaly Detection tab.

Viewing Graphs and Data for Anomaly Detection

After SL1 begins performing anomaly detection for a device, you can view graphs and data about each anomaly. Graphs for anomalies appear on the following pages in SL1:

  • The Anomaly Detection tab in the Device Investigator.

  • The Anomaly Detection tab in the Service Investigator for a business, IT, or device service.

You can view the anomaly detection graphs for a metric by clicking the Open icon next to that metric for the device. The Anomaly Chart modal appears, displaying the "Anomaly Score" chart above the chart for the metric you are monitoring.

The "Anomaly Score" chart displays a graph of values from 0 to 100 that represent how far the real data for a metric diverges from its normal patterns. The lines in the chart are color-coded by the severity level of the event that gets triggered as the data diverges further. The anomaly score is essentially a running sum over a short window of time, so after anomalies stop, the score falls back to zero over the length of that window.
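The running-sum behavior can be illustrated with a short sketch. This is not the Skylar AI scoring algorithm, which is not described here; the window length, the per-anomaly weight, and the function name windowed_scores are all assumptions chosen only to show why the score decays to zero once anomalies stop.

```python
from collections import deque


def windowed_scores(flags, window=5, weight=20, cap=100):
    """Return a running-sum score for each time step.

    flags  -- iterable of booleans, True where a data point was anomalous
    window -- number of most recent points included in the sum (assumed)
    weight -- score contributed by each anomalous point (assumed)
    cap    -- upper bound of the score, matching the chart's 0 to 100 range
    """
    recent = deque(maxlen=window)  # keeps only the last `window` flags
    scores = []
    for flag in flags:
        recent.append(1 if flag else 0)
        scores.append(min(cap, weight * sum(recent)))
    return scores


# Three anomalous points followed by normal data.
flags = [False, True, True, True, False, False, False, False, False, False]
scores = windowed_scores(flags)
print(scores)
```

In this sketch the score rises while anomalous points enter the window and returns to zero once the last anomalous point has aged out of it.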

The second graph displays the following data:

  • A blue band representing the range of probable values that SL1 expected for the device metric.
  • A green line representing the actual value for the device metric.
  • A red dot indicating anomalies where the actual value appears outside of the expected value range.
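The relationship between the band, the line, and the dots can be sketched as a simple range check. The function name find_anomalies and the sample values are hypothetical, and treating a value exactly on the edge of the band as "inside" is an assumption; how SL1 computes the expected range is not covered here.

```python
def find_anomalies(points):
    """Return the indexes of points whose actual value is outside the expected range.

    points -- list of (expected_low, expected_high, actual) tuples
    """
    return [
        index
        for index, (low, high, actual) in enumerate(points)
        if actual < low or actual > high
    ]


# Hypothetical samples: (band lower edge, band upper edge, actual value).
samples = [
    (10.0, 20.0, 15.0),  # inside the band
    (10.0, 20.0, 25.0),  # above the band, would be drawn as a red dot
    (12.0, 22.0, 11.5),  # below the band, would be drawn as a red dot
    (12.0, 22.0, 22.0),  # on the edge, treated as inside in this sketch
]
anomalies = find_anomalies(samples)
print(anomalies)
```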

You can hover over a value in one of the charts to see a pop-up box with the Expected Range and the metric value. The Anomaly Score value also displays in the pop-up box, with the severity in parentheses: Normal, Low, Medium, High, or Very High.

You can zoom in on a shorter time frame by clicking and dragging your mouse over the part of the chart representing that time frame, and you can return to the original time span by clicking the Reset zoom button.

On the Anomaly Detection Thresholds page, you can define the thresholds used by the "Anomaly Score" chart on the Anomaly Chart modal and specify whether those thresholds generate alerts.

You can view the alert levels when you hover over a value in one of the charts on the Anomaly Chart modal. The Anomaly Score severity level displays in parentheses after the score value: Normal, Low, Medium, High, or Very High.

An Anomaly Score severity level of Normal is assigned to a value in the chart that is lower than the lowest enabled alert level. For example, if the threshold for the Low severity is enabled and set to 20 or higher, an Anomaly Score of 16 would have a severity level of Normal.
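The mapping from score to severity can be sketched as follows. Only the Low threshold of 20 comes from the example above; the Medium, High, and Very High values, the THRESHOLDS table, and the function name severity_for are placeholders for illustration and are not SL1 defaults.

```python
# Hypothetical threshold table: (severity, threshold, enabled).
# Only Low = 20 is taken from the text; the other values are placeholders.
THRESHOLDS = [
    ("Low", 20, True),
    ("Medium", 40, True),
    ("High", 60, True),
    ("Very High", 80, True),
]


def severity_for(score, thresholds=THRESHOLDS):
    """Return the highest enabled severity whose threshold the score meets."""
    label = "Normal"  # used when the score is below every enabled threshold
    for name, threshold, enabled in sorted(thresholds, key=lambda t: t[1]):
        if enabled and score >= threshold:
            label = name
    return label


print(severity_for(16))  # below the lowest enabled threshold
print(severity_for(20))  # meets the Low threshold

# With Low disabled, a score of 25 is below the lowest enabled level.
low_disabled = [("Low", 20, False)] + THRESHOLDS[1:]
print(severity_for(25, low_disabled))
```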

To edit the Anomaly Score thresholds:

  1. On the Anomaly Detection Thresholds page, click Edit.
  2. For each of the four severity levels, from Low to Very High, select Enabled to have SL1 generate an alert when the Anomaly Score value for a metric is equal to or greater than the threshold for that severity level.
  3. Edit the threshold value for a level if SL1 is generating too many (or not enough) anomalies of that severity level. For example, if you want to enable a Low level alert when the Anomaly Score value is between 25 and 39, go to the Low panel, select Enabled, and update the value from "20" to "25".
  4. Click Save.

You can then edit an event policy that uses alerts based on the settings on this page to generate events in SL1. For more information, see Creating an Event Policy for Anomalies.

Creating an Event Policy for Anomalies

After you have enabled anomaly detection for devices, you can create additional event policies that will trigger events in SL1 when anomalies are detected for those devices.

Because anomalies do not always correspond to problems, ScienceLogic recommends creating an event policy only for scenarios where anomalies appear to be correlated with some other behavior that you cannot otherwise track using an event or alert.

Because the anomaly detection model is constantly being refined as SL1 collects more data, an event policy created soon after you enable anomaly detection might generate more anomaly-related events than one created after SL1 has had time to learn the device metric's data patterns.

To create an event policy for anomalies:

  1. Go to the Event Policies page (Events > Event Policies).
  2. On the Event Policies page, click the Create Event Policy button. The Event Policy Editor page appears.
  3. In the Policy Name field, type a name for the new event policy.
  4. Click the Match Logic tab.
  5. In the Event Source field, select Internal.
  6. In the Match Criteria field, click the Select Link-Message button.
  7. In the Link-Message modal page, search for "Anomaly" to locate the message "Anomaly Detected: %V".
  8. Click the radio button for the message "Anomaly Detected: %V", and then click Select.
  9. Complete the remaining fields and tabs in the Event Policy Editor based on the specific parameters that you want to establish for the event. For more information about the fields and tabs in the Event Policy Editor, see Defining an Event Policy.
  10. To enable the event policy, click the Enable Event Policy toggle so that it is in the "on" position.
  11. When you are finished entering all of the necessary information into the event policy, click Save.
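If you later need to pick out anomaly-related messages outside of SL1, for example in exported event text, the link-message format can be matched with a small pattern. This is a sketch only: it assumes that %V is replaced by some non-empty text when the message is generated, and the sample messages and the function name template_to_regex are hypothetical.

```python
import re

TEMPLATE = "Anomaly Detected: %V"


def template_to_regex(template):
    """Build a regex from a message template, capturing whatever replaces %V."""
    escaped = re.escape(template)
    # Swap the (escaped) placeholder for a capture group.
    pattern = escaped.replace(re.escape("%V"), "(.+)")
    return re.compile("^" + pattern + "$")


matcher = template_to_regex(TEMPLATE)

# Hypothetical message text, for illustration only.
hit = matcher.match("Anomaly Detected: CPU Utilization")
miss = matcher.match("Device not responding")

print(hit.group(1) if hit else None)
print(miss)
```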

Using Anomaly-related Events to Trigger Automated Run Book Actions

SL1 includes automation features that allow you to define specific event conditions and the actions you want SL1 to execute when those event conditions are met. You can use these features to trigger automated run book actions whenever an anomaly-related event is generated in SL1.

To use anomaly-related events to trigger automated run book actions:

  1. Go to the Automation Policy Manager page (Registry > Run Book > Automation).
  2. Click the Create button. The Automation Policy Editor page appears.
  3. In the Policy State field, select Enabled.
  4. In the Available Events field, search for and select an anomaly-related event policy, and then click the right-arrow icon to move it to the Aligned Events field. For more information about anomaly-related events, see Creating an Event Policy for Anomalies.
  5. Complete the remaining fields on the Automation Policy Editor page based on the specific parameters that you want to establish for the automation policy. For more information about the fields on the Automation Policy Editor page, see Automation Policies.
  6. When you are finished, click Save.