Enabling Machine Learning-based Anomaly Detection


This section describes how to enable machine learning-based anomaly detection in SL1, as well as how to view recent anomalies for devices and business services.

To use machine learning-based anomaly detection on the SL1 Extended Architecture, you must enable the Collector Pipeline to collect data from Performance Dynamic Applications. For more information, see the section on Enabling the Collector Pipeline.

Use the following menu options to navigate the SL1 user interface:

  • To view a pop-out list of menu options, click the menu icon.
  • To view a page containing all of the menu options, click the Advanced menu icon.

Viewing the List of Devices that Have Anomaly Detection Enabled

The Machine Learning page displays a list of devices that are currently using machine learning for anomaly detection, as well as devices for which you can enable anomaly detection if it is not enabled already.

To navigate to the Machine Learning page, click the Machine Learning icon.

To filter the devices that appear on the page based on whether anomaly detection is enabled or disabled, type "MachineLearningPolicy.enabled" in the Search field. A "MachineLearningPolicy.enabled" pill appears below the Search field. Click that pill and then select True to filter the page to display only those devices on which anomaly detection is enabled, or select False to filter the page to display only those devices on which anomaly detection is disabled.

You can filter the items on this inventory page by typing filter text or selecting filter options in one or more of the filters found above the columns on the page. For more information, see Filtering Inventory Pages.

For each device in the list, the Machine Learning page displays the following information:

  • Device Name. Displays the name of the device. Click the hyperlink to go to the Machine Learning tab of the Device Investigator page for that device. Each row in the list represents a specific device and metric; therefore, a device might appear in the list multiple times if anomaly detection is enabled for multiple metrics on that device.
  • Anomaly Detection. Indicates the build status for the metric that SL1 is evaluating for anomalies on the device. Possible values include:
    • Disabled. Anomaly detection is disabled for the metric.
    • Enabled. Anomaly detection is enabled for the metric.
    • Queued. The metric has been selected for anomaly detection, but SL1 has not yet begun building the anomaly detection model for that metric.
    • Building. SL1 is building the anomaly detection model that is specific to the selected device and metric.
    • Failed. The anomaly detection model build process failed.
    • Failed (building). SL1 could not find any winners for a predictor for this metric, and SL1 will continue to monitor for anomalies on that device.
    • Waiting for Data. Anomaly detection for this metric lacks sufficient data, either because detection needs at least one day of monitoring or because the data for this metric is irregular.
  • Metric Type. Indicates the metric that SL1 is evaluating for anomalies on the device.
  • ML Enabled By User. Indicates the username of the user that enabled anomaly detection for the device and metric.
  • Class. Displays the Device Class for the device.
  • Category. Displays the device's Device Category.
  • Anomaly Count. Displays the number of anomalies detected by SL1.
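
The row structure described above can be sketched in a few lines of Python. This is only an illustrative model, not an SL1 API: the MachineLearningRow class, its field names, the helper functions, and the sample device and metric names are all hypothetical. Only the status labels are taken from the Anomaly Detection column values listed above. The sketch shows why a device can appear more than once in the list (one row per device and metric) and how filtering by status narrows the list.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class AnomalyDetectionStatus(Enum):
    # Values mirror the labels shown in the Anomaly Detection column.
    DISABLED = "Disabled"
    ENABLED = "Enabled"
    QUEUED = "Queued"
    BUILDING = "Building"
    FAILED = "Failed"
    FAILED_BUILDING = "Failed (building)"
    WAITING_FOR_DATA = "Waiting for Data"


@dataclass(frozen=True)
class MachineLearningRow:
    # Hypothetical record for one row of the page: one device plus one metric.
    device_name: str
    metric_type: str
    status: AnomalyDetectionStatus
    anomaly_count: int = 0


def metrics_by_device(rows):
    """Group rows by device name so repeated devices collapse into one entry."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.device_name].append(row.metric_type)
    return dict(grouped)


def rows_with_status(rows, status):
    """Return only the rows whose Anomaly Detection value matches `status`."""
    return [row for row in rows if row.status is status]


# Made-up sample data: "router-01" appears twice because two metrics are tracked.
rows = [
    MachineLearningRow("router-01", "CPU Utilization", AnomalyDetectionStatus.ENABLED, 3),
    MachineLearningRow("router-01", "Memory Utilization", AnomalyDetectionStatus.BUILDING),
    MachineLearningRow("db-02", "Latency", AnomalyDetectionStatus.DISABLED),
]

print(metrics_by_device(rows))
print([row.device_name for row in rows_with_status(rows, AnomalyDetectionStatus.DISABLED)])
```

Keying each row on the device and metric pair, rather than on the device alone, is what lets each metric carry its own status and anomaly count.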

Enabling Machine Learning on One or More Devices

For SL1 to collect and analyze data to detect anomalies for a specific metric on a particular device, you must first enable machine learning on that device. You can do that from several different places within SL1.

The following sections describe each of these methods.

Enabling Machine Learning from the Machine Learning Page

To enable machine learning for one or more devices from the Machine Learning page:

  1. Click the Machine Learning icon. The Machine Learning page displays.
  2. Locate the device on which you want to enable machine learning. To show only the devices that do not have anomaly detection enabled, select Disabled in the Anomaly Detection column.
  3. Click the Actions icon for that device and select Enable. Alternatively, you can select the checkbox for one or more devices, and then click Enable at the top of the page. The Select Metric to Enable Machine Learning modal page appears.
  4. In the Select Metric drop-down, use the Search field to search for a specific metric or click one of the category names, such as "Dynamic Apps" or "Collection Labels", to view a list of available metrics for that metric category.
  5. Click the name of the metric on which you want to enable machine learning for the device.
  6. For some metrics, a second drop-down field might display that enables you to specify the device directory. If this field appears, click the name of the directory on which you want to enable machine learning.
  7. Click Enable. That metric is enabled for the device, and the metric is listed in the Metric Type column on the Machine Learning page.

Enabling Machine Learning in the Device Investigator

To enable machine learning for a device in the Device Investigator:

  1. On the Devices page, click the Device Name for the device on which you want to enable anomaly detection. The Device Investigator displays.

  2. Click the Machine Learning tab.

    If the Machine Learning tab does not already appear on the Device Investigator, click the More drop-down menu and select Machine Learning from the list of tab options.

  3. On the Machine Learning tab, click the Add ML Metric button or click the Actions icon for any of the listed metrics and select Enable. The Select Metric to Enable Machine Learning modal page appears.

  4. In the Select Metric drop-down, use the Search field to search for a specific metric or click one of the category names, such as "Dynamic Apps" or "Collection Labels", to view a list of available metrics for that metric category.

  5. Click the name of the metric on which you want to enable machine learning for the device.

  6. For some metrics, a second drop-down field might display that enables you to specify the device directory. If this field appears, click the name of the directory on which you want to enable machine learning.

  7. Click Enable. The metric appears on the Machine Learning tab.

To disable machine learning for a metric, click the Actions icon for that metric and select Disable. The metric is removed from the Machine Learning tab.

Enabling Machine Learning in the Service Investigator

The Anomalies widget in the Service Investigator displays a list of devices within the selected business, IT, or device service that have anomaly detection enabled. From this widget, you can also enable machine learning for additional metrics or disable it for metrics on which it is currently enabled.

The Anomalies widget appears only if you have at least one device in the selected service that has anomaly detection enabled.

To enable machine learning in the Service Investigator:

  1. On the Business Services page, select a service from the list of business, IT, and device services by clicking its name. The Service Investigator displays.
  2. On the Service Investigator page, click the Anomalies widget.
  3. Click the Actions icon for any of the listed metrics and select Enable. The Select Metric to Enable Machine Learning modal page appears.
  4. In the Select Metric drop-down, use the Search field to search for a specific metric or click one of the category names, such as "Dynamic Apps" or "Collection Labels", to view a list of available metrics for that metric category.
  5. Click the name of the metric on which you want to enable machine learning for the device.
  6. For some metrics, a second drop-down field might display that enables you to specify the device directory. If this field appears, click the name of the directory on which you want to enable machine learning.
  7. Click Enable Machine Learning. The metric appears in the Anomalies widget.

To disable machine learning for a metric, click the Actions icon for that metric and select Disable. The metric is removed from the Anomalies widget.

Viewing Device Anomalies

On the Machine Learning tab of the Device Investigator, you can view a list of machine learning metrics that are enabled for the device:

The Machine Learning tab of the Device Investigator page

On this tab, you can view the Anomaly Detection graphs by clicking the Expand icon next to the metric for the device. The Anomaly Chart modal appears, displaying the "Anomaly Index" chart above the chart for the specified metric you are monitoring.

The "Anomaly Index" chart displays a graph of values from 0 to 100 that represent how far the real data for a metric diverges from its normal patterns. The lines in the chart are color-coded by the level of event that gets triggered as the data diverges further and further. You can define the thresholds for the Anomaly Index, and whether those values generate alerts, on the Machine Learning Thresholds page (Machine Learning > Thresholds). For more information, see Enabling Alerts and Thresholds for the Anomaly Index.

In the second graph, the blue shape represents the expected value range for the selected device metric over the given time period, the green line indicates the actual values that SL1 collected over that time period, and the small red dots at top left represent the anomalies where the actual value fell outside of the expected range.

You can hover over a value in one of the charts to see a pop-up box with the Expected Range and the metric value. The Anomaly Index value also displays in the pop-up box, with the severity in parentheses: Normal, Low, Medium, High, or Very High.

The second graph displays the following data:

  • A blue band representing the range of probable values that SL1 expected for the device metric.
  • A green line representing the actual value for the device metric.
  • A red dot indicating anomalies where the actual value appears outside of the expected value range.
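
The relationship between the blue band, the green line, and the red dots can be illustrated with a short Python sketch. It is not how SL1 computes the expected range, which comes from the anomaly detection model. The Sample class, its field names, and the sample values are hypothetical, and treating a value exactly on the band edge as in range is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Hypothetical data point: one collected value plus its expected range.
    timestamp: str
    actual: float         # the green line
    expected_low: float   # lower edge of the blue band
    expected_high: float  # upper edge of the blue band


def find_anomalies(samples):
    """Return the samples whose actual value lies outside the expected range."""
    # Assumption: values exactly on the band edge count as in range.
    return [
        s for s in samples
        if not (s.expected_low <= s.actual <= s.expected_high)
    ]


# Made-up values for illustration only.
samples = [
    Sample("10:00", actual=42.0, expected_low=35.0, expected_high=50.0),
    Sample("10:05", actual=71.5, expected_low=36.0, expected_high=51.0),
    Sample("10:10", actual=30.0, expected_low=34.0, expected_high=49.0),
]

# Each sample printed here corresponds to a red dot on the chart.
for s in find_anomalies(samples):
    print(f"{s.timestamp}: {s.actual} is outside {s.expected_low}-{s.expected_high}")
```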

You can zoom in on a shorter time frame by clicking and dragging your mouse over the part of the chart representing that time frame, and you can return to the original time span by clicking the Reset zoom button.

For more information about devices, see the section on Device Management.

Viewing Business Service Anomalies

If one or more devices within a business, IT, or device service have anomaly detection enabled, the Anomalies widget appears on the Overview tab of the Service Investigator. The Anomalies widget displays a list of all the devices within the selected service that have anomaly detection enabled.

To view the Service Investigator page, select a service from the list of business, IT, and device services on the Business Services page. The Overview tab opens by default. This tab provides a single-page view of the selected service, including key metrics, events, and anomalies that are impacting the service.

In the Anomalies widget of the Service Investigator, you can view a list of devices that are enabled for anomaly detection. Each device has a set of graphs that tracks the anomaly detection data for that device.

You can view these graphs by clicking the Expand icon next to the device or the metric for the device. The Anomaly Chart modal appears, displaying the "Anomaly Index" chart above the chart for the specified metric you are monitoring.

The "Anomaly Index" chart displays a graph of values from 0 to 100 that represent how far the real data for a metric diverges from its normal patterns. The lines in the chart are color-coded by the level of event that gets triggered as the data diverges further and further. You can define the thresholds for the Anomaly Index, and whether those values generate alerts, on the Machine Learning Thresholds page (Machine Learning > Thresholds). For more information, see Enabling Alerts and Thresholds for the Anomaly Index.

In the second graph, the blue shape represents the expected value range for the selected device metric over the given time period, the green line indicates the actual values that SL1 collected over that time period, and the small red dots at top left represent the anomalies where the actual value fell outside of the expected range.

You can hover over a value in one of the charts to see a pop-up box with the Expected Range and the metric value. The Anomaly Index value also displays in the pop-up box, with the severity in parentheses: Normal, Low, Medium, High, or Very High.

The second graph displays the following data:

  • A blue band representing the range of probable values that SL1 expected for the device metric.
  • A green line representing the actual value for the device metric.
  • A red dot indicating anomalies where the actual value appears outside of the expected value range.

You can use the time span filter on the Anomalies widget to adjust the time span of anomalies that appear in the graph. The default filter is Last 24 hours, but you can select a time span ranging from Last Hour up to Last 2 Years. You can also zoom in on a shorter time frame by clicking and dragging your mouse over the part of the chart representing that time frame, and you can return to the original time span by clicking the Reset zoom button.

For more information about business services, see the section on Monitoring Business Services.

Enabling Alerts and Thresholds for the Anomaly Index

You can define the thresholds for the "Anomaly Index" chart, and whether those values generate alerts, on the Machine Learning Thresholds page (Machine Learning > Thresholds).

You can define which value in the Anomaly Index will trigger an alert, and the severity level of the alert. These settings are used by all devices that have enabled anomaly detection.

You can view these alert levels when you hover over a value in one of the charts on the Anomaly Chart modal. The Anomaly Index severity level displays after the index value, in parentheses: Normal, Low, Medium, High, or Very High.

An Anomaly Index severity level of Normal is assigned to a value in the chart that is lower than the lowest enabled alert level. For example, if the threshold for the Low severity is enabled and set to 20 or higher, an Anomaly Index value below 20 has a severity level of Normal.
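
The mapping from an Anomaly Index value to a severity level can be sketched as follows. This is a minimal illustration of the rule described above, not SL1's implementation. The severity_for function and the layout of the thresholds dictionary are hypothetical. The Low value of 20 comes from the example in this section, while the Medium, High, and Very High values are assumptions chosen for illustration. The sketch also assumes that thresholds increase with severity.

```python
# Severity levels in ascending order; "Normal" is the fallback below all of them.
SEVERITY_ORDER = ["Low", "Medium", "High", "Very High"]


def severity_for(index_value, thresholds):
    """Return the severity label for an Anomaly Index value from 0 to 100.

    `thresholds` maps each severity name to an (enabled, threshold) pair.
    A level applies when it is enabled and the value is equal to or greater
    than its threshold; the highest level that applies wins.
    """
    if not 0 <= index_value <= 100:
        raise ValueError("Anomaly Index values range from 0 to 100")
    severity = "Normal"
    for name in SEVERITY_ORDER:
        enabled, threshold = thresholds[name]
        if enabled and index_value >= threshold:
            severity = name
    return severity


# Hypothetical settings: only the Low value of 20 comes from the text above.
example_thresholds = {
    "Low": (True, 20),
    "Medium": (True, 40),
    "High": (True, 60),
    "Very High": (True, 80),
}

for value in (15, 20, 39, 40, 95):
    print(value, severity_for(value, example_thresholds))
```

With these example settings, a value of 15 falls below the lowest enabled threshold and is reported as Normal. If the Low level were disabled, values below the Medium threshold would also be reported as Normal.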

To edit the Anomaly Index thresholds:

  1. On the Machine Learning Thresholds page (Machine Learning > Thresholds), click Edit.

  2. For each of the four severity levels, from Low to Very High, you can select Enabled to have SL1 generate an alert when the Anomaly Index value for a metric is equal to or greater than the threshold for that severity level.

  3. You can edit the threshold value for each level if SL1 is generating too many (or not enough) anomalies of a certain severity level.

    For example, if the Medium threshold is 40 and you want SL1 to generate a Low level alert when the Anomaly Index value is between 25 and 39, go to the Low panel, select Enabled, and update the value from "20" to "25".

  4. Click Save.

  5. You can then edit an event policy that uses alerts based on the settings on this page to generate events in SL1. For more information, see Creating an Event Policy for Anomalies.