Skylar Analytics: Predictive Alerting

The Predictive Alerting component of Skylar Analytics helps to avoid problems such as file systems running out of space, hosts running out of memory, or issues with network reliability due to oversubscription. The alerts are generated in advance of the problem and can provide days, weeks, or months of notice depending upon the conditions.

The Predictive Alerting component monitors file systems (collected through SNMP, PowerShell, or SSH), network interfaces (utilization, errors, and discards), and memory.

What is Predictive Alerting?

Skylar AI does not make a prediction that reaches further into the future than three times the length of the observation window. In other words, if you have one day of data, Skylar AI will not generate a prediction more than three days in the future.
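The horizon rule can be expressed as a one-line check. This is a minimal sketch that assumes the cap is exactly three times the observation window, as stated above; the function and constant names are hypothetical and are not part of Skylar AI.

```python
from datetime import timedelta

# Hypothetical constant: predictions may reach at most 3x the observation window.
HORIZON_MULTIPLIER = 3

def max_forecast_horizon(observation_window: timedelta) -> timedelta:
    """Return the furthest point in the future a prediction may cover."""
    return observation_window * HORIZON_MULTIPLIER

def within_horizon(days_ahead: float, observation_days: float) -> bool:
    """True if a prediction `days_ahead` days out is allowed for the given window."""
    return days_ahead <= HORIZON_MULTIPLIER * observation_days

print(max_forecast_horizon(timedelta(days=1)))  # 3 days, 0:00:00
print(within_horizon(2.5, 1))  # True
print(within_horizon(4, 1))    # False
```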

How Predictive Alerting Works

To generate predictive alerts, Skylar AI looks at utilization trends over the past 30 days. For file systems, Skylar AI uses the maximum value. For memory and network interfaces, Skylar AI uses the daily p95 value: the 95th percentile, where 95 percent of the data is lower than the p95 value and five percent of the data is higher. Skylar AI fits a linear trend to these values, and the slope of that trend provides a simple prediction of when a threshold will be reached.
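The following sketch shows the general idea: reduce each day to one value (maximum or p95), fit a least-squares line, and extrapolate to the threshold. It is an illustration under stated assumptions, not Skylar AI's implementation. The percentile interpolation method, the ordinary least-squares fit, and the `"filesystem"` label are all assumptions, and selecting the last 30 days of data is left to the caller.

```python
import math

def p95(samples):
    """95th percentile with linear interpolation between the closest ranks.
    The interpolation method is an assumption; the text does not specify one."""
    s = sorted(samples)
    pos = 0.95 * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def daily_values(days_of_samples, metric_type):
    """Reduce each day's samples to one value: the maximum for file systems,
    the p95 for memory and network interfaces.
    "filesystem" is a hypothetical metric_type label."""
    reducer = max if metric_type == "filesystem" else p95
    return [reducer(day) for day in days_of_samples]

def fit_line(ys):
    """Ordinary least-squares line through (day_index, value) points.
    Returns (slope per day, intercept)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def days_until_threshold(ys, threshold):
    """Days after the most recent daily value until the fitted trend reaches
    the threshold, or None if there is too little data or the trend is not rising."""
    if len(ys) < 2:
        return None
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return crossing_day - (len(ys) - 1)

# Hypothetical 30 days of file system samples, rising 1% per day to 70%.
samples = [[40 + d, 41 + d] for d in range(30)]
print(days_until_threshold(daily_values(samples, "filesystem"), 100))  # 30.0
```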

Additionally, Skylar AI looks at two statistics of the daily differences (the change in value from one day to the next): the 99th percentile, and the interquartile range (IQR), which is the spread of the data measured as the difference between the 75th and 25th percentiles. If today's difference exceeds the 99th percentile plus three times the IQR, Skylar AI assumes there is a breakout and uses this new value to calculate a new slope to predict when a threshold will be reached.
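A breakout check along these lines might look like the sketch below. Two details are assumptions not stated in the text: the percentile interpolation method, and excluding today's difference from the reference statistics. How the new slope is then computed is not shown, because the text describes it only as using "this new value".

```python
import math

def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between the
    closest ranks; the interpolation method is an assumption."""
    s = sorted(values)
    pos = (q / 100) * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def is_breakout(daily_series):
    """Return True if today's daily difference exceeds p99 + 3 * IQR
    of the earlier daily differences.

    daily_series holds one value per day (max or p95), oldest first;
    the last element is today. Excluding today from the reference
    statistics is an assumption."""
    diffs = [b - a for a, b in zip(daily_series, daily_series[1:])]
    if len(diffs) < 2:
        return False  # not enough history to compare against
    history, today = diffs[:-1], diffs[-1]
    iqr = percentile(history, 75) - percentile(history, 25)
    limit = percentile(history, 99) + 3 * iqr
    return today > limit

steady = [30 + d for d in range(21)]    # +1% every day
jump = steady[:-1] + [steady[-2] + 5]   # the last day jumps by 5%
print(is_breakout(steady))  # False
print(is_breakout(jump))    # True
```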

Skylar AI also uses a number of other heuristics to prevent false positives, depending on the metric type.

The following examples show how the trend and breakout calculations translate into predictions:

A metric at 70% utilization, increasing consistently by 1% per day: Skylar AI predicts 100% in 30 days.
A metric at 50% utilization that was increasing consistently by 1% per day but is now increasing by 5% per day: Skylar AI predicts 100% in 10 days.
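Both examples reduce to the same arithmetic if the forecast is treated as the remaining headroom divided by the current daily slope. This is a simplification assumed here for illustration; a trend fitted to real, noisy data may give a slightly different answer. Here $T$ is the threshold (%), $v$ is the current value (%), and $m$ is the slope (% per day):

```latex
t = \frac{T - v}{m},
\qquad
t_1 = \frac{100 - 70}{1} = 30\ \text{days},
\qquad
t_2 = \frac{100 - 50}{5} = 10\ \text{days}
```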

Additionally, for network interfaces, Skylar AI does not generate a predictive alert until at least one error or discard has been seen. This filters out noise on interfaces where no problems are likely to occur soon. In other words, an error or discard is a transient sign of network congestion, and Skylar AI then looks backward to determine whether that congestion appears to be the result of a recent trend of increasing utilization.
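The network interface rule acts as a gate on top of the trend prediction. A minimal sketch, assuming the prediction and the error and discard counts are already available; the `InterfaceStats` container and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InterfaceStats:
    # Hypothetical container; field names are illustrative only.
    errors: int
    discards: int
    days_until_threshold: Optional[float]  # from the utilization trend, or None

def should_alert(stats: InterfaceStats) -> bool:
    """Alert only when the trend predicts a threshold crossing AND at least
    one error or discard (a sign of congestion) has been observed."""
    congestion_seen = stats.errors + stats.discards > 0
    return congestion_seen and stats.days_until_threshold is not None

print(should_alert(InterfaceStats(errors=0, discards=0, days_until_threshold=12.0)))  # False
print(should_alert(InterfaceStats(errors=0, discards=3, days_until_threshold=12.0)))  # True
```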

Viewing Predictive Alerts in SL1

When your SL1 system is connected to Skylar AI, you can start viewing predictive alerts in SL1. No additional configuration is needed.

Predictive alerts display the Skylar icon to the left of the event message in the Message column of the Events page, and the message starts with the word "Prediction".

To view details about a predictive alert:

1. In SL1, go to the Skylar AI page and click the Visit button for Skylar Predictive Alerting. A filtered Events page displays a list of predictive alerts.
2. On the Events page, click the message for a predictive alert with the Skylar icon. The Event Investigator page for that alert appears.
3. On the Event Investigator page, the Skylar Analytics Summary panel displays a timeline of data from Skylar AI about a specific metric.

The dotted line on the graph in the Skylar Analytics Summary panel represents a time frame in the future that Skylar AI is forecasting, based on pattern recognition.

The blue line represents the activity observed so far by SL1, and the gray dotted line represents the threshold set in SL1. The blue dotted line represents where Skylar AI predicts a potential alert in the future, and the gray line represents a potential future problem, also predicted by Skylar AI.

In the example above, Skylar AI predicts that the file system utilization will hit the threshold of 100% in three days, on October 7th. By tracking the timeline on the graph, you can see when a potential event might happen, and you can take action now to prevent it.

In addition, if you have an event policy monitoring a metric that is now being tracked by Predictive Alerting, you can disable that event policy.

Because the data for the chart on the Skylar Analytics Summary panel comes from Skylar AI, you cannot use that data in an SL1 dashboard. Also, this chart is rendered when the prediction is made and does not update afterward, so when you open an event, you see the state of the metric and the prediction as they were at prediction time.

You can also review the logs for a specific device to view the history of the predictions:

1. On the Devices page or the Events page, select the device with the predictive alerts. The Device Investigator page for that device appears.
2. Click the Logs tab. A list of recent logs displays.
3. If needed, filter the Message column by typing "prediction" to view only the predictive alerts.

Using Predictive Alerts to Trigger Automated Run Book Actions

After Skylar AI creates an SL1 event for a predictive alert, you can create a run book automation policy that runs one or more run book actions when a predictive alert is generated.

The predictive alert must have an Event Type of Device and an Event Source of Skylar AI.
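To make the matching criteria concrete, here is a sketch that checks an event, represented as a plain dictionary, against the conditions stated above, together with the "Prediction" message prefix described earlier. The dictionary keys are hypothetical and do not correspond to SL1 field names or any SL1 API; in SL1 itself, this matching is configured through the Automation Policy Editor.

```python
from typing import Mapping

def is_predictive_alert(event: Mapping[str, str]) -> bool:
    """Check an event against the stated criteria. The keys "event_type",
    "event_source", and "message" are hypothetical, not SL1 field names."""
    return (
        event.get("event_type") == "Device"
        and event.get("event_source") == "Skylar AI"
        and event.get("message", "").startswith("Prediction")
    )

example = {
    "event_type": "Device",
    "event_source": "Skylar AI",
    "message": "Prediction (placeholder text)",
}
print(is_predictive_alert(example))  # True
```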

To use predictive alerts to trigger automated run book actions:

1. Go to the Automation Policy Manager page (Registry > Run Book > Automation).
2. Click the Create button. The Automation Policy Editor page appears.
3. In the Policy State field, select Enabled.
4. In the Available Events field, search for and select one or more event policies related to predictive alerts, and then click the right-arrow icon to move each event to the Aligned Events field.
5. In the Available Actions field, search for and select one or more run book actions that you want to run when the predictive alert event from step 4 occurs. Click the right-arrow icon to move each action to the Aligned Actions field. For example, you might want to send an email or create a ticket for that predictive alert.
6. Complete the remaining fields on the Automation Policy Editor page based on the specific parameters that you want to establish for the automation policy. For more information about the fields on the Automation Policy Editor page, see Automation Policies.
7. When you are finished, click Save.