Alerts and Thresholds


This section describes what alerts and thresholds are and how to use them in a Dynamic Application.

Use the following menu options to navigate the SL1 user interface:

  • To view a pop-out list of menu options, click the menu icon.
  • To view a page containing all of the menu options, click the Advanced menu icon.

What is an Alert?

An alert defines a formula that SL1 will evaluate each time data is collected. Each formula includes collection objects. When SL1 evaluates the alert formula, SL1 will substitute in the most recently collected values for each collection object. If the formula evaluates to true, SL1 generates an alert. When SL1 generates an alert, SL1 makes an entry in the device log. Additionally, you can define an event policy that will trigger an event if a specific alert is generated.

What is a Threshold?

A threshold defines a variable that can be used in one or more alerts. Each threshold defines a range of values for the variable and a default value. Users can change the value of the threshold on a per-device basis without having to modify the Dynamic Application.

Viewing the Thresholds in a Dynamic Application

To view the thresholds in a Dynamic Application:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. Find the Dynamic Application you want to view the thresholds for. Select its wrench icon. The Dynamic Applications Properties Editor page is displayed.
  3. Select the Thresholds tab. The Dynamic Applications Threshold Objects page is displayed. The Threshold Object Registry pane at the bottom of the page displays the following information about each threshold:
    • Name. The name of the threshold. This name will be used to label the threshold in the Thresholds tab in the Device Administration panel.
    • Override. Defines whether this threshold can be overridden on a per-device basis. Possible values are:
      • Enabled. When the Dynamic Application that contains this threshold is aligned with a device, a user will be able to set a new value for this threshold in the Thresholds tab in the Device Administration panel.
      • Disabled. Users cannot set a new value for this threshold on a per-device basis.
    • Type. Specifies whether the threshold will be a Percentage, Integer, or Decimal.
    • Numeric Range High. When a user overrides this threshold for a device, this is the maximum value the threshold can be set to.
    • Numeric Range Low. When a user overrides this threshold for a device, this is the minimum value the threshold can be set to.
    • Threshold Unit. Specifies the unit of measure for the threshold value.
    • Threshold Value. The default value for the threshold. If a user does not define an override for a device, SL1 will use this threshold value for that device.
    • ID. The unique ID assigned to the threshold by SL1. This ID is used to reference the threshold in alert formulas. The unique ID will always start with "t_".
    • Date Edit. The last time a user edited this threshold.

Creating a Threshold

To add a threshold to a Dynamic Application, perform the following steps:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. Find the Dynamic Application you want to add a threshold to. Select its wrench icon. The Dynamic Applications Properties Editor page is displayed.
  3. Select the Thresholds tab. The Dynamic Applications Threshold Objects page is displayed.
  4. Supply values in the following fields:
    • Threshold Name. The name of the threshold. This name will be used to label the threshold in the Thresholds tab in the Device Administration panel.
    • Override Threshold Value. Defines whether this threshold can be overridden on a per-device basis. When disabled, this threshold does not appear on the Device Thresholds page for each subscriber device and cannot be edited for each subscriber device. Choices are:
      • Enabled. Threshold appears on the Device Thresholds page for each subscriber device and can be edited for each subscriber device.
      • Disabled. Threshold does not appear on the Device Thresholds page for each subscriber device.
    • Numeric Range High. Specifies the high end of the possible values. In the Device Thresholds page for each subscriber device, this number will appear at the high end of the slider. When a user overrides this threshold for a device (in the Device Thresholds page), this is the maximum value the threshold can be set to.
    • Numeric Range Low. Specifies the low end of the possible values. In the Device Thresholds page for each subscriber device, this number will appear at the low end of the slider. When a user overrides this threshold for a device (in the Device Thresholds page), this is the minimum value the threshold can be set to.
    • Threshold Type. Specifies whether the threshold will be a Percentage, Integer, or Decimal.
    • Threshold Unit. Specifies the unit of measure for the threshold value.
    • Threshold Value. Assigns a default value to the threshold object. If a user does not override the threshold in the Device Thresholds page for a subscriber device, SL1 will use this threshold value for the subscriber device.
  5. Select the Save button to save the threshold.

Editing a Threshold

To edit an already-defined threshold:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. In the Dynamic Applications Manager page, find the Dynamic Application for which you want to edit a threshold. Select its wrench icon.
  3. Select the Thresholds tab for the Dynamic Application.
  4. In the Threshold Objects page, find the threshold in the Threshold Object Registry pane. Select its wrench icon.
  5. The fields in the top pane are populated with values from the selected threshold. You can edit the value of one or more fields. For a description of each field, see the Creating a Threshold section.
  6. Select the Save button to save your changes to the threshold.

Deleting a Threshold

You can delete a threshold from a Dynamic Application. To do this:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. In the Dynamic Applications Manager page, find the Dynamic Application for which you want to delete a threshold. Select its wrench icon.
  3. Select the Thresholds tab for the Dynamic Application.
  4. In the Threshold Objects page, find the threshold in the Threshold Object Registry pane. Select its bomb icon. The threshold will be deleted from SL1. You must manually remove the threshold from all alert formulas in which that threshold appears.

Viewing the Alerts in a Dynamic Application

To view the alerts in a Dynamic Application:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. Find the Dynamic Application you want to view the alerts for. Select its wrench icon. The Dynamic Applications Properties Editor page is displayed.
  3. Select the Alerts tab. The Dynamic Applications Alert Objects page is displayed. The Alert Object Registry pane at the bottom of the page displays the following information about each alert:
    • Policy Name. The name of the alert.
    • Formula. The formula that SL1 will evaluate to determine whether to generate the alert. An alert formula can contain references to collection objects ("o_" followed by a numeric value), threshold objects ("t_" followed by a numeric value), and/or other alerts ("a_" followed by a numeric value).
    • State. Specifies whether SL1 will evaluate the alert when new data is collected for the Dynamic Application. Possible values are:
      • Enabled. SL1 will evaluate the alert when new data is collected.
      • Disabled. SL1 will not evaluate the alert when new data is collected.
    • Maintain. Specifies whether SL1 will track the status of this alert (active or inactive) for use as a condition in other alert formulas. Possible values are:
      • Yes. SL1 tracks the status of this alert. The status of this alert can be used as a condition in other alert formulas.
      • No. SL1 does not track the status of this alert. The status of this alert cannot be used as a condition in other alert formulas.
    • Events. Specifies whether an event policy has been created for this alert. Possible values are:
      • Yes. An event policy has been created for this alert. Selecting the event icon opens the Event Policy Editor, where you can edit the associated event.
      • No. An event policy has not been created for this alert. Selecting the gray event icon opens the Event Policy Editor, where you can define an event policy for this alert.
    • ID. The unique ID assigned to the alert by SL1. This ID is used to reference the alert in other alert formulas. The unique ID will always start with "a_".
    • Date Edit. The last time a user edited this alert.

Creating an Alert

To add an alert to a Dynamic Application, perform the following steps:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. Find the Dynamic Application you want to add an alert to. Select its wrench icon. The Dynamic Applications Properties Editor page is displayed.
  3. Select the Alerts tab. The Dynamic Applications Alert Objects page is displayed.
  4. Supply values in the following fields:
    • Policy Name. Enter a name for the alert.
    • Active State. Specifies whether SL1 will evaluate the alert when new data is collected for the Dynamic Application. Possible values are:
      • Enabled. SL1 will evaluate the alert when new data is collected.
      • Disabled. SL1 will not evaluate the alert when new data is collected.
    • Log Message. Enter the log message associated with the alert. When SL1 generates this alert, this message will appear in the device logs. You can use the following substitution characters in this field. Each substitution character is populated by including a specific function in the Formula Editor for the alert:
      • %V. The value returned by the result() function.
      • %L. The value passed by the label argument of the result() function.
      • %T. The value returned by the threshold() function.
    • Maintain State. Specifies whether SL1 will track the status of this alert (active or inactive) for use as a condition in other alert formulas. Possible values are:
      • Yes. SL1 tracks the status of this alert. Select this option if you want to use the status of this alert as a condition in other alert formulas.
      • No. SL1 does not track the status of this alert. Select this option if you do not want to use the status of this alert as a condition in other alert formulas.
    • Trigger Alert. This is a deprecated field. Leave this field set to the default value.
    • Formula Editor. Enter the formula that SL1 will evaluate to determine whether to generate the alert. For a full description of this field, see the Alert Formulas section.
  5. Select the Save button to save the alert.

Alert Formulas

The Formula Editor field allows you to define the formula that SL1 will evaluate for an alert. The result of a defined alert formula must be either True or False. If the formula evaluates to True, SL1 generates an alert.

An alert formula can include:

  • The ID for one or more collection objects in the same Dynamic Application. All collection object IDs start with "o_" (lowercase "oh", underscore). However, do not use a collection object of type Label: Hourly Polled in an alert formula.
  • The ID for one or more thresholds in the same Dynamic Application. All threshold IDs start with "t_".
  • Arithmetic operators (+, -, *, /), comparison operators (>, <, !=, ==), and/or Boolean operators ("and", "or", "not"). The "==" operator is used to test equality. The single "=" assignment operator is not supported, except when passing optional arguments to some functions.

  • The result() and/or threshold() functions. The results of these functions can be included in the log message for the alert.
  • One or more ScienceLogic specific functions described in this section.
  • One or more ScienceLogic specific date & time variables or the tab_idx variable described in this section.
  • Any standard Python function.

The operators, functions, and variables may be used in any combination and with any number of collection object IDs and threshold IDs, provided that the result of the expression always evaluates to True or False.

The scrolling list below the Formula Editor contains a list of all collection objects and thresholds in the Dynamic Application. To insert the ID for a collection object or threshold into the formula, select the collection object or threshold from the list, then select the Add button.

Evaluate

To evaluate the value of a collection object, enter an expression in the Formula Editor that compares the value of the object to something else, where "something else" can be another collection object, a threshold object, a simple number, a string, the result of a function, or a combination of objects and numbers.

For example, suppose you want to generate an alert based on the value of collection object "o_15160", which contains the current temperature of a hardware component in a Dell server. You could select object "o_15160: Temp. Reading" from the scrolling list. When the object appears in the Formula Editor, you could enter the following formula:

o_15160 > 50

The formula evaluates to True if the value returned by object o_15160 is greater than 50.

NOTE: SL1 assigns each object an object ID. This is different from the object number assigned to the object in the MIB. SL1 requires that you use the object ID when referring to an object.

If one or more of the collection objects used in an alert formula collect a list of values, SL1 evaluates the alert formula for each entry in the list. An alert can be generated for each entry in the list. For example, suppose the following alert formula is evaluated:

o_123 / o_124 * 100 > 90

Suppose that for each poll period for a device, the collection objects (o_123 and o_124) return a list of two values. SL1 will evaluate the alert formula twice for this device. For the first evaluation, the first value from the list of values collected for o_123 at that poll period and the first value from the list of values collected for o_124 at that poll period will be substituted into the alert formula. For the second evaluation, the second values in the lists of collected values will be substituted into the alert formula.
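The per-entry evaluation described above can be pictured with a short Python sketch. It is a simulation under stated assumptions, not SL1's evaluator: the dictionary of collected lists, the sample values, and the helper names formula and evaluate_per_entry are all hypothetical, and the sketch assumes both objects return lists of the same length.

```python
# Hypothetical simulation of per-entry alert evaluation; not SL1's actual code.
# Each collection object ID maps to the list of values collected in one poll period.
# Floats are used so that division gives the same result in Python 2 and 3.
collected = {
    "o_123": [45.0, 95.0],    # made-up values, e.g. space used on two volumes
    "o_124": [100.0, 100.0],  # made-up values, e.g. total space on the same volumes
}


def formula(o_123, o_124):
    """The alert formula o_123 / o_124 * 100 > 90 for one pair of values."""
    return o_123 / o_124 * 100 > 90


def evaluate_per_entry(values):
    """Evaluate the formula once per list position, pairing the i-th values.

    Assumes both lists have the same length (zip() stops at the shorter one).
    """
    return [formula(a, b) for a, b in zip(values["o_123"], values["o_124"])]


# One result per entry: an alert would be generated only for the second entry.
print(evaluate_per_entry(collected))  # [False, True]
```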

If you compare a collection object to a string or an integer, be careful to use quotation marks consistently. The collection object and the comparison string or comparison integer must use the same quotation marks. Either both the collection object and the comparison value must be surrounded by single quotation marks or neither the collection object nor the comparison value should be surrounded by single quotation marks. For example:

  • o_1346 == '5' is incorrect syntax (notice no single quotation marks around o_1346 but single quotation marks around 5)
  • 'o_1346' == '5' is correct
  • o_1346 == 5 is correct

For more details, see the section on Using Quotes in an Alert.

Operators

The Formula Editor accepts only arithmetic operators (+, -, *, /), comparison operators (>, <, !=, ==), and Boolean operators (and, or, not). Parentheses are used to group and set precedence for operators.

The comparison operators are those commonly used in many modern programming languages. It is important to note that the '==' operator is used to test equality.

The single '=' assignment operator is not supported (except when passing optional arguments to the result() function, see below).

The operators may be used in any combination and with any number of object IDs, provided that the result of the expression always evaluates to TRUE or FALSE.

The following examples are all valid formulas:

o_123 / 100 > 3

o_123 > 3 and o_123 < 7

o_123 == 5

o_123 == 2 or o_123 == 4

Valid examples using multiple objects:

o_123 == o_124

o_123 / o_124 > 2.5

(o_123 + o_124) > (o_125 + 6)
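To see how these formulas behave once values are substituted, the following Python sketch evaluates each example as a Python expression. This is an assumption made for illustration (the manual states only that a Python interpreter evaluates alert formulas); the sample values and the evaluate helper are hypothetical.

```python
# Hypothetical sketch: treat each alert formula as a Python expression in which
# every object ID is a variable holding its most recently collected value.
values = {"o_123": 5, "o_124": 2, "o_125": 0}  # made-up sample values

formulas = [
    "o_123 / 100 > 3",
    "o_123 > 3 and o_123 < 7",
    "o_123 == 5",
    "o_123 == 2 or o_123 == 4",
    "o_123 == o_124",
    "o_123 / o_124 > 2.5",
    "(o_123 + o_124) > (o_125 + 6)",
]


def evaluate(text, sample):
    """Evaluate one formula string; no built-in functions are made available to it."""
    return eval(text, {"__builtins__": {}}, dict(sample))


for text in formulas:
    print(f"{text:32} -> {evaluate(text, values)}")

# A single "=" is assignment, which is not valid inside an expression.
try:
    evaluate("o_123 = 5", values)
except SyntaxError:
    print("use == to test equality")
```

Every formula produces a plain True or False, which is the requirement stated above for a valid alert formula.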

Dividing Integers

When you divide two integers, sometimes the result is a number that includes a decimal point. For example, if you divide 3/2, the result is 1.5. A value with a decimal point is called a floating-point number.

In SL1, if an integer is divided by another integer, the result is an integer. For example, if "3/2" is evaluated in an alert formula, the result will be 1, not 1.5. If you are dividing two numbers in SL1, you can use the Python float() function to convert the divisor to a floating-point number. This ensures that the resulting quotient is also a floating-point number.

For example, suppose your Dynamic Application includes two objects, o_123 and o_456. Suppose you want to divide the value of o_123 by the value of o_456. You should use the float() function like this:

o_123/float(o_456)

Because the divisor is a floating-point number, the resulting quotient can be a floating-point number.
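The difference between integer and floating-point division can be reproduced in plain Python. The sketch assumes the alert evaluator follows Python 2 division rules, which is what the 3/2 = 1 behavior described above corresponds to; because Python 3's "/" always returns a float, the sketch uses "//" to show the integer result. The values assigned to o_123 and o_456 are made up.

```python
# Made-up values standing in for the collected values of o_123 and o_456.
o_123, o_456 = 3, 2

# Under Python 2 rules, int / int discards the fractional part (it floors the result).
# Python 3 spells that operation "//", so it is used here to mimic the alert evaluator.
integer_quotient = o_123 // o_456

# The recommended form: converting the divisor with float() forces float division.
float_quotient = o_123 / float(o_456)

print(integer_quotient)  # 1
print(float_quotient)    # 1.5
```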

Using Quotes in an Alert

Some alert functions accept either a string or a number as an argument. Some alert functions also require quotes around the arguments. When using quotes with alert functions, follow these guidelines:

  • If you want to perform string operations on the object (such as find()), put the collection object ID in quotes.
  • If you want to perform numeric operations on the object, do not put the collection object ID in quotes.
  • You can use single quotes or double quotes; either works, but they must be paired (you cannot start with ' and end with ").
  • For consistency, ScienceLogic recommends that you always use single quotes around strings in formulas.
  • Take care if you are copying and pasting text from a text editor: the Python interpreter that evaluates alert formulas requires specific quote characters. These are valid quote characters: ' and ". These are not: ‘ ’ “ ”.
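If you often paste formulas from a word processor, a small script can replace typographic quotes before you paste the formula into the Formula Editor. The normalize_quotes helper below is hypothetical and not part of SL1; it uses Python's str.translate to map the four curly quote characters listed above to their plain equivalents.

```python
# Hypothetical helper, not part of SL1: replace typographic ("smart") quotes
# with the plain ASCII quotes that the formula interpreter accepts.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def normalize_quotes(formula: str) -> str:
    """Return the formula with every curly quote replaced by a straight quote."""
    return formula.translate(SMART_QUOTES)


pasted = "\u2018o_4568\u2019.find(\u2018loopback\u2019) == -1"
print(normalize_quotes(pasted))  # 'o_4568'.find('loopback') == -1
```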

The result() Function

The result() function passes a value for use in the alert message or event message associated with the alert. The passed value is available in the %V substitution. The expression defined in the result() function can be any combination of object IDs, numeric values, and strings allowed in the normal alert formula logic. The syntax is:

result(expression)

For example, if temperature is provided via object ID o_126, the following formula would return the temperature value for use in the alert message or the event message associated with the alert:

result(o_126) > 35

If the temperature is returned in Celsius, and you want it displayed in the alert message or event message in Fahrenheit, you could use the following formula:

result((o_126 * 1.8) + 32) > 120

The expression passed to the result() function can also refer to multiple objects. For example, suppose you wanted to determine the total number of IP DHCP addresses in a subnet. You could add together the following example objects:

  • o_5678 = number of addresses in use.
  • o_8910 = number of addresses free.
  • o_1112 = number of addresses pending.

The result() function would look like this:

result(o_5678 + o_8910 + o_1112)

There are two optional arguments to the result() function: enums and label. These provide further descriptive text to be substituted into the alert message or event message associated with the alert. The syntax is:

result((expression), enums={number:'value',…}, label='label object')

The enums argument maps each possible value of the expression to the text SL1 should substitute for that value. In general, this argument is used only if the expression refers to a collection object of type Config Enum. The argument must be enclosed in curly braces {}; each element consists of a value and its substitution text separated by a colon, and elements are separated by commas.

For example, suppose you want to monitor a status object, o_987. Suppose this status object is of type Config Enum and can return integer values 3, 4, 5, or 6. These integer values map to the following states:

  • 3 = OK
  • 4 = nonCritical
  • 5 = critical
  • 6 = nonRecoverable

If collection object o_987 is passed using the result() function, the integer value of o_987 is stored in the %V variable. By using the enums argument, you can pass the string value instead:

result(o_987, enums = {3:'OK',4:'nonCritical', 5:'critical', 6:'nonRecoverable'})
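Conceptually, the enums argument is a lookup table applied to the value before it is substituted for %V. The sketch below models that lookup in Python; the value_for_v helper is hypothetical, and returning the raw value when no mapping exists is an assumption, since the manual does not say what SL1 does with unmapped values.

```python
# Mapping taken from the o_987 example above.
ENUMS = {3: 'OK', 4: 'nonCritical', 5: 'critical', 6: 'nonRecoverable'}


def value_for_v(raw_value, enums=None):
    """Return the text that would replace %V (hypothetical model, not SL1 code)."""
    if enums is not None and raw_value in enums:
        return enums[raw_value]   # translated string, e.g. 'critical'
    return str(raw_value)         # assumption: unmapped values pass through unchanged


print(value_for_v(5))          # 5
print(value_for_v(5, ENUMS))   # critical
```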

The label argument passes an additional value for use in the alert message or event message associated with the alert. The passed label value is available in the %L substitution variable.

For example, suppose you have multiple temperature probes and you want to capture which temperature probe is generating the alert. If collection object o_1234 contains the temperature in Celsius and collection object o_5678 contains the name of the temperature probe, you could supply the following value in the label argument (again using a conversion from Celsius to Fahrenheit):

result(((1.8 * o_1234)+32), label='o_5678')


The value of the label argument can now be used in an event definition using the %L substitution.

The value of the label argument must be in single quotes.

The threshold() Function

Like the result() function, the threshold() function passes a value for use in the alert message or event message associated with the alert. The passed value is available in the %T substitution variable. The syntax is:

threshold(expression)

The threshold() function is intended to make the value of a threshold that triggered an alert visible in the corresponding log message and/or event.

As with the result() function, the threshold() function can contain a combination of object IDs and numeric values.

For example, suppose you are monitoring temperature. The object o_1234 contains temperature in Celsius. The threshold t_99 contains the maximum allowable temperature. You could enter the following formula to convert the temperature values to Fahrenheit and generate an alert when the temperature reading exceeded the allowable maximum:

result((1.8 * o_1234)+32) > threshold((1.8 * t_99)+32)

If an alert used the result() function with the label argument and the threshold() function, the alert message might look like this:

%L reporting temperature of %V F, threshold is %T F

The values substituted for %L, %V, and %T are the actual values captured in the alert formula definition by the result() and threshold() functions. If the alert is generated, the alert message might look like this:

Backplane-probe reporting temperature of 130 F, threshold is 120 F

In this message, "Backplane-probe", "130", and "120" are the substituted values.
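The substitution itself amounts to replacing each placeholder with the captured value. In the Python sketch below, the render_message helper is hypothetical and the values are hard-coded to match the example message; in SL1 they would come from the label argument, result(), and threshold().

```python
def render_message(template, label, value, threshold):
    """Replace %L, %V, and %T with captured values (hypothetical model, not SL1 code)."""
    return (template
            .replace("%L", str(label))
            .replace("%V", str(value))
            .replace("%T", str(threshold)))


template = "%L reporting temperature of %V F, threshold is %T F"
print(render_message(template, "Backplane-probe", 130, 120))
# Backplane-probe reporting temperature of 130 F, threshold is 120 F
```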

The active() Function

The active() function checks the state of another alert in the same Dynamic Application. If the alert is active, the active() function returns the value TRUE. The active() function can be used in combination with other expressions. When used in combination with other expressions, if the entire expression resolves to TRUE, a new alert is generated. The syntax is:

active(alert_ID)

To use the active() function to check the state of an alert, the alert being checked must have its Maintain State field set to Yes.

For example, suppose you are monitoring a circuit for failed authentications. You could define two alerts:

The first alert is assigned ID a_175 by SL1. In the first alert, the object o_16060 contains the number of failed authentications. The object o_16056 contains the name of the circuit being monitored for failed authentication.

result(o_16060, label='o_16056') > threshold(13)

This expression defines a high-end threshold: if more than 13 failed authentications occur on the circuit specified in o_16056, an alert is triggered. This alert defines the problem. For this alert, the Maintain State drop-down list is set to Yes.

The second alert is assigned ID a_176 by SL1. In the second alert, we define a low-end threshold for failed authentications.

result(o_16060, label='o_16056') < threshold(7) and active(a_175)

In this expression, if fewer than seven failed authentications occur on the circuit specified in o_16056 AND alert a_175 is still active, a new alert is triggered. The second alert defines the "healthy" criteria.

Suppose we had defined a critical event policy based on alert a_175. This event would warn that too many failed authentications occurred on the circuit named in o_16056.

Now suppose we wanted to clear that critical event when failed authentications fall back to an acceptable number. We could create a healthy event policy based on alert a_176. We could define this event to auto-clear the previous critical event. The new event would inform you that failed authentications were back within acceptable levels.

The avg() Function

The avg() function calculates and returns the average of all values returned by a collection object when that collection object returns a list of values. When a collection object returns a single value, the avg() function will return that single value. The syntax is:

avg(object_ID)

For example, suppose a device has three network interfaces. Now suppose the object o_1234 contains the value for "octets in".

On our example device, we would have three separate instances of object o_1234. We could use the avg() function to calculate and return the average of ALL instances of object o_1234. If that average is greater than 1,000,000 octets, an alert is triggered:

avg(o_1234) > 1000000
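The behavior of avg() described above can be modeled in a few lines of Python. This is a sketch of the documented behavior, not SL1's implementation; the three interface values are made up, and the sketch does not handle an empty list, which the manual does not cover.

```python
def avg(value):
    """Average a list of collected values; return a single value unchanged."""
    if isinstance(value, (list, tuple)):
        # float() keeps the division from truncating under Python 2 rules.
        return sum(value) / float(len(value))
    return value


octets_in = [800000, 1200000, 1300000]  # hypothetical o_1234 values for three interfaces
print(avg(octets_in))            # 1100000.0
print(avg(octets_in) > 1000000)  # True
print(avg(42))                   # 42
```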

The deviation() Function

The deviation() function allows you to define alerts that are triggered when an object has a value that is outside the "normal" range for the current hour on the current day of the week. The syntax is:

deviation(object_ID)

The deviation() function allows you to compare the current value of a collection object to a specified number of standard deviations outside the "normal" value for the object. You can specify that you want to trigger an alert if the current value falls outside the specified number of standard deviations from "normal".

To use the deviation() function, you must configure SL1 to store and calculate the mean values and standard deviations for a collection object. You do this by selecting the Enable Deviation Alerting field on the Collection Objects page. You then specify the minimum and maximum number of weeks to collect deviation data for the object.

SL1 must have already collected at least the minimum number of weeks' worth of data for an object before SL1 will evaluate alerts that use the deviation() function.

To use the deviation() function, you must specify a minimum value of at least two weeks. This value is not configured by default, and in some cases a PowerPack upgrade can overwrite this value. You are responsible for making sure that the min/max week settings are properly configured on the Collection Objects page with your preferred defaults.

Make sure that the raw data retention settings for your Dynamic Application performance data are correct, because the deviation() function depends on this data. In most cases the system default is 7 days, so deviation alerting requires you either to increase this value at the system level or to manually set a larger retention window for each Dynamic Application on a device. For example, 30 days is ideal when the maximum window is set to 4 weeks. ScienceLogic recommends that you do not set your maximum week value too high; in most cases, a minimum of 2 weeks and a maximum of 4 weeks will be sufficient.

SL1 evaluates the deviation() function against the aggregated values collected for multiple weeks. The number of weeks used during the evaluation depends on the amount of data available and is at least the minimum number of weeks specified for the collection object.

After SL1 has collected the minimum number of weeks' worth of data for an object, SL1 calculates the mean value for that object at every hour of every day and then calculates the standard deviations from that mean value.

The deviation function uses the following formula to examine the value of an object:

(current value of collection object - mean value for current hour on current day of week) / (standard deviation for current hour on current day of week)

The deviation() function converts the value of (current value of collection object - mean value for current hour on current day of week) to a positive value; that is, it uses the absolute value of the difference. Therefore, you should always compare the results of the deviation function to zero ("0") or to a positive number.

The actual substitution for the deviation() function will be "False and 0" in cases where:

  • There is not enough data, such as when you have only 7 days of data and you need 2 weeks (14 days).
  • Nothing deviated.

"False and 0" might cause some confusion, but it is a deliberate way to make the entire deviation formula return False.

The deviation() function sends data to the collector in the form of upserts to calculate the standard deviation. In cases where a collector failover might happen, or if a device moves collector groups, the standard deviation data is re-upserted on the new collector.

Also, when you enable deviation alerting on a collector and set both the raw data retention threshold and the minimum-weeks window for standard deviation alerting, deviation alerting will not be available after a collector failover whenever the deviation alerting window is longer than the raw data retention threshold. In that case, deviation alerting becomes available only when SL1 has gathered enough new raw data to meet the minimum number of weeks specified for the standard deviation.

Some possible uses for the deviation function are:

  • Determining if an application is functioning properly. For example, if a log file for an application begins to grow at a rate outside the "normal" range, you can trigger an alert to determine if there is a problem with the application.
  • Monitoring security. For example, if bandwidth usage exceeds the normal activity, you can trigger an alert that indicates that your network might have been compromised.

Example 1

For example:

  • Suppose SL1 has already collected four weeks' worth of data for an object (o_123).
  • Suppose that for the last four weeks, every Monday between 09:00 and 10:00, the mean value for o_123 is "50".
  • Suppose that 68% of all values on Monday between 09:00 and 10:00 fall within +/- "10" of the mean value (that is, between 40 and 60).
  • For o_123 on Mondays between 09:00 and 10:00, a standard deviation of "1" is +/- "10".

Suppose we specify the following alert formula:

deviation(o_123) > 1

This alert formula specifies that if o_123 collected a value outside "1" standard deviation of the "normal" value, SL1 should trigger an alert.

If on Monday at 09:30, the value of o_123 is "45", the deviation function performs the following calculation:

(45-50)/10 > 1

This evaluates to FALSE, therefore SL1 would not trigger an alert.

Example 2

For another example:

  • Suppose SL1 has already collected four weeks' worth of data for a collection object (o_789).
  • Suppose that on Monday, between 09:00 and 10:00, the value of o_789 is always "50" and has not varied at all during the entire four weeks.
  • The mean value of o_789 on Mondays, between 09:00 and 10:00 is "50".
  • 68% of all values on Mondays between 09:00 and 10:00 will fall within +/- "0" of the mean.
  • For o_789 on Monday between 09:00 and 10:00, a standard deviation of "1" is +/- "0".

Suppose on Monday at 09:15, the value of o_789 is "50". If this is the expected behavior, we could specify:

deviation(o_789) > 1

The deviation function performs the following calculation:

(50-50)/0 > 1

This evaluates to FALSE, therefore SL1 would not trigger an alert.

Now suppose that on Monday at 09:30, the value of o_789 is "45", the deviation function performs the following calculation:

(45-50)/0 > 1

In this case, the deviation function would return "infinity". The formula will evaluate to TRUE, and trigger an alert. If we did not expect to repeatedly collect the same value every Monday between 09:00 and 10:00, this formula might be appropriate to monitor object o_789.

In the alert editor, you can specify infinity as float("inf").

If you do expect the first standard deviation to be "+/- 0", that is, if you expect the value of an object to never fluctuate, you can test for this with the following:

deviation(o_789) == float("inf")

Because the deviation function returns infinity for (45-50)/0, this formula evaluates to TRUE and triggers an alert. We can define this alert to provide details about a standard deviation of “0”.
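Plain Python raises ZeroDivisionError for (45-50)/0, so a local re-creation of Example 2 has to handle a standard deviation of 0 explicitly. The following hypothetical sketch mirrors the behavior described above by returning infinity when a value differs from a constant history. Returning 0.0 when the value matches the history is an assumption; this section says only that the alert does not trigger in that case.

```python
from statistics import mean, pstdev

def deviation_score(value, history):
    """Hypothetical stand-in for deviation() that handles a zero standard deviation."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Assumed behavior, mirroring Example 2: with a constant history, any
        # change is infinitely unusual and an unchanged value is not unusual.
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma

history = [50, 50, 50, 50]  # o_789 never varied during the four weeks

print(deviation_score(50, history) > 1)              # False
print(deviation_score(45, history) > 1)              # True
print(deviation_score(45, history) == float("inf"))  # True
```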

The 68-95-99.7 rule

Statistically, for normally distributed data, about 68% of values lie within one standard deviation of the mean, about 95% lie within two, and about 99.7% lie within three.

In theory, if an alert formula uses a standard deviation of "1" as the threshold, such as the one in Example 1, the alert will trigger for 32% of the collected values.

Using the 68-95-99.7 rule, the following list indicates the percentage of collected values that will trigger an alert for different standard deviation thresholds:

  • 1 (one) standard deviation. About 32% of collected values will fall outside this range. Therefore, an alert that compares an object's value to 1 (one) standard deviation will be triggered about 32% of the time. For example, the following formula will trigger an alert on about 32% of the values of o_123:

    deviation(o_123) > 1

  • 2 (two) standard deviations. About 5% of collected values will fall outside this range.
  • 3 (three) standard deviations. About 0.3% of collected values will fall outside this range.
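The percentages in the 68-95-99.7 rule come from the normal distribution: the fraction of values more than k standard deviations from the mean is erfc(k / √2). The short check below computes that fraction with Python's math.erfc. It assumes the collected values are roughly normally distributed, which real metrics often are not, so treat the results as rough guides to how often an alert will fire.

```python
import math

def fraction_outside(k):
    """Fraction of normally distributed values more than k standard deviations from the mean."""
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"{k} standard deviation(s): {fraction_outside(k):.1%} outside")
# 1 standard deviation(s): 31.7% outside
# 2 standard deviation(s): 4.6% outside
# 3 standard deviation(s): 0.3% outside
```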

The find() Function

The find() function searches a collection object for a specific alpha-numeric string. The syntax is:

object_ID.find('alpha-numeric string')

If the collection object contains the string, the find() function returns the location of the string in the collection object, where 0 is the first character of the collection object. If the collection object does not contain the string, the find() function returns -1 (negative one).

For example, suppose you want to trigger an alert if inbound bandwidth-usage drops below a certain level. Suppose collection object o_4567 collects "inbound octets". Suppose collection object o_4568 returns the type of interface. Now suppose you don't want to trigger an alert if this condition occurs on a loopback interface. You could write an alert like this:

result(o_4567) < 500 and o_4568.find('loopback') == -1

This logic says: if object o_4567 is less than 500 and object o_4568 does not contain the text "loopback" (that is, the find() function cannot find the string "loopback" and so returns -1), trigger an alert.
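The following Python sketch shows how the find() test behaves, assuming (this section does not confirm it) that SL1's find() follows Python's str.find(). The helper name loopback_safe_alert and the interface type strings are hypothetical. Note that str.find() is case-sensitive, so a value such as "softwareLoopback" does not match 'loopback'.

```python
def loopback_safe_alert(inbound_octets, interface_type):
    """Python equivalent of: result(o_4567) < 500 and o_4568.find('loopback') == -1"""
    return inbound_octets < 500 and interface_type.find('loopback') == -1

print('loopback0'.find('loopback'))         # 0: found at the first character
print('ethernet'.find('loopback'))          # -1: not found
print('softwareLoopback'.find('loopback'))  # -1: find() is case-sensitive

print(loopback_safe_alert(120, 'ethernet'))  # True: low traffic, not loopback
print(loopback_safe_alert(120, 'loopback'))  # False: loopback interfaces are skipped
```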

The global() Function

The global() function returns the collected value (or, for list objects that are not collected at each collection, the last collected value) of a collection object regardless of the group or index that is currently being evaluated for the alert formula.

The syntax is:

global(object_ID)

When you include multiple collection objects in an alert, those collection objects must be in the same group. If a list or table of values is returned by a collection object, the platform evaluates the alert for each index in that group.

You can use the global() function to include a collection object that returns a single value in an alert formula that also includes collection objects that return lists.

For example:

Suppose that you have a group, Group 1, that contains the following two collection objects:

  • o_123. Contains file system utilization in percent.
  • o_124. Contains name of file system.

Suppose that you have an additional collection object that is not aligned with any group:

  • o_125. Contains a single value: the name of the file system that is used for data storage by the primary application.

Suppose you want to trigger an event when a file system is over 80% utilized:

  • Suppose that you want to trigger a Major event for all file systems except the one used for data storage for the primary application.
  • Suppose you want to trigger a Critical event for the file system used for data storage for the primary application.

To do this, you can create two alerts:

  • An alert for the file system for data storage for the primary application
  • An alert for all other file systems.

In both alerts, the object o_125 is compared to o_124 to determine whether the current file system is the one used for data storage by the primary application.

Because o_125 is a single value in a different group/index than the other collection objects, we use the global() function to return the value for o_125.

The two alerts would look like this:

o_123 > 80 and 'o_124'.find('global(o_125)') > -1

o_123 > 80 and 'o_124'.find('global(o_125)') == -1

The first alert will trigger only for the file system for data storage (file system name is the same name as contained in o_125).

The second alert will trigger for all other file systems (file system name is not the same as in o_125).

NOTE: If the collection object specified in the global() function is a string, the SL1 system will automatically substitute the value surrounded by single quote characters.
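To see how the two formulas divide the file systems between them, the following hypothetical Python sketch evaluates both alerts once per index, reusing the single value that stands in for global(o_125) at every index. The sample values and the function name evaluate_alerts are illustrative only, and the plain-string comparison assumes SL1 has already applied the quoting described in the note above.

```python
def evaluate_alerts(utilization, fs_names, primary_data_fs):
    """Return which alert fires at each index of Group 1 (hypothetical simulation).

    utilization     -- o_123 per index, in percent
    fs_names        -- o_124 per index, file system name
    primary_data_fs -- the single value that global(o_125) returns at every index
    """
    fired = {}
    for idx, o_123 in utilization.items():
        o_124 = fs_names[idx]
        if o_123 > 80 and o_124.find(primary_data_fs) > -1:
            fired[idx] = "first alert"   # primary application's data file system
        elif o_123 > 80 and o_124.find(primary_data_fs) == -1:
            fired[idx] = "second alert"  # every other file system
    return fired

print(evaluate_alerts({"1": 85, "2": 92, "3": 40},
                      {"1": "/", "2": "/data", "3": "/var"},
                      "/data"))
# {'1': 'second alert', '2': 'first alert'}
```

Because find() matches substrings, a file system named "/data2" would also match "/data" in this sketch; an equality test would be stricter if exact names are expected.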

The log() Function

The log() function calculates and returns the logarithm of a specified number to the specified base. If base is not specified, the log function uses base e to return the natural logarithm. The syntax is:

log(number, base)

For example:

log(o_123, 10)

This example would calculate the base-10 logarithm of the value of collection object o_123.
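Python's math.log(x[, base]) takes the same arguments and also defaults to the natural logarithm, so it is a convenient way to preview values. This sketch assumes, without confirmation from this document, that SL1's log() returns the same results; the collected value 1000.0 is hypothetical.

```python
import math

o_123 = 1000.0  # hypothetical collected value

print(math.log(o_123, 10))  # base-10 logarithm, approximately 3
print(math.log10(o_123))    # 3.0, the more precise base-10 form
print(math.log(o_123))      # natural logarithm (base e), approximately 6.9078
```

Because math.log(x, base) divides two natural logarithms, it can differ from the exact result in the last floating-point digit, which matters if an alert compares the result with ==.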

The prior() Function

The prior() function returns the previous value of a collection object (from the previous polling session). The syntax is:

prior(object_ID)

For example:

o_123 > prior(o_123)

This example evaluates to true if the value of object o_123 is now greater than it was during the last polling session.

Suppose you have defined a polling frequency of five minutes for a Dynamic Application. Every five minutes, SL1 will retrieve the value from object o_123. The example above could be used to trigger an alert if the value of object o_123 is now greater than it was five minutes ago.
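The following hypothetical Python sketch replays a series of polls and reports where o_123 > prior(o_123) would be true. The sample values are invented, and skipping the first poll (which has no previous value) is an assumption, because this section does not say how SL1 treats it.

```python
def rising_polls(values):
    """Return the poll numbers at which o_123 > prior(o_123) would be true."""
    rising = []
    for poll in range(1, len(values)):
        current, previous = values[poll], values[poll - 1]
        if current > previous:
            rising.append(poll)
    return rising

# Hypothetical values of o_123 collected every five minutes.
samples = [10, 12, 12, 9, 15]
print(rising_polls(samples))  # [1, 4]
```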

The round() Function

The round() function rounds the value of a collection object to a specified number of digits. The round() function returns a floating point value.

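The syntax is:

round(object_ID, number_of_digits)

  • The round() function rounds values to the closest multiple of 10 to the power of negative number_of_digits. For example, if number_of_digits is 2, values are rounded to the nearest 0.01.
  • Values that fall exactly halfway are rounded away from zero. For example, round(0.5) returns 1.0 and round(-0.5) returns -1.0.
  • If you omit the optional number_of_digits argument, the function rounds to a whole number and returns it with ".0", such as "2.0".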

For example, suppose object o_567 contains the value "4.688".

round(o_567, 2)

would return the value

4.69
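Python 3's built-in round() rounds halfway values to the nearest even number (round(0.5) returns 0), which differs from the behavior described above. The following sketch is an assumption-based re-creation, not SL1's code: it uses the decimal module to round halfway values away from zero and return a float. The function name sl1_style_round is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def sl1_style_round(value, number_of_digits=0):
    """Round to the nearest multiple of 10 ** -number_of_digits, ties away from zero,
    and return a float, matching the behavior described for round()."""
    quantum = Decimal(1).scaleb(-number_of_digits)  # e.g. Decimal('0.01') for 2 digits
    # str() avoids carrying binary floating-point noise into the Decimal.
    rounded = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return float(rounded)

print(sl1_style_round(4.688, 2))  # 4.69
print(sl1_style_round(0.5))       # 1.0
print(sl1_style_round(-0.5))      # -1.0
print(round(0.5))                 # 0: Python 3's built-in rounds half to even
```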

The sum() Function

The sum() function calculates and returns the sum of all values returned by a collection object when that collection object returns a list of values. When a collection object returns a single value, the sum() function will return that single value. The syntax is:

sum(object_ID)

For example, suppose a device has three network interfaces. Now suppose the object o_1234 contains the value for "octets in".

On our example device, we would have three separate instances of object o_1234. We could use the sum() function to calculate and return the sum of ALL instances of object o_1234. If that sum is greater than 1,000,000 octets, an alert is triggered:

sum(o_1234) > 1000000
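The sum() alert reduces to ordinary addition over the per-interface values. In this minimal Python sketch, the three "octets in" values are hypothetical.

```python
# Hypothetical "octets in" values of o_1234, one per interface.
octets_in = [400_000, 350_000, 300_000]

total = sum(octets_in)
print(total)              # 1050000
print(total > 1_000_000)  # True: sum(o_1234) > 1000000 would trigger
```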

Date & Time Variables

You can use the time and date variables to define alerts that occur only during specified dates or times.

Values starting with the word “local” refer to the local date and time, as specified in the System Timezone field in the Behavior Settings page (System > Settings > Behavior).

For every “local” value there is a “utc” equivalent that uses values based on UTC.

The following is a list of time and date variables that can be used in alerts:

  • localyear. The year, in local date and time, as specified in the System Timezone field in the Behavior Settings page. Values can be any four-digit year. For example, 2007.
  • localmonth. The month, in local date and time, as specified in the System Timezone field in the Behavior Settings page. Values can be 01 – 12, with 01 being January and 12 being December.
  • localmonthday. The day of the month, in local date and time, as specified in the System Timezone field in the Behavior Settings page. Value can be 01 – 31.
  • localhour. The hour, in local date and time, as specified in the System Timezone field in the Behavior Settings page. Values can be 0 – 23, with 0 being midnight and 23 being 11 PM.
  • localminute. The minutes, in local date and time, as specified in the System Timezone field in the Behavior Settings page. Values can be 0 – 59, with 0 being 0 minutes after the hour and 59 being 59 minutes after the hour.
  • localsecond. The seconds, in local date and time, as specified in the System Timezone field in the Behavior Settings page. Values can be 0 – 61. 60 is used for leap seconds; 61 is used for double leap-seconds.
  • localweekday. Day of the week, in local date and time, as specified in the System Timezone field in the Behavior Settings page. Values can be 0 – 6, with 0 being Monday and 6 being Sunday.
  • localyearday. Day of the year, in local date and time, as specified in the System Timezone field in the Behavior Settings page. Values can be 1 – 366.
  • utcyear. The year, in UTC date and time. Values can be any four-digit year. For example, 2007.
  • utcmonth. The month, in UTC date and time. Values can be 01 – 12, with 01 being January and 12 being December.
  • utcmonthday. The day of the month, in UTC date and time. Value can be 01 – 31.
  • utchour. The hour, in UTC date and time. Values can be 0 – 23, with 0 being midnight and 23 being 11 PM.
  • utcminute. The minutes, in UTC date and time. Values can be 0 – 59, with 0 being 0 minutes after the hour and 59 being 59 minutes after the hour.
  • utcsecond. The seconds, in UTC date and time. Values can be 0 – 61. 60 is used for leap seconds; 61 is used for double leap-seconds.
  • utcweekday. Day of the week, in UTC date and time. Values can be 0 – 6, with 0 being Monday and 6 being Sunday.
  • utcyearday. Day of the year, in UTC date and time. Values can be 1 – 366.
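These ranges match the fields of Python's time.struct_time (for example, tm_wday uses 0 for Monday and tm_sec allows 0 to 61), so the following sketch builds the variables from a timestamp. The correspondence is an assumption that this document does not confirm, and time.localtime() uses the time zone of the machine running the code rather than SL1's System Timezone setting. The function name alert_time_variables is hypothetical.

```python
import time

def alert_time_variables(epoch_seconds=None):
    """Build the local* and utc* variables from one timestamp using time.struct_time."""
    if epoch_seconds is None:
        epoch_seconds = time.time()
    variables = {}
    for prefix, convert in (("local", time.localtime), ("utc", time.gmtime)):
        t = convert(epoch_seconds)
        variables.update({
            prefix + "year": t.tm_year,
            prefix + "month": t.tm_mon,
            prefix + "monthday": t.tm_mday,
            prefix + "hour": t.tm_hour,
            prefix + "minute": t.tm_min,
            prefix + "second": t.tm_sec,
            prefix + "weekday": t.tm_wday,  # 0 = Monday ... 6 = Sunday
            prefix + "yearday": t.tm_yday,  # 1 ... 366
        })
    return variables

v = alert_time_variables(1_700_000_000)  # 2023-11-14 22:13:20 UTC, a Tuesday
print(v["utcyear"], v["utcmonth"], v["utcmonthday"], v["utchour"], v["utcweekday"], v["utcyearday"])
# 2023 11 14 22 1 318
```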

Example 1

For example, you could define an alert that is triggered only when both the following are TRUE:

  • the value of o_999 falls to zero
  • the condition occurs during business hours (between 8 AM and 6 PM)

To define the alert, you could use the localhour variable:

result(o_999) == 0 and localhour >= 8 and localhour < 18
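Because localhour holds whole hours, hour 17 covers 17:00 to 17:59. The following hypothetical Python sketch checks the boundaries of a business-hours window that ends at 6 PM by testing localhour < 18; a test of localhour <= 18 would keep the alert active until 18:59. The function name business_hours_alert is illustrative.

```python
def business_hours_alert(o_999, localhour):
    """Python equivalent of: result(o_999) == 0 and localhour >= 8 and localhour < 18"""
    return o_999 == 0 and 8 <= localhour < 18

for hour in (7, 8, 17, 18):
    print(hour, business_hours_alert(0, hour))
# 7 False
# 8 True
# 17 True
# 18 False
```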


Example 2

You could also define an alert that excludes weekend days. The alert would be triggered only when both the following are TRUE:

  • the value of o_999 falls to zero
  • the condition occurs during weekdays

To define the alert, you could use the localweekday variable. The days of the week are numbered from 0 to 6, with Monday being 0, Saturday being 5, and Sunday being 6. To exclude weekends, define the alert to trigger only when the “localweekday” number is less than 5:

result(o_999) == 0 and localweekday < 5
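Python's datetime.date.weekday() uses the same numbering as localweekday (Monday is 0, Sunday is 6), which makes it easy to confirm which dates the formula covers. The helper weekday_alert and the sample dates in this sketch are hypothetical.

```python
from datetime import date

def weekday_alert(o_999, localweekday):
    """Python equivalent of: result(o_999) == 0 and localweekday < 5"""
    return o_999 == 0 and localweekday < 5

friday = date(2024, 3, 8)
saturday = date(2024, 3, 9)
print(friday.weekday(), weekday_alert(0, friday.weekday()))      # 4 True
print(saturday.weekday(), weekday_alert(0, saturday.weekday()))  # 5 False
```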


Example 3

You could also define an alert that is triggered only on the first day of the month. The alert would be triggered only when both the following are TRUE:

  • the value of o_999 falls to zero
  • the condition occurs when the day of the month is “1”

To define the alert, you could use the localmonthday variable:

result(o_999) == 0 and localmonthday == 1

Example 4

You could also define an alert that is triggered only on the first day of the month in UTC, regardless of the local System Timezone. The alert would be triggered only when both the following are TRUE:

  • the value of o_999 falls to zero
  • the condition occurs when the UTC day of the month is "1"

To define the alert, you could use the utcmonthday variable:

result(o_999) == 0 and utcmonthday == 1
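The localmonthday and utcmonthday formulas can disagree around midnight. The following Python sketch assumes a hypothetical System Timezone of UTC-05:00 (a fixed offset, ignoring daylight saving time) and shows a moment when it is already the first of the month in UTC but still the last day of the previous month locally.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical System Timezone of UTC-05:00.
system_tz = timezone(timedelta(hours=-5))

# 02:00 UTC on March 1 is still 21:00 on February 29 in UTC-05:00.
moment = datetime(2024, 3, 1, 2, 0, tzinfo=timezone.utc)
utcmonthday = moment.day
localmonthday = moment.astimezone(system_tz).day

print(utcmonthday, localmonthday)  # 1 29
print(utcmonthday == 1)            # True: the utcmonthday formula can fire
print(localmonthday == 1)          # False: the localmonthday formula cannot fire yet
```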

The tab_idx Variable

The tab_idx variable allows you to apply an alert only to specific indexes when a collection object returns a list of values. For SNMP Dynamic Applications, the tab_idx variable returns the SNMP index that corresponds to the collection object values that are currently being evaluated. For other Dynamic Applications, the tab_idx variable returns the location in the list of values that corresponds to the collection object values that are currently being evaluated.

For example:

  • Suppose that in a list of values representing device state, there may be six potential indexes in the list of values, but only 1, 3, 4, and 6 are used.
  • Suppose that the object o_999 represents the device state, with 0 being healthy.

You could define the alert to check when object o_999 does not equal zero (is not healthy). But you could limit the alert to check only indexes 1, 3, 4, and 6. For an SNMP Configuration or SNMP Performance Dynamic Application, the alert formula would be:

o_999 != 0 and tab_idx in ['.1','.3','.4','.6']

For all other types of Dynamic Applications, the alert formula would be:

o_999 != 0 and tab_idx in ['1','3','4','6']

Notice the syntax for the tab_idx variable. You must use the syntax as it appears in the example:

  • Include the text "in" after the tab_idx variable
  • Surround the list of index numbers with square brackets
  • Surround each index with single-quotation marks (')
  • For SNMP Configuration and SNMP Performance Dynamic Applications, precede each index with a period (.)
  • Separate the list of index numbers with commas (,)
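The following hypothetical Python sketch shows why the quoting and leading periods matter, assuming tab_idx is compared as a string, as the quoted list suggests. The helper name tab_idx_alert and the sample values are illustrative only.

```python
def tab_idx_alert(o_999, tab_idx, snmp=True):
    """Python equivalent of: o_999 != 0 and tab_idx in [...]"""
    watched = ['.1', '.3', '.4', '.6'] if snmp else ['1', '3', '4', '6']
    return o_999 != 0 and tab_idx in watched

print(tab_idx_alert(2, '.3'))             # True: unhealthy, watched index
print(tab_idx_alert(2, '.2'))             # False: index 2 is not watched
print(tab_idx_alert(2, '3'))              # False: SNMP indexes need the leading period
print(tab_idx_alert(2, '3', snmp=False))  # True
```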

Creating an Event Policy for an Alert

When SL1 generates an alert for a device, the log message associated with that alert is inserted into the device logs. Optionally, you can associate an alert with an event policy. When SL1 generates an alert that is associated with an event policy and all the conditions defined in the event policy are met, SL1 will generate an event. In addition to appearing in the device logs, the event will appear:

  • In the Event Console page (the Events tab).
  • In the Summary tab in the Device Reports panel.
  • In the Events tab in the Device Reports panel.

To create an event policy for an alert:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. Find the Dynamic Application you want to add an event policy for. Click its wrench icon (). The Dynamic Applications Properties Editor page is displayed.
  3. Click the Alerts tab. The Dynamic Applications Alert Objects page is displayed.
  4. In the Dynamic Applications Alert Objects page, find the alert in the Alert Object Registry pane. Click its gray event icon ().
  5. The Event Policy Editor page is displayed. The following fields in the Event Policy Editor are populated with information about the alert:
    • Event Source. This field is set to Dynamic, which tells SL1 that this event policy matches alerts generated using Dynamic Applications.
    • Policy Name. This field is populated with the value you specified in the Policy Name field for the alert.
    • Link-Alert. This field appears on the Advanced tab in the Event Policy Editor. This field is populated with a link to the alert, which tells SL1 that the alert you selected can trigger this event policy.
  6. Before you save the event policy, you must supply a value in the following fields:
    • Operational State. Specifies whether SL1 should trigger this event if the associated alert is generated and all the conditions defined in the event policy are met. Choices are:
      • Enabled. SL1 will trigger this event if the associated alert is generated and all the conditions defined in the event policy are met.
      • Disabled. SL1 will never trigger this event. Selecting this option does not stop SL1 from generating the associated alert.
    • Event Severity. The severity value to associate with events created with this event policy. Choices are:
      • Healthy
      • Notice
      • Minor
      • Major
      • Critical
    • Event Message. The event message that SL1 will associate with instances of this event. If you want to use the message generated by the alert that triggers this event, enter "%M" in this field.
  7. You can optionally specify values in the additional fields, including the Policy Description field and the fields that appear in the Advanced tab. For more information about the Event Policy Editor, see the Defining an Event Policy section. If you choose to make further changes to your event policy, you must not change the values in the following fields:
    • The Event Source field in the Policy tab.
    • The Link-Alert field in the Advanced tab.
  8. Click the Save button to save the event policy.

Editing an Alert

To edit an already-defined alert:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. In the Dynamic Applications Manager page, find the Dynamic Application for which you want to edit an alert. Click its wrench icon ().
  3. Click the Alerts tab for the Dynamic Application.
  4. In the Dynamic Applications Alert Objects page, find the alert in the Alert Object Registry pane. Click its wrench icon ().
  5. The fields in the top pane are populated with values from the selected alert. You can edit the value of one or more fields. For a description of each field, see the Creating an Alert section.
  6. Click the Save button to save your changes to the alert.

Deleting an Alert

You can delete an alert from a Dynamic Application. To do this:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. In the Dynamic Applications Manager page, find the Dynamic Application for which you want to delete an alert. Click its wrench icon ().
  3. Click the Alerts tab for the Dynamic Application.
  4. In the Dynamic Applications Alert Objects page, find the alert in the Alert Object Registry pane. Click its bomb icon (). The alert is deleted from SL1. If an event policy was associated with the alert, you must manually delete the event policy in the Event Policy Manager page (Registry > Events > Event Manager).

Validating an Alert

After you have created an alert formula, you can use a command line interface (CLI) tool to validate that the alert works as intended. This tool enables Dynamic Application developers to validate alerts by directly inserting data into the Alerting module without using a full SL1 system.

Running the Alert Validator Command Line Tool

The alert_validator.py Python CLI tool enables Dynamic Application developers to validate the alert formulas that are included in Dynamic Applications using shell sessions at the command line. This tool is located in the following folder:

/opt/em7/backend/alert_validator.py

The alert_validator.py tool can be run only from a Data Collector or All-In-One Appliance.

The alert_validator.py tool checks an alert formula to validate that the following are all true:

  • All collection object and device threshold object references are structured correctly and are valid for the specified Dynamic Application.
  • Any collection objects that are referenced outside of global(), sum(), and avg() all belong to the same group of metrics.
  • The following functions are structured correctly and contain a valid collection object reference:
    • sum
    • avg
    • global
    • prior
    • changed
    • deviation
  • The active() function alert reference is structured correctly and is valid for the specified Dynamic Application, with an alert formula that belongs to the same group as the alert being tested.
  • The result() and threshold() functions are used correctly (including optional result arguments).
  • The alert formula contains no syntax errors.

To run the alert_validator.py tool, you must first run a command to create JSON files that capture data about the Dynamic Application. This command uses the following structure, and must always end with the device ID and Dynamic Application ID:

sudo -u s-em7-core python /opt/em7/backend/alert_validator.py --capture-data [Device ID] [Dynamic Application ID]

Replace [Device ID] and [Dynamic Application ID] with the device ID and Dynamic Application ID, respectively. Do not include the brackets. Separate the two IDs with a single space and no other characters.

When you run this command, a folder containing all of the generated JSON files is created in the /tmp/ folder. The folder name includes the Dynamic Application ID.

After the folder with the JSON files has been created, you can then run a separate command to evaluate the alert formulas. This command uses the following structure:

sudo -u s-em7-core python /opt/em7/backend/alert_validator.py -j AlertInternalState.json -j DynamicAppAlert_[Alert ID].json -j DynamicAppThreshold_[Threshold ID].json -j DynamicAppObjects_[Object ID].json -j DynamicAppObjectSchema_[Dynamic Application ID].json

Replace [Alert ID], [Threshold ID], [Object ID], and [Dynamic Application ID] with the Alert ID, Threshold ID, Object ID, and Dynamic Application ID, respectively. Do not include the brackets.

ScienceLogic recommends running the command from the folder in which the JSON files are stored so you do not need to include the path for each JSON file.

If the alert validation is successful, a detailed report will display indicating that the specified alerts were successfully triggered.

If the alert validation fails, a message will display that specifies the reason why it failed.

Example

The following example creates JSON files to capture data for device ID 3, Dynamic Application ID 1429:

sudo -u s-em7-core python /opt/em7/backend/alert_validator.py --capture-data 3 1429


After the JSON files have been created, you can then use them to evaluate the alert formulas. For example:

sudo -u s-em7-core python /opt/em7/backend/alert_validator.py -j AlertInternalState.json -j DynamicAppObjects_3.json -j DynamicAppThreshold_651.json -j DynamicAppAlert_2111.json -j DynamicAppObjectSchema_1429.json
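If you validate many alerts, typing each -j argument becomes error-prone. The following Python sketch is a hypothetical helper, not part of SL1, that assembles the documented evaluation command from the JSON files in a capture folder. It assumes the file-name patterns shown above and takes the folder path as a parameter, because this section states only that the folder is created under /tmp/ and that its name includes the Dynamic Application ID.

```python
from pathlib import Path

VALIDATOR = "/opt/em7/backend/alert_validator.py"

def build_validation_command(capture_dir):
    """Assemble the documented evaluation command from the JSON files in a capture folder."""
    capture_dir = Path(capture_dir)
    files = [capture_dir / "AlertInternalState.json"]
    for pattern in ("DynamicAppObjects_*.json", "DynamicAppThreshold_*.json",
                    "DynamicAppAlert_*.json", "DynamicAppObjectSchema_*.json"):
        files.extend(sorted(capture_dir.glob(pattern)))
    command = ["sudo", "-u", "s-em7-core", "python", VALIDATOR]
    for path in files:
        command += ["-j", str(path)]  # one -j flag per JSON file
    return command
```

The function only builds the argument list; on a Data Collector or All-In-One Appliance you could pass the result to Python's subprocess.run().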