Monitoring Device Availability and Latency

This section describes how to monitor device availability and latency.

Use the following menu options to navigate the SL1 user interface:

  • To view a pop-out list of menu options, click the menu icon.
  • To view a page containing all of the menu options, click the Advanced menu icon.

Availability

Availability means a device's ability to accept connections and data from the network. During polling, a device has two possible availability values:

  • 100%. Device is up and running.
  • 0%. Device is not accepting connections and data from the network.

By default, the method SL1 uses to monitor a device's availability depends on how the device was first discovered:

  • If the SL1 agent is installed and creates a device record before the device is discovered as an SNMP or pingable device, availability is measured based on whether the agent is reporting data to SL1.
  • If the device is discovered as an SNMP or pingable device before the agent is installed, availability is measured based on the method used to discover the device (SNMP, ICMP, or TCP).

If a device or interface becomes unavailable multiple times in a specified time frame, SL1 can generate an "availability flapping" event. By default, SL1 generates an event if a device becomes unavailable three times in an hour, or if an interface becomes unavailable three times in twenty-four hours.
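As a rough illustration of the default flapping rule described above, the following Python sketch counts unavailability events inside a sliding time window and reports when the limit is reached. The class FlapDetector and its method are hypothetical, and the sketch assumes events arrive in time order; it is not how SL1 implements the check internally.

```python
from collections import deque
from datetime import datetime, timedelta


class FlapDetector:
    """Hypothetical helper: counts unavailability events inside a sliding time window."""

    def __init__(self, max_failures: int, window: timedelta) -> None:
        self.max_failures = max_failures
        self.window = window
        self._failures = deque()  # timestamps of recent unavailability events, oldest first

    def record_unavailable(self, when: datetime) -> bool:
        """Record one unavailability event; return True once the flapping limit is reached."""
        self._failures.append(when)
        # Discard events older than the window, measured back from the newest event.
        while self._failures and when - self._failures[0] > self.window:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures


# Default limits described above: 3 times per hour for a device,
# 3 times per 24 hours for an interface.
device_flaps = FlapDetector(max_failures=3, window=timedelta(hours=1))
interface_flaps = FlapDetector(max_failures=3, window=timedelta(hours=24))

start = datetime(2024, 1, 1, 12, 0)
print(device_flaps.record_unavailable(start))                          # False
print(device_flaps.record_unavailable(start + timedelta(minutes=20)))  # False
print(device_flaps.record_unavailable(start + timedelta(minutes=40)))  # True
```

A deque keeps the oldest timestamp at the front, so expiring old events is a cheap pop from the left.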

To generate availability reports, SL1 must be configured to collect availability and latency data from devices. The following section describes how to configure SL1 to collect this data.

NOTE: Unlike with hardware-based devices, SL1 does not use ICMP, TCP, or UDP to monitor availability for component devices. Component devices use a Dynamic Application collection object to measure availability. SL1 polls component devices for availability at the frequency defined in the Dynamic Application.

Configuring Availability Monitoring on a Device

SL1 uses ports to monitor a device's availability. You specify which ports to use for device availability in the Device Properties page.

NOTE: Unlike with hardware-based devices, SL1 does not use ICMP, TCP, or UDP to monitor availability for component devices. Component devices use a Dynamic Application collection object to measure availability. SL1 polls component devices for availability at the frequency defined in the Dynamic Application. For more information, see the section Configuring Availability for Component Devices.

To configure availability monitoring for a device:

  1. Go to the Device Manager page (Devices > Device Manager).
  2. In the Device Manager page, find the device for which you want to configure availability monitoring. Click its wrench icon. The Device Properties page displays.
  3. In the Device Properties page, edit the following fields:
  • Availability Port. Specifies the protocol (first drop-down menu) and specific port (second drop-down menu) that SL1 should monitor to determine if the device is available. The list of ports contains all the ports discovered by SL1. The data collected from this port is used in device availability reports. Protocol options include:
    • TCP. Availability is based on whether SL1 can connect to the device using the specified TCP port.
    • ICMP. Availability is based on whether the device responds to an ICMP ping request from SL1. If you select ICMP as the protocol, you can use the ICMP Availability Thresholds fields in the Device Thresholds page to further define how SL1 will test the device's availability.
    • SNMP. Availability is based on whether the device responds to an SNMP GET request from SL1.
    • ScienceLogic Agent. Availability is based on whether the SL1 Agent is reporting data to SL1. The agent must be installed on the device to use this option.
  • Avail + Latency Alert. Specifies how SL1 should respond when the device fails an availability check, a latency check, or both. These options allow you to create separate events when SNMP fails on a device and when a device is not up and running (indicated by the device failing both the availability check and the latency check). Choices are:
    • Enabled. SL1 creates the following events:
      • If the device fails the availability check, SL1 generates the event "Device Failed Availability Check: UDP - SNMP".
      • If the device fails the latency check, SL1 generates the event "Network Latency Exceeded Threshold: No Response".
      • If the device fails both the availability check and the latency check, SL1 generates the event "Device Failed Availability and Latency checks".
    • Disabled. SL1 creates the following events:
      • If the device fails the availability check, SL1 generates the event "Device Failed Availability Check: UDP - SNMP".
      • If the device fails the latency check, SL1 generates the event "Network Latency Exceeded Threshold: No Response".
      • If the device fails both the availability check and the latency check, SL1 generates the Major event "Device Failed Availability Check: UDP - SNMP". The Minor event "Network Latency Exceeded Threshold: No Response" is rolled up under the availability event.

  4. Click Save.
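The Avail + Latency Alert choices amount to a small decision table. The Python sketch below models that table for a single poll; the function name and constants are hypothetical, and it returns only the top-level events, so a rolled-up Minor event is not listed separately.

```python
AVAIL_EVENT = "Device Failed Availability Check: UDP - SNMP"
LATENCY_EVENT = "Network Latency Exceeded Threshold: No Response"
COMBINED_EVENT = "Device Failed Availability and Latency checks"


def avail_latency_events(alert_enabled: bool, availability_ok: bool, latency_ok: bool) -> list[str]:
    """Return the top-level events for one poll (hypothetical model of the documented behavior)."""
    if availability_ok and latency_ok:
        return []
    if not availability_ok and not latency_ok:
        if alert_enabled:
            # Enabled: a single combined event is generated.
            return [COMBINED_EVENT]
        # Disabled: the availability event is the top-level (Major) event; the
        # Minor latency event is rolled up under it, so it is not listed here.
        return [AVAIL_EVENT]
    # Exactly one check failed: the same event is generated regardless of the setting.
    return [AVAIL_EVENT] if not availability_ok else [LATENCY_EVENT]


for enabled in (True, False):
    print(enabled, avail_latency_events(enabled, availability_ok=False, latency_ok=False))
```

The setting only changes the outcome when both checks fail; a single failed check produces the same event either way.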

NOTE: The Ping & Poll Timeout (Msec) setting in the Behavior Settings page (System > Settings > Behavior) affects how SL1 monitors device availability. This field specifies the number of milliseconds the discovery tool and availability polls wait for a response after pinging a device. After the specified number of milliseconds has elapsed, the poll times out.
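To make the TCP availability option and the timeout concrete, here is a minimal Python sketch of a TCP port check that gives up after a timeout expressed in milliseconds, in the spirit of the Ping & Poll Timeout (Msec) setting. The function tcp_port_available is hypothetical and uses only the standard socket module; it does not cover ICMP, which needs raw sockets and usually elevated privileges, and it is not SL1's own poller.

```python
import socket


def tcp_port_available(host: str, port: int, timeout_ms: int) -> bool:
    """Hypothetical TCP availability check: True if a connection opens within timeout_ms."""
    try:
        # create_connection takes its timeout in seconds, so convert from milliseconds.
        with socket.create_connection((host, port), timeout=timeout_ms / 1000):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out
        # (socket.timeout is a subclass of OSError).
        return False
```

For example, tcp_port_available("192.0.2.10", 22, timeout_ms=1500) returns False if the hypothetical device at that address does not accept a connection on port 22 within 1.5 seconds.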

Defining Availability Thresholds

SL1 allows you to define global Availability Thresholds that apply to all devices and device-specific Availability Thresholds that apply to a selected device. When a device fails to meet the availability threshold (that is, is not available as specified in the threshold), SL1 generates an event about the device.

For details on defining availability thresholds, see the section on Thresholds and Data Retention.

Configuring Availability for Component Devices

Dynamic Applications that create component devices have the Component Mapping checkbox selected in the Dynamic Applications Properties Editor page and also include the Component Identifiers field.

In the Component Identifiers field, you map the value of a collection object to the Device Name and Unique Identifier component identifiers so that SL1 can create one or more component devices.

In the Component Identifiers field, you can also map a collection object to the Availability identifier. For hardware-based devices, SL1 monitors an ICMP, TCP, or UDP port to determine availability. Because component devices might not include ICMP, TCP, or UDP ports, you must use a Component Identifier to determine availability.

To configure SL1 to monitor availability for a component device:

  1. Go to the Dynamic Applications Manager page (System > Manage > Dynamic Applications).
  2. Find the Dynamic Application that creates and monitors the component devices you are interested in. Click its wrench icon.
  3. In the Dynamic Applications Properties Editor page, examine the Component Mapping checkbox. If the checkbox is selected, this is the correct Dynamic Application to edit.
  4. Click the Collections tab.
  5. In the list of Collection Objects in the Collection Object Registry pane, determine which collection object will always be available if the component device is available. Click the wrench icon for that collection object.
  6. In the Component Identifiers field, select:
  • Availability. Object that specifies whether a component device is available. If SL1 can collect a value for a component device using the aligned collection object and the value is not 0 (zero) or "false", SL1 considers the component device "available". If SL1 cannot collect a value for a component device using the aligned collection object, or SL1 collects a value that is 0 (zero) or "false", SL1 considers the component device "unavailable".
    • If the collection objects aligned with the Device Name and Unique Identifier component identifiers return lists of values, SL1 creates multiple component devices. Each component device is associated with an index, that is, a location in the list of values. For all the component devices in the list to be considered available, the collection object aligned with the Availability component identifier must return a list with a value at each index associated with a component device. A component device is unavailable when the list of values returned by the collection object aligned with the Availability component identifier does not include a value at that component device's index, or includes a value of 0 (zero) or "false" at that index. For more information about Dynamic Application indexing, see the Dynamic Application Development section.
    • If you align a collection object with this component identifier, SL1 creates a system availability graph for each component device in the Device Performance page.
    • If you align a collection object with this component identifier and SL1 cannot collect a value for a component device using the aligned collection object, SL1 displays the value "Unavailable" in the Collection State column in the Device Components page.
  7. Click Save. SL1 now monitors availability and graphs availability statistics for the component devices aligned with the Dynamic Application.
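The index-based availability rule above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that a missing value is represented as None or as an index beyond the end of the list, and that string values "0" and "false" (in any case) should count the same as 0 and false; SL1's actual parsing may differ.

```python
from typing import Any, Sequence


def value_means_available(value: Any) -> bool:
    """Apply the documented rule to one collected value (hypothetical helper)."""
    if value is None:
        # No value could be collected at this index.
        return False
    if isinstance(value, str):
        # Assumption: "0" and "false" (any case) are treated like 0 and false.
        return value.strip().lower() not in ("0", "false")
    # Numeric zero is unavailable; False == 0 in Python, so boolean False is caught too.
    return value != 0


def component_availability(values: Sequence[Any], component_count: int) -> list[bool]:
    """One flag per component index; an index missing from `values` means unavailable."""
    return [
        index < len(values) and value_means_available(values[index])
        for index in range(component_count)
    ]


# Three component devices, but the Availability object returned only two values.
print(component_availability([1, "false"], component_count=3))  # [True, False, False]
```

The third component is unavailable only because the list has no value at index 2, which mirrors the "does not include a value at that index" case described above.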

Critical Ping

Critical Ping is a tool that allows you to monitor a device as frequently as every five seconds. If the device does not respond, SL1 creates an event. You can enable or disable critical ping for a device from its Device Properties page (Registry > Devices > wrench icon).

SL1 does not use critical ping to create device-availability reports. SL1 will continue to collect device-availability data only every five minutes, as specified in the process "Data Collection:Availability" in the Process Manager page (System > Settings > Admin Processes).

Critical Ping uses the following global default values:

  • Ping Count. This field specifies the number of packets that should be sent during each critical ping. The default value is "1".
  • Required Ping Percentage. This field specifies the percentage of packets that must be returned during a critical ping before SL1 considers the device available. The default value is "100%".
  • Packet Size. This field specifies the size of each packet, in bytes, that is sent during each critical ping. The default value is "56 bytes".

To adjust these global values or to allow Critical Ping to inherit the per-device values for ICMP Availability Thresholds defined in the Device Thresholds page (Registry > Devices > Device Manager > wrench icon > Thresholds), contact ScienceLogic Customer Support.
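How Ping Count and Required Ping Percentage combine can be sketched as a simple comparison. The Python function below is hypothetical and assumes SL1 compares the percentage of returned packets against the required percentage; Packet Size does not affect the decision, so it is omitted.

```python
def critical_ping_available(packets_sent: int, packets_returned: int,
                            required_ping_percentage: float = 100.0) -> bool:
    """Hypothetical check: is the returned-packet percentage at least the required percentage?"""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be at least 1")
    # Compare returned/sent against the required percentage without dividing.
    return packets_returned * 100 >= required_ping_percentage * packets_sent


# Global defaults: Ping Count 1, Required Ping Percentage 100%.
print(critical_ping_available(1, 1))  # True
print(critical_ping_available(1, 0))  # False
# A hypothetical non-default profile: 4 packets, 75% required.
print(critical_ping_available(4, 3, required_ping_percentage=75))  # True
```

With the defaults, a single lost packet is enough to make the device count as unavailable for that critical ping.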

To define critical ping for a device:

  1. Go to the Device Manager page (Devices > Device Manager).
  2. In the Device Manager page, find the device for which you want to configure critical ping. Click its wrench icon. The Device Properties page displays.
  3. In the Device Properties page, edit the following field:
  • Critical Ping. Frequency with which SL1 should ping the device in addition to the five-minute availability poll. If the device does not respond, SL1 creates an event. The choices are:
    • Disabled. SL1 will not ping the device in addition to the five-minute availability poll.
    • Intervals ranging from every 120 seconds to every 5 seconds.

NOTE: If you set Critical Ping to 5 seconds, you might experience a delay of up to 15 seconds between an unavailability alert and that alert appearing in the Database Server, because the high-frequency data pull occurs every 15 seconds.

TIP: You might experience some performance issues if you have a large number of devices using critical ping on a short polling interval. If you have a large number of devices and are experiencing a delay in events being generated for a critical ping outage, try increasing the interval time.

  4. Click Save.
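To get a feel for why short intervals on many devices can cause the performance issues mentioned in the tip above, you can estimate the average critical-ping packet rate. This Python sketch is a back-of-the-envelope estimate only; the function name is hypothetical, it assumes every device uses the same interval and Ping Count, and it ignores retries and the availability poll itself.

```python
def critical_ping_packets_per_second(device_count: int, interval_s: int, ping_count: int = 1) -> float:
    """Rough average rate of critical-ping packets across all devices (hypothetical estimate)."""
    # Assumption: only intervals in the documented 5-120 second range are valid.
    if not 5 <= interval_s <= 120:
        raise ValueError("Critical Ping intervals range from 5 to 120 seconds")
    # Each device sends ping_count packets once per interval.
    return device_count * ping_count / interval_s


print(critical_ping_packets_per_second(1000, 5))   # 200.0
print(critical_ping_packets_per_second(1000, 20))  # 50.0
```

Quadrupling the interval from 5 to 20 seconds cuts the estimated rate by a factor of four, which is the effect the tip relies on.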

Latency

Latency means the amount of time it takes SL1 to communicate with a device. Specifically, latency refers to the amount of time between when SL1 initiates communication with a device and when the device responds and allows communication. Latency is expressed in milliseconds (ms).

The latency calculation that is reported in SL1 varies based on the method used to check it:

  • For TCP, SL1 reports half of the time it takes for the connection to be opened.

  • For ICMP, SL1 reports half of the round-trip time for a ping.

  • For UDP (SNMP), SL1 reports half of the time it takes to send an SNMP getnext request for .1.3.6.1 and receive a response.
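The halving rule above can be illustrated for TCP with a short Python sketch that times how long a connection takes to open and reports half of it. The function names are hypothetical and the sketch uses only the standard socket and time modules; the ICMP helper simply halves a round-trip time you already have, since sending ICMP echo requests needs raw sockets. This is an illustration of the calculation, not SL1's implementation.

```python
import socket
import time


def tcp_latency_ms(host: str, port: int, timeout_s: float = 2.0) -> float:
    """Half the time taken to open a TCP connection, in milliseconds (hypothetical helper).

    Raises OSError (including socket.timeout) if the connection cannot be opened.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout_s):
        elapsed_s = time.perf_counter() - start
    # Report half of the connection-open time, mirroring the TCP rule above.
    return elapsed_s * 1000 / 2


def icmp_latency_ms(round_trip_ms: float) -> float:
    """For ICMP, the same halving applies to a ping's round-trip time."""
    return round_trip_ms / 2
```

For example, a ping with a 40 ms round trip would be reported as 20 ms of latency.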

SL1 uses ports to monitor a device's latency. You specify which ports to use for device latency on the Settings tab of the Device Investigator page (or the Device Properties page in the classic SL1 user interface).

Configuring Latency Monitoring on a Device

To configure latency monitoring for a device:

  1. Go to the Device Manager page (Devices > Device Manager).
  2. In the Device Manager page, find the device for which you want to configure latency monitoring. Click its wrench icon. The Device Properties page displays.
  3. In the Device Properties page, edit the following fields:
  • Latency Port. Specifies the protocol (first drop-down menu) and specific port (second drop-down menu) SL1 should monitor to determine latency for the device. The list of ports contains all the ports discovered by SL1. The data collected from this port is used in device latency reports.
    • If you select ICMP as the protocol, you can use the ICMP Availability Thresholds fields in the Device Thresholds page to further define how SL1 will test the device's latency.
  • Avail + Latency Alert. Specifies how SL1 should respond when the device fails an availability check, a latency check, or both. These options allow you to create separate events when SNMP fails on a device and when a device is not up and running. Choices are:
    • Enabled. SL1 creates the following events:
      • If the device fails the availability check, SL1 generates the event "Device Failed Availability Check: UDP - SNMP".
      • If the device fails the latency check, SL1 generates the event "Network Latency Exceeded Threshold: No Response".
      • If the device fails both the availability check and the latency check, SL1 generates the event "Device Failed Availability and Latency checks".
    • Disabled. SL1 creates the following events:
      • If the device fails the availability check, SL1 generates the event "Device Failed Availability Check: UDP - SNMP".
      • If the device fails the latency check, SL1 generates the event "Network Latency Exceeded Threshold: No Response".
      • If the device fails both the availability check and the latency check, SL1 generates only the event "Device Failed Availability Check: UDP - SNMP". The event "Network Latency Exceeded Threshold: No Response" is suppressed under the availability event.

  4. Click Save.

Defining Latency Thresholds

SL1 allows you to define global Latency Thresholds that apply to all devices and device-specific Latency Thresholds that apply only to a specific device. When a device fails to meet the latency threshold (that is, takes longer than the specified time span to respond), SL1 generates an event about the device. For example, if the latency threshold is "100 ms", when a device does not respond to a poll within 100 ms, SL1 generates an event about that device.

To disable the latency threshold for a single device, set the threshold to 0% (zero percent). When you disable a threshold, SL1 does not generate an event for the threshold.
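The threshold logic described above can be sketched in Python as two small functions. Both names are hypothetical. The sketch assumes thresholds are expressed in milliseconds, that a device-specific threshold (when set) takes precedence over the global one, that a response exactly at the threshold is still "within" it, and that a threshold value of zero disables the check, as described above.

```python
from typing import Optional


def effective_threshold_ms(global_ms: float, device_ms: Optional[float] = None) -> float:
    """Assumption: a device-specific threshold, when set, overrides the global threshold."""
    return global_ms if device_ms is None else device_ms


def latency_event_needed(latency_ms: Optional[float], threshold_ms: float) -> bool:
    """Decide whether one poll warrants a latency event (hypothetical model)."""
    if threshold_ms == 0:
        # A threshold of zero disables the check, so no event is generated.
        return False
    if latency_ms is None:
        # No response at all.
        return True
    # Responding within the threshold (inclusive) is acceptable.
    return latency_ms > threshold_ms


threshold = effective_threshold_ms(global_ms=100)
print(latency_event_needed(85.0, threshold))   # False
print(latency_event_needed(140.0, threshold))  # True
print(latency_event_needed(140.0, effective_threshold_ms(100, device_ms=0)))  # False
```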

For details on defining latency thresholds, see the section on Thresholds and Data Retention.

Viewing Reports on Device Availability and Device Latency

See the section on Viewing Performance Graphs for information and examples of reports for device availability and device latency.