Example Using Device Availability and Interface Monitoring

This section describes an example of an IT Service policy.

This example monitors a leased WAN circuit. The example IT Service policy includes two routers that are at the two ends of the circuit.

Use the following menu options to navigate the SL1 user interface:

  • To view a pop-out list of menu options, click the menu icon.
  • To view a page containing all of the menu options, click the Advanced menu icon.

Creating an IT Service Policy

To define an IT Service policy, you must:

  • Define a service name and basic properties. In this example, we will monitor the routers at both ends of a leased WAN circuit. The name of the IT Service policy will be "wan_circuit_1".
  • Define a list of devices (model) for the IT Service that includes all the devices associated with the IT Service. For example, if you want to monitor a WAN circuit, you could create a device group that includes the routers at each end of the circuit. You could create another device group that includes the switches that are connected to those routers. In our example, we'll select two routers to monitor.
  • Optionally, define service sets. A service set is a sub-group of devices. You can manually assign devices to a service set, or you can use membership rules, like you would for a dynamic device group. For example, you could define two service sets: Exchange Servers, defined by device class, and DNS servers, defined by the DNS Server running on each device. We don't use service sets in this example.
  • Define Interface Tags for Interface Metrics. If your IT Service policy will include interface metrics, you can use interface tags to create groups of interfaces. You can then apply a metric to a group of interfaces. Each interface can belong to multiple interface tags.
  • Define metrics. A metric is based on your business processes and examines all devices or one or more service sets to evaluate the state of the IT Service. For each IT Service, SL1 provides a default metric called Average Device Availability, based on the availability of all devices in the IT Service. You can define additional metrics, based on default data collected by SL1 (availability, latency, CPU usage, memory usage, swap usage, device state, and device count), data collected by a Dynamic Application, and data about network interfaces, TCP/IP ports, system processes, Windows services, Email round-trip time, web content, SOAP/XML transactions, and DNS availability. For our example, we'll examine the traffic and errors on the interfaces on the routers at each end of the WAN circuit.

NOTE: When SL1 evaluates a metric, it performs an aggregation, that is, SL1 evaluates the data for all devices specified in the definition of the metric, over a specified time period (the Aggregation Frequency). Depending on the definition of the metric, SL1 calculates the average, maximum, minimum, sum, standard deviation, or count value for all devices specified in the definition of the metric.

  • Define Key Metrics. Key Metrics are the standard method for describing the status of an IT Service. Key Metrics allow you to quickly gauge the status of multiple IT Services, even if those IT Services require very different metrics that aggregate very different performance data. The Key Metrics are Health, Availability, and Risk. When you define a Key Metric, you specify how the value of a metric you created in the "Define metrics" step translates to one of the standard Key Metric values. By default, all three Key Metrics are based on the default Average Device Availability metric.
  • Define alerts and associated events. Each alert and its associated event is triggered by a metric. In our example, we will define alerts for each metric.
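
The aggregation described in the NOTE above can be pictured with a short sketch. This is not SL1 code; the function name, the dictionary keys, and the sample values are hypothetical, and the use of a population standard deviation is an assumption, since this section does not say which variant SL1 uses.

```python
import statistics

# Hypothetical mapping from an aggregation name to a function.
# Illustrative only; SL1's internal implementation is not shown in this manual.
AGGREGATIONS = {
    "average": statistics.fmean,
    "maximum": max,
    "minimum": min,
    "sum": sum,
    "standard_deviation": statistics.pstdev,  # assumption: population std dev
    "count": len,
}

def aggregate(values, method):
    """Collapse the per-device values from one aggregation period into one number."""
    if not values:
        raise ValueError("no values collected in this period")
    return AGGREGATIONS[method](values)

# Made-up availability readings (percent) for the two routers in one period.
availability = [100.0, 50.0]
print(aggregate(availability, "average"))  # 75.0
```

Each call corresponds to one evaluation of one metric at the Aggregation Frequency: many per-device values go in, and a single value comes out.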

Defining the Name of the IT Service Policy and its Basic Properties

To define the basic parameters of our example IT Service policy:

  1. Go to the IT Service Manager page (Registry > IT Services > IT Service Manager).
  2. Select the Create button.
  3. The IT Service Editor page appears, with the Administration tab selected.
  4. Select the Properties sub-tab. Supply the following values in the following fields:
  • IT Service Name. Name of the IT Service policy. We entered "wan_circuit_1"
  • IT Service Owner. Automatically populated with your username.
  • Configuration Mode. We selected Basic Interface. The Basic Interface allows you to quickly set up an IT Service policy.
  • Sharing Permissions. Specifies whether other users can view and use the IT Service policy in the IT Service Manager page, in the IT Service Editor page, and in the other pages in SL1 where the IT Service is visible. We selected Shared with users in your organization. The IT Service policy can be viewed and used by other users who belong to the same organization as the creator.
  • Permission Keys. We did not select any permission keys.
  • Operational Status. We selected Aggregation enabled.
  • Aggregation Frequency. Frequency at which SL1 will collect data from all devices in the IT Service and "crunch" the data for each metric into a single value. We specified Every 2 minutes.
  • Raw Data Retention. Specifies how long SL1 should store the raw data for the IT Service Policy. We accepted the default value.
  • Frequent Rollup Retention. Deprecated field no longer used by SL1.
  • Hourly Rollup Retention. Specifies how long SL1 should store the "hourly" normalized data for the IT Service policy. We accepted the default value.
  • Daily Rollup Retention. Specifies how long SL1 should store the "daily" normalized data for the IT Service policy. We accepted the default value.
  • Description. We did not enter a description.
  5. Select the Save button to save the values in the Properties tab.

Defining a List of Devices for the IT Service Policy

After defining the name and basic properties of an IT Service Policy, you must next determine the devices to include in your IT Service policy. You do this in the Model sub-tab.

For example, if you want to monitor Email service, you could create a list of devices that includes Exchange servers, DNS servers, and devices that run Email round-trip policies.

You can manually assign devices and device groups to the IT Service device group, or you can use membership rules, like you would for a dynamic device group.

When you define the list of devices to include in your IT Service policy, that list of devices appears as a device group throughout SL1.

There are three ways to add a device to the list of devices for the IT Service policy.

  • Add a device group to the list of devices for the IT Service policy.
  • Add a static list of one or more devices to the list of devices for the IT Service policy.
  • Add a dynamic list of one or more devices to the list of devices for the IT Service policy.

In our example, we will create a static list of devices.

To create the list of devices for the IT Service policy:

  1. After performing the tasks in the previous section, select the Model sub-tab.
  2. We will statically add two routers to our policy. To add a static list of one or more devices to the list of devices for the IT Service policy, go to the Static Devices pane.
  3. Select the Add button. The Device Alignment modal page displays a list of all devices in SL1.
  4. In the Device Alignment modal page, we selected the checkbox for devices "10.20.30.148" and "10.20.30.149". Each is a Netopia router with one WAN (ATM) interface and one Ethernet interface.
  5. Select the Add/Remove button in the lower right. The selected devices will appear in the Static Devices pane.
  6. Select the Save button to save the list of devices.

Defining Interface Tags for Interface Metrics

You can define interface metrics that monitor the following:

  • Inbound Traffic
  • Outbound Traffic
  • Inbound Errors
  • Outbound Errors
  • Inbound Discards
  • Outbound Discards

You can apply these metrics to:

  • All Interfaces
  • Management Interface
  • Tagged Interfaces

Interface Tags allow you to create one or more groups of interfaces. You can then apply an interface metric to that group. If All Interfaces or Management Interface doesn't suit your needs, you can define and apply interface tags.

In our example, we will create an interface tag for the interfaces at each end of the WAN link. When we define a metric, we will specify that interface tag to include only data from the interfaces at each end of the WAN link. To create the interface tags:

  1. Go to the Device Manager page (Registry > Devices > Device Manager). Find the device where the interface resides. In our example, we'll search for 10.20.30.148 and 10.20.30.149.
  2. In the Device Manager page, find the first device (10.20.30.148). In the IP Address column, click the interface icon. The Device Interfaces page appears.
  3. In our example, we will add the tag wan_link to the WAN interface on 10.20.30.148. Click on the WAN interface. The Interface Properties page appears.
  4. In the Interface Properties page, find the Interface Tags field. Select the wrench icon to the right of the field.
  5. The Edit Network Interface Tags modal page appears. In the Tags (comma separated) field, enter wan_link.
  6. Select the Save button, and then select the Close button.
  7. In the Interface Properties page, notice that the Interface Tags field contains the entry wan_link.
  8. Repeat these steps for the WAN interface on the second device (10.20.30.149).

Defining Metrics for the IT Service Policy

A metric is a measurement that helps determine the status of an IT Service.

SL1 automatically includes a default metric with each IT Service policy. The default metric is called Average Device Availability. The Average Device Availability metric examines the availability of all devices in the IT Service. By default, the Average Device Availability metric is collected from every device every minute and "crunched" and averaged every 15 minutes.

Before you can define additional metrics for an IT Service policy, you must determine what parameters you want to monitor for the IT Service policy. You can use data from the following sources to monitor the IT Service:

  • Device Availability
  • Device Latency
  • Overall CPU Usage
  • Physical Memory Usage
  • Swap Usage
  • Device State (Condition of the device, based upon the most severe event generated by the device.)
  • Device Count
  • Presentation Objects from Dynamic Applications
  • Network Interface
  • TCP/IP Port Monitor
  • System Process Monitor
  • Windows Service Monitor
  • Email Round Trip Monitor
  • Web Content Monitor
  • SOAP/XML Transaction Monitor
  • Domain Name Monitor

Our example uses data from Network Interface monitoring. We will create our metrics in Basic mode. We will create a metric called wan_inbound_errors that will examine specified (tagged) interfaces for errors. We will also define an alert that tells SL1 to generate an event when the interfaces exceed the threshold for acceptable errors.

  1. After performing the tasks in the previous sections, select the Metrics sub-tab.
  2. Ensure that you are in Basic mode. If you see the Alerting sub-tab, you are not in Basic mode. Click on the Advanced button to toggle to Basic mode.
  3. Select the Add button.
  4. The Service Metric Editor modal page appears.
  5. We will create the metric wan_inbound_errors. This metric will measure the number of inbound errors on the interfaces at each end of the WAN link. To create this metric, enter the following values in the Service Metric Editor modal page:
  • Service Metric Name. Enter "wan_inbound_errors".
  • Metric Classification. Specifies whether the metric will be displayed in the IT Service Summary page in widgets that display vital metrics. Select Service Vital Metric. The metric will appear in widgets that display vital metrics.
  • Active State. Specifies whether SL1 should currently collect data for the metric and evaluate alerts for the metric. Select Enabled.
  • Metric Type. Specifies the type of performance data you want to use for the metric. Select Network Interface. Our metric will examine data from network interfaces.
  • Device Subset. We have not defined any device subsets. Select All Devices in Service.
  • Aggregation. Specifies how SL1 will aggregate ("crunch") the data collected from all the devices in the IT Service into a single value. Select Average.

  • Show only metrics available for this IT Service. Leave this checkbox unselected. This checkbox filters the succeeding fields so that they display already-defined policies aligned with one or more of the devices in the IT Service or in the specified Device Subset. For example, if you selected Dynamic App in the Metric Type field, and then selected this checkbox, the Dynamic Application field would display only Dynamic Applications that are already aligned with one or more of the devices in the IT Service or in the specified Device Subset.

  • Interface Selection. Select the network interfaces to include in the calculation for this metric. Select Tagged Interfaces. To calculate a value for this metric, SL1 should aggregate interface utilization statistics from the interfaces that are associated with a specific tag on all the devices in the IT Service.
  • Interface Tag. Appears if you selected Network Interface in the Metric Type field. Select wan_link. This is the interface tag that we assigned to the interfaces at each end of the WAN circuit.
  • Interface Metric. Select the interface measurement that SL1 should use to calculate the value for this metric. Select Inbound Errors. To calculate a value for the metric, SL1 aggregates the value for this interface measurement from all interfaces that you included in this metric using the method specified in the Aggregation field (Average, Minimum, Maximum, Sum, Standard Deviation, or Device Count).
  6. Select the Save button to save your new metric.
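
As a rough picture of what the wan_inbound_errors metric computes, the sketch below filters a set of interfaces by tag and averages their inbound error counts. It is a simplified model, not SL1's implementation: the Interface class, the tagged_metric function, and all error counts are hypothetical; only the device addresses, the wan_link tag, and the Average aggregation come from this example.

```python
from dataclasses import dataclass, field
from statistics import fmean

@dataclass
class Interface:
    """Hypothetical stand-in for one monitored network interface."""
    device: str
    name: str
    inbound_errors: int
    tags: set = field(default_factory=set)

def tagged_metric(interfaces, tag, aggregation=fmean):
    """Aggregate inbound errors over the interfaces that carry the given tag."""
    values = [i.inbound_errors for i in interfaces if tag in i.tags]
    if not values:
        return None  # no interface carries the tag, so there is nothing to aggregate
    return aggregation(values)

# Made-up readings for one aggregation period.
interfaces = [
    Interface("10.20.30.148", "WAN", 30, {"wan_link"}),
    Interface("10.20.30.148", "Ethernet", 2),
    Interface("10.20.30.149", "WAN", 10, {"wan_link"}),
    Interface("10.20.30.149", "Ethernet", 0),
]

wan_inbound_errors = tagged_metric(interfaces, "wan_link")
print(wan_inbound_errors)  # 20.0
```

The Ethernet interfaces are ignored because they do not carry the wan_link tag, which is the effect of selecting Tagged Interfaces instead of All Interfaces.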

Defining Alerts for the IT Service Policy

For each metric in an IT Service policy, you can define an associated alert and event. In our example, we will create an alert for the metric we created in the previous section. The alert will trigger an event when the inbound errors on the tagged WAN interfaces exceed the threshold of acceptable errors.

  1. If the IT Service Editor page is not still open, go to the IT Service Manager page (Registry > IT Services > IT Service Manager). Find the policy wan_circuit_1. Select its wrench icon.
  2. In the IT Service Editor page, select the Metrics sub-tab.
  3. Ensure that you are in Basic mode. If you see the Alerting sub-tab, you are not in Basic mode. Click on the Advanced button to toggle between Basic mode and Advanced mode.
  4. In the Service Metrics Definitions pane, find the metric wan_inbound_errors. Select its wrench icon.
  5. In the Service Metric Editor modal page, go to the bottom pane. We will use the fields in the bottom pane to define an optional alert and optional event associated with the metric.
  6. Enter values in the following fields:
  • Alert Policy Name. Enter "dev_router_too_many_inbound_errors". SL1 will automatically create an event policy that corresponds to this alert. This name will appear in the name of the event policy.
  • Event Severity. When the alert is generated, SL1 will trigger an event with the selected event severity. Select Critical.
  • Decreasing/Increasing. Toggles whether the alert is triggered when the value for the metric is above a specific threshold (Increasing) or below a specific threshold (Decreasing). Select Increasing.
  • Alert Threshold. Use sliders to define the threshold at which the alert should be generated and trigger an event and the threshold at which the alert should be reset and no longer trigger an event. Select 25.
  • Alert Range. Accept the default values.
  • Event Policy Description. Optionally enter cause and resolution text for the event. The text you supply in this field will be used to populate the Policy Description field in the Event Policy Manager for the event. If this event is triggered, the text you supply in this field will be displayed in the Event Information modal page for the event.
  7. Select the Save button to save your new alert.
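
The Alert Threshold field defines two values: one at which the alert is generated and one at which it is reset. The sketch below shows one way such a pair of thresholds can behave for Increasing and Decreasing alerts. It is an illustration under assumptions, not SL1's logic: the update_alert function is hypothetical, the reset value of 20 is invented (this example only sets the trigger to 25), and whether SL1 treats the boundary values as inclusive is not stated in this section.

```python
def update_alert(active, value, trigger, reset, increasing=True):
    """Return the new alert state given the previous state and the latest metric value.

    Increasing: the alert fires above `trigger` and clears at or below `reset`.
    Decreasing: the alert fires below `trigger` and clears at or above `reset`.
    """
    if not increasing:
        # Mirror the values so the same comparisons cover the Decreasing case.
        value, trigger, reset = -value, -trigger, -reset
    if not active and value > trigger:
        return True
    if active and value <= reset:
        return False
    return active

# Made-up sequence of wan_inbound_errors values over five aggregation periods.
state = False
history = []
for errors in [10, 26, 22, 19, 30]:
    state = update_alert(state, errors, trigger=25, reset=20)
    history.append(state)

print(history)  # [False, True, True, False, True]
```

Keeping the reset threshold below the trigger threshold prevents the alert from flapping when the metric hovers around 25.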

Defining Key Metrics for the IT Service Policy

Key Metrics are the standard method for describing the status of an IT Service. Key Metrics allow you to quickly gauge the status of multiple IT Services, even if those IT Services include metrics that aggregate very different performance data. For example, you can define "health" for a remote backup service and also define "health" for an Internet bandwidth service, even though you would use different criteria to measure the health of those two services.

NOTE: SL1 automatically includes a default metric with each IT Service policy. The default metric is called Average Device Availability. The Average Device Availability metric specifies that SL1 should aggregate the availability data for all the devices in the policy and calculate the average availability.

All IT Service policies define how SL1 should calculate the following Key Metrics for the IT Service:

  • Service Health. The health of an IT Service can be one of the five standard severity values: Healthy, Notice, Minor, Major, or Critical. By default, the Service Health metric is aligned with the Average Device Availability metric.
  • Service Availability. The availability of an IT Service can be either available or unavailable. By default, the Service Availability metric is aligned with the same metric as Service Health, converting Critical Service Health to Unavailable and all other Service Health values to Available.
  • Service Risk. The risk of an IT Service is a percentage value that indicates how close an IT Service is to being in an undesirable state. By default, the Service Risk metric is aligned with the same metric as Service Health, converting the threshold between Healthy and Notice Service Health to 100% and the healthiest possible value to 0%.

For more details on Key Metrics, see the main section on Key Metrics.

To edit the definitions of each Key Metric for our example IT Service policy:

  1. If the IT Service Editor page is not still open, go to the IT Service Manager page (Registry > IT Services > IT Service Manager). Find the policy wan_circuit_1. Select its wrench icon.
  2. In the IT Service Editor page, select the Metrics sub-tab.
  3. In the bottom pane, you will see the three Key Metrics.
  4. Edit the Key Metrics for our example IT Service policy as follows:
  • Service Health. This example uses the default values for this Key Metric. This Key Metric appears in the Health column in the IT Service Manager page (Registry > IT Services > IT Service Manager). Possible values are Healthy, Notice, Minor, Major, and Critical.
  • Service Availability. This Key Metric appears in the Availability column in the IT Service Manager page (Registry > IT Services > IT Service Manager). Possible values are Available and Unavailable.
      • From the drop-down list that appears above the Service Availability Key Metric, select wan_inbound_errors. The Service Availability Key Metric will now examine the metric wan_inbound_errors to determine the availability of the IT Service.
      • From the drop-down list that appears to the right of the Service Availability Key Metric, select Increasing.
      • Move the slider to 25. If there are more than 25 errors, the service will be considered unavailable.
      • Accept the default minimum range and maximum range.
  • Service Risk. This Key Metric appears as a percentage in the Risk column in the IT Service Manager page (Registry > IT Services > IT Service Manager). Possible values are 0% - 100%.
      • From the drop-down list that appears above the Service Risk Key Metric, select wan_inbound_errors. The Service Risk Key Metric will now examine the metric wan_inbound_errors to determine the risk to the IT Service.
      • From the drop-down list that appears to the right of the Service Risk Key Metric, select Increasing.
      • Move the 0% slider to 0. Move the 100% slider to 25. The Service Risk metric will now show how at risk the service is, with 0% risk being completely healthy (no errors) and 100% risk being unavailable (25 errors).
      • Accept the default minimum range and maximum range.
  5. Select the Save button to save the changes to the Key Metrics.
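
To make the slider settings above concrete, the sketch below maps a wan_inbound_errors value to the Service Availability and Service Risk Key Metrics. It is a hedged illustration only: the function names are hypothetical, and the linear scaling between the 0% and 100% sliders is an assumption, because this section gives only the two end points (0 errors is 0% risk, 25 errors is 100% risk). The sketch covers the Increasing case used in this example.

```python
def service_risk(value, zero_at=0.0, full_at=25.0):
    """Scale a metric value to an integer risk percentage between 0 and 100.

    Assumes linear interpolation between the 0% slider and the 100% slider.
    """
    fraction = (value - zero_at) / (full_at - zero_at)
    return round(100 * min(1.0, max(0.0, fraction)))

def service_availability(value, threshold=25.0):
    """Report Unavailable once the metric exceeds the slider value."""
    return "Unavailable" if value > threshold else "Available"

# Made-up wan_inbound_errors values.
for errors in (0, 10, 25, 40):
    print(errors, service_risk(errors), service_availability(errors))
```

With these settings, a reading of 10 errors would show as 40% risk while the service is still Available, which gives early warning before the availability threshold is crossed.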

Viewing Information about the IT Service Policy

IT Service Manager

The IT Service Manager page displays overview information about each IT Service policy. To view the IT Service Manager page:

  1. Go to the IT Service Manager page (Registry > IT Services > IT Service Manager).
  2. Find the policy wan_circuit_1.
  3. The IT Service Manager displays the following about the IT Service policy.
  • Service Name. Name of the policy. The color indicates the severity of the most severe event associated with the IT Service policy.
  • Health. This is a default Key Metric for each IT Service policy. This metric specifies the overall health of the IT Service. Possible values are: Critical, Major, Minor, Notice, and Healthy.
  • Availability. This is a Key Metric for each IT Service policy. This metric specifies the overall availability of the IT Service. Possible values are: Available or Unavailable.
  • Risk. This is a Key Metric for each IT Service policy. This metric specifies the overall risk to the IT Service. Possible values are 0% - 100%, in integer values.

IT Service Summary

The IT Service Summary page allows you to view the IT Service Dashboards that have been configured for the selected IT Service. By default, each IT Service Policy includes the IT Service Details dashboard.

To view the IT Service Summary page:

  • Go to the IT Service Manager page (Registry > IT Services > IT Service Manager).
  • Find the policy wan_circuit_1. Select its map icon.
  • The IT Service Summary page for our example IT Service policy displays the default IT Service Details dashboard for the IT Service.

Viewing Additional Information

For instructions on how to view information about the devices in an IT Service policy, view the events associated with an IT Service policy, view the tickets associated with an IT Service, and view the log messages associated with an IT Service policy, see the section on Viewing IT Services.