Collector Group Configurations

For Distributed SL1 Systems, a collector group is a group of Data Collectors. Data Collectors retrieve data from managed devices and applications. This collection occurs during initial discovery, during nightly updates, and in response to policies and Dynamic Applications defined for each managed device. The collected data is used to trigger events, display data in SL1, and generate graphs and reports.

Grouping multiple Data Collectors allows you to:

Create a load-balanced collection system, where you can manage more devices without loss of performance. At any given time, the Data Collector with the lightest load handles the next discovered device.
Create a redundant, high-availability system that minimizes downtime should a failure occur. If a Data Collector fails, another Data Collector is available to handle collection until the problem is solved.
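
The load-balancing behavior described above can be sketched in a few lines. This is only an illustration, not SL1's actual scheduler: the function name `assign_devices`, the collector names, and the per-device "cost" values are all hypothetical, and the sketch assumes that load can be expressed as a single number per device.

```python
import heapq

def assign_devices(collectors, device_costs):
    """Assign each device to the collector with the lightest current load.

    collectors: list of collector names (hypothetical).
    device_costs: dict of device name -> assumed collection cost.
    """
    # Heap entries are (current_load, collector_name); ties break by name.
    heap = [(0.0, name) for name in sorted(collectors)]
    heapq.heapify(heap)
    assignment = {}
    for device, cost in device_costs.items():
        load, name = heapq.heappop(heap)       # lightest-loaded collector
        assignment[device] = name
        heapq.heappush(heap, (load + cost, name))
    return assignment

devices = {"router-1": 3.0, "switch-1": 1.0, "server-1": 2.0, "server-2": 1.0}
result = assign_devices(["dc-a", "dc-b"], devices)
```

In this sketch, each newly discovered device goes to whichever collector currently carries the least accumulated cost, which is one simple way to model "the Data Collector with the lightest load handles the next discovered device."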


Collector Groups

In a Distributed SL1 System, the Data Collectors and Message Collectors are organized into Collector Groups. Each monitored device is aligned with a Collector Group.

A Distributed SL1 system must have one or more Collector Groups configured. The Data Collectors included in a Collector Group must have the same hardware configuration.

A Distributed SL1 system could include Collector Groups configured using each of the possible configurations. For example, suppose an enterprise has a main data center that contains most of the devices monitored by the SL1 system. Suppose the enterprise also has a second data center where only a few devices are monitored by the SL1 system. The SL1 system might have two collector groups:

In the main data center, a Collector Group configured with high availability that contains multiple Data Collectors and Message Collectors.
In the second data center, a Collector Group that contains a single Data Collector that is also responsible for message collection.

"Traditional" and "PhoneHome" Collectors

SL1 supports two methods for communication between Database Servers and the Data Collectors and Message Collectors in a system:

The traditional method. The Database Server initiates communication with each Data Collector and Message Collector. The Database Server periodically pushes configuration data to the Data Collectors and Message Collectors and retrieves data from the Data Collectors and Message Collectors. The collector administrator must allow ingress communication from the Database Server on port 7707. The communication is encrypted using SSL whenever possible.

The benefit of the traditional method is that communication to the Database Server is extremely limited, so the Database Server remains as secure as possible.

The PhoneHome method. The Data Collectors and Message Collectors initiate communication with the Database Server. The Data Collectors and Message Collectors create an SSH tunnel. The Database Server uses the SSH tunnel to periodically push configuration data to the Data Collectors and Message Collectors and retrieve data from the Data Collectors and Message Collectors.

The benefits of this method are that no firewall rules must be added on the network that contains the Data Collectors, and no new TCP ports are opened on the network that contains the Data Collectors.

The PhoneHome configuration uses public key/private key authentication to maintain the security of the Database Server. Each Data Collector is aligned with an SSH account on the Database Server and uses SSH to communicate with the Database Server. Each SSH account on the Database Server is highly restricted, has no login access, and cannot access a shell or execute commands on the Database Server.
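
For the traditional method, a quick way to confirm that a collector accepts inbound TCP connections on port 7707 is a plain socket test run from the Database Server. The sketch below is an assumption-laden convenience check, not an SL1 tool: the function name `port_is_reachable` is hypothetical, and a successful connection only shows that the port is open, not that SL1 communication is working.

```python
import socket

def port_is_reachable(host, port=7707, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

For example, `port_is_reachable("collector.example.com")` would be called with the address of a Data Collector or Message Collector; the hostname shown here is a placeholder.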

Using a Data Collector for Message Collection

To use a Data Collector for message collection, the Data Collector must be in a collector group that contains no other Data Collectors or Message Collectors.

NOTE: When a Data Collector is used for message collection, the Data Collector can handle fewer inbound messages than a dedicated Message Collector.

Using Multiple Data Collectors in a Collector Group

A Collector Group can include multiple Data Collectors to maximize the number of managed devices. In this configuration, the Collector Group is not configured for high availability.

In this configuration:

All Data Collectors in the Collector Group must have the same hardware configuration.
If you need to collect syslog and trap messages from the devices aligned with the Collector Group, you must include a Message Collector in the Collector Group. For a description of how a Message Collector can be added to a Collector Group, see the Using Message Collectors in a Collector Group section.
SL1 evenly distributes the devices monitored by a collector group among the Data Collectors in the collector group. Devices are distributed based on the amount of time it takes to collect data for the Dynamic Applications aligned to each device.
Component devices are distributed differently than physical devices; each component device is always aligned to the same Data Collector as its root device.

NOTE: If you merge a component device with a physical device, the SL1 system allows data for the merged component device and data from the physical device to be collected on different Data Collectors. Data that was aligned with the component device is always collected on the Data Collector for its root device. If necessary, data aligned with the physical device can be collected on a different Data Collector.

How Collector Groups Handle Component Devices

Collector Groups handle component devices differently than physical devices.

For physical devices (as opposed to component devices), after the SL1 system creates the device ID, the SL1 system distributes devices, round-robin, among the Data Collectors in the specified Collector Group.

Each component device must use the same Data Collector as its root device (the physical device that manages the component devices). SL1 therefore keeps all component devices on the root device's Data Collector and cannot distribute them among the other Data Collectors in the specified Collector Group.

NOTE: If you merge a component device with a physical device, the SL1 System allows data for the merged component device and data from the physical device to be collected on different Data Collectors. Data that was aligned with the component device is always collected on the Data Collector for its root device. If necessary, data aligned with the physical device can be collected on a different Data Collector.
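
The difference between the two kinds of devices can be illustrated with a small sketch. It assumes a simple model in which physical devices are handed out round-robin and every component device maps directly to a physical root device; the function name `align_devices` and all device and collector names are hypothetical, and merged devices are not modeled.

```python
from itertools import cycle

def align_devices(collectors, physical_devices, component_roots):
    """Return a dict of device name -> Data Collector name.

    physical_devices: list of physical device names.
    component_roots: dict of component device -> its root (physical) device.
    """
    # Physical devices are distributed round-robin among the collectors.
    round_robin = cycle(collectors)
    alignment = {device: next(round_robin) for device in physical_devices}
    # Component devices are never distributed; they inherit the root's collector.
    for component, root in component_roots.items():
        alignment[component] = alignment[root]
    return alignment

alignment = align_devices(
    ["dc-1", "dc-2"],
    ["esx-host-1", "esx-host-2", "esx-host-3"],
    {"vm-1": "esx-host-1", "vm-2": "esx-host-1", "vm-3": "esx-host-2"},
)
```

Here `vm-1` and `vm-2` both land on the collector chosen for `esx-host-1`, regardless of how many other collectors are in the group.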

High Availability for Data Collectors

To configure a Collector Group for high availability, the Collector Group must include multiple Data Collectors.

In this configuration:

All Data Collectors in the Collector Group must have the same hardware configuration.

If you need to collect syslog and trap messages from the devices monitored by a high availability Collector Group, you must include a Message Collector in the Collector Group. For a description of how a Message Collector can be added to a Collector Group, see the Using Message Collectors in a Collector Group section.
Each collector group that is configured for high availability includes a setting for Maximum Allowed Collector Outage. This setting specifies the number of Data Collectors that can fail while data collection continues. If more Data Collectors than the specified maximum fail simultaneously, some or all monitored devices will not be monitored until the failed Data Collectors are restored.

If a collector group is configured for high availability and the number of failed Data Collectors in that collector group becomes greater than the Maximum Allowed Collector Outage setting, SL1 will not fail over within the Collector Group. SL1 will not collect or store any data from the devices aligned with the failed Data Collector(s) until the failure is fixed, and SL1 will generate a critical event. This is true regardless of whether the individual Data Collectors are able to collect data.

In this example, the Collector Group includes four Data Collectors. The Collector Group is configured to allow for an outage of two Data Collectors.

When all Data Collectors are available, the SL1 System evenly distributes the devices monitored by a Collector Group among the Data Collectors in that Collector Group. In this example, there are 200 devices monitored by the Collector Group, with each of the four Data Collectors responsible for collecting data from 50 devices. For simplicity, this example assumes that SL1 spends the same amount of time collecting Dynamic Application data from every device; therefore, the devices are divided evenly across the four collectors.

If one of the Data Collectors in the example Collector Group fails, the 50 devices that the Data Collector was monitoring are redistributed evenly among the other three Data Collectors.

If a second Data Collector in the example Collector Group fails, the devices that the Data Collector was monitoring are redistributed evenly between the other two Data Collectors, leaving each responsible for 100 devices.

If a third Data Collector in the example Collector Group fails, the Collector Group has exceeded its maximum allowable outage. Until one of the three failed Data Collectors becomes available, 100 devices are not monitored.
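
The example above can be reproduced with a short simulation. This is a simplified model rather than SL1's failover logic: it tracks only device counts, assumes every device costs the same to collect, and hands each orphaned device to the least-loaded surviving collector. The function name `simulate_failures` and the collector names are hypothetical.

```python
def simulate_failures(collectors, device_count, max_outage, failures):
    """Return (loads, unmonitored) after the named collectors fail in order.

    loads: dict of surviving collector -> number of devices it monitors.
    unmonitored: number of devices left without a collector.
    """
    # Initial even split; assumes device_count divides evenly.
    loads = {name: device_count // len(collectors) for name in collectors}
    failed_count = 0
    unmonitored = 0
    for name in failures:
        orphaned = loads.pop(name)
        failed_count += 1
        if failed_count > max_outage or not loads:
            # Outage limit exceeded: no failover, devices go unmonitored.
            unmonitored += orphaned
            continue
        for _ in range(orphaned):
            # Give each orphaned device to the least-loaded survivor.
            lightest = min(sorted(loads), key=loads.get)
            loads[lightest] += 1
    return loads, unmonitored

group = ["dc-1", "dc-2", "dc-3", "dc-4"]
after_one = simulate_failures(group, 200, 2, ["dc-1"])
after_two = simulate_failures(group, 200, 2, ["dc-1", "dc-2"])
after_three = simulate_failures(group, 200, 2, ["dc-1", "dc-2", "dc-3"])
```

With four collectors, 200 devices, and a Maximum Allowed Collector Outage of two, the model yields loads of 67, 67, and 66 after one failure, 100 and 100 after two failures, and 100 unmonitored devices after the third.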

Using Message Collectors in a Collector Group

If you need to collect syslog and trap messages from the devices monitored by a Collector Group that includes multiple Data Collectors, you must include a Message Collector in the Collector Group.

If your monitored devices generate a large amount of syslog and trap messages, a Collector Group can include multiple Message Collectors.

In this configuration, a monitored device can send syslog and trap messages to either Message Collector.

NOTE: Each syslog and trap message should be sent to only one Message Collector.

A third-party load-balancing solution can be used to distribute syslog and trap messages evenly among the Message Collectors in a Collector Group.

NOTE: ScienceLogic does not recommend a specific product for this purpose and does not provide technical support for configuring or maintaining a third-party load-balancing solution.

One or more Message Collectors can be included in multiple Collector Groups.

In this configuration, each managed device in Collector Group A and Collector Group B must use a unique IP address when sending syslog and trap messages. The IP address used to send syslog and trap messages is called the primary IP. For example, if a device monitored by Collector Group A and a device monitored by Collector Group B use the same primary IP address for data collection, one of the two devices must be configured to use a different IP address when sending syslog and trap messages.
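
One way to audit this requirement is to look for primary IP addresses that appear on more than one device across the Collector Groups that share a Message Collector. The sketch below assumes you can export a list of (device name, Collector Group, primary IP) tuples from your own inventory; the function name `find_primary_ip_conflicts` and the sample data are hypothetical.

```python
from collections import defaultdict

def find_primary_ip_conflicts(devices):
    """Return a dict of primary IP -> list of (device, group) that share it.

    devices: iterable of (device_name, collector_group, primary_ip) tuples
    for all Collector Groups that share the same Message Collector(s).
    """
    by_ip = defaultdict(list)
    for name, group, ip in devices:
        by_ip[ip].append((name, group))
    # Only IPs used by more than one device are conflicts.
    return {ip: owners for ip, owners in by_ip.items() if len(owners) > 1}

inventory = [
    ("web-a", "Collector Group A", "10.0.0.5"),
    ("db-a", "Collector Group A", "10.0.0.6"),
    ("web-b", "Collector Group B", "10.0.0.5"),
]
conflicts = find_primary_ip_conflicts(inventory)
```

In this sample, `web-a` and `web-b` share `10.0.0.5`, so one of them would need a different IP address for sending syslog and trap messages.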

A Collector Group can have multiple Message Collectors that are also included in other Collector Groups. It is possible to include every Message Collector in your SL1 System in every Collector Group in your SL1 System.