Introduction


This section is intended for system administrators responsible for setting up and maintaining Database Servers in a Disaster Recovery configuration, a High Availability configuration, or both. The ScienceLogic SL1 platform supports redundancy at every tier, so that a single failure does not cause the overall platform to fail.

  • At the edge, Message Collectors (MC) and Data Collector units (CU) can be configured in groups so that the remaining units pick up the load if one unit fails.
  • For the Admin Portal (AP), multiple APs can be deployed behind a load balancer, allowing load to be redistributed based on availability.
  • For central databases (CDB), block-level replication is used to ensure that multiple identical copies of the database exist.
  • With High Availability, two CDB instances are in close proximity: one as active and one as standby. This allows for automated failover if a problem with the active CDB is detected.
  • With Disaster Recovery, the two CDB instances can be located in separate data centers, but no automated failover happens if a failure of the primary CDB is detected.
  • DRBD is used as the underlying block-level replication mechanism between the CDB instances. DRBD supports High Availability, Disaster Recovery, or both.

Use the following menu options to navigate the SL1 user interface:

  • To view a pop-out list of menu options, click the menu icon.
  • To view a page containing all of the menu options, click the Advanced menu icon.

Disaster Recovery

You can configure SL1 to replicate data stored on a Database Server to a Disaster Recovery appliance with the same specifications. You can install the Disaster Recovery appliance at the same site as the primary Database Server (although this is not recommended) or at a different location.

If the primary Database Server fails for any reason, you must perform failover manually. SL1 does not automate failover to the Disaster Recovery appliance.

High Availability

You can cluster Database Servers in the same location to allow for automatic failover.

A cluster includes an active Database Server and a passive Database Server. The passive Database Server provides redundancy and is dormant unless a failure occurs on the active Database Server. SL1 uses block-level replication to ensure that the data on each Database Server's primary file system is identical and that each Database Server is ready for failover if necessary. If the active Database Server fails, the passive Database Server automatically becomes active and performs all required database tasks. The previously passive Database Server remains active until another failure occurs.

Each database cluster uses a virtual IP address that is always associated with the currently active (primary) Database Server, so Administration Portals do not need to be reconfigured after a failover.
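The active/passive behavior described above can be illustrated with a small toy model. This is only a sketch for reasoning about roles, not how SL1 implements clustering: the class name, node names, and IP address are hypothetical, and the model assumes the failed node rejoins the cluster as the passive member.

```python
from dataclasses import dataclass


@dataclass
class ToyCluster:
    """Toy model of a two-node active/passive database cluster (hypothetical)."""
    active: str      # node currently performing database tasks
    passive: str     # dormant node kept in sync by replication
    virtual_ip: str  # address that clients (for example, Administration Portals) use

    def vip_owner(self) -> str:
        # The virtual IP always follows whichever node is currently active.
        return self.active

    def fail_active(self) -> None:
        # Automatic failover: the passive node becomes active.
        # Assumption: the failed node later rejoins as the passive member.
        self.active, self.passive = self.passive, self.active


if __name__ == "__main__":
    cluster = ToyCluster(active="db-a", passive="db-b", virtual_ip="10.0.0.10")
    cluster.fail_active()
    # Clients keep using the same virtual IP; only its owner has changed.
    print(cluster.virtual_ip, "is now served by", cluster.vip_owner())
```

In this model the virtual IP never changes, which is why clients that connect through it need no reconfiguration, and the newly active node keeps the active role until `fail_active` is called again.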

High Availability for Azure deployments is supported for installations of 12.1.x and later that are running on Oracle Linux 8 (OL8). ScienceLogic recommends that customers running SL1 versions prior to 12.1.x upgrade to 12.1.x or later and then complete the High Availability setup and configuration. For more information about upgrading, see the section on Updating SL1.

Differences Between Disaster Recovery and High Availability for Database Servers

SL1 provides two solutions that allow for failover to another Database Server if the primary Database Server fails: Disaster Recovery and High Availability. There are several differences between these two distinct features:

  • Location. The primary and secondary databases in a High Availability configuration must be located together so that the heartbeat network can be configured. In a Disaster Recovery configuration, the primary and secondary databases can be in different locations.
  • Failover. In a High Availability configuration, SL1 performs failover automatically, although a manual failover option is available. In a Disaster Recovery configuration, failover must be performed manually.
  • System Operations. A High Availability configuration maintains SL1 system operations if the hardware or software on the primary Database Server fails. A Disaster Recovery configuration maintains SL1 system operations if the data center where the primary Database Server is located has a major outage, provides a spare Database Server that can be quickly installed if the primary Database Server has a permanent hardware failure, and can allow SL1 system operations to be rotated between two data centers.

A Distributed SL1 system can be configured for both High Availability and Disaster Recovery.

High Availability and Disaster Recovery are not supported for All-In-One Appliances.

Configuring MX Records

In all configurations, the primary Database Server is responsible for processing all inbound email. To prevent duplicate emails from being processed, all inbound email must be delivered to the current primary Database Server only. When Disaster Recovery is configured, the mail process runs on the current primary Database Server only. When you configure your mail exchanger (MX) record for SL1:

  • Include the hostname of the Database Server that will be primary under normal conditions at the lowest MX level (making that hostname the highest priority).
  • If your configuration includes a High Availability cluster, include the hostname of the secondary cluster Database Server at the next-lowest MX level.
  • For Disaster Recovery, include the hostname of the Disaster Recovery Database Server at the highest MX level (making that hostname the lowest priority).
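The ordering rules above can be sketched as a short Python helper that assigns MX preference values and prints zone-file style lines. This is an illustrative sketch only: the function names, the hostnames, the mail domain, and the preference step of 10 are all assumptions, not values taken from SL1. The one fixed rule is the standard DNS convention that a lower MX preference number means a higher priority.

```python
from typing import List, Optional, Tuple


def build_mx_records(primary: str,
                     ha_secondary: Optional[str] = None,
                     dr: Optional[str] = None,
                     step: int = 10) -> List[Tuple[int, str]]:
    """Return (preference, hostname) pairs, highest priority first.

    Order: normal primary, then the High Availability secondary (if any),
    then the Disaster Recovery Database Server (if any).
    """
    ordered = [primary]
    if ha_secondary:
        ordered.append(ha_secondary)
    if dr:
        ordered.append(dr)
    # Lower preference value = higher priority; 'step' is an arbitrary spacing.
    return [(step * (i + 1), host) for i, host in enumerate(ordered)]


def to_zone_lines(domain: str, records: List[Tuple[int, str]]) -> List[str]:
    """Format the records as zone-file style MX lines."""
    return [f"{domain}. IN MX {pref} {host}." for pref, host in records]


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    records = build_mx_records("db-primary.example.com",
                               ha_secondary="db-secondary.example.com",
                               dr="db-dr.example.com")
    for line in to_zone_lines("sl1-mail.example.com", records):
        print(line)
```

With these hypothetical inputs, the primary receives preference 10, the High Availability secondary 20, and the Disaster Recovery Database Server 30, so mail is offered to the Disaster Recovery host last.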