Disaster Recovery with Two Appliances

This section describes how to configure two appliances for Disaster Recovery.

This section assumes that you are comfortable using a UNIX shell session and can use the basic functions within the vi editor.

Use the following menu options to navigate the SL1 user interface:

  • To view a pop-out list of menu options, click the menu icon.
  • To view a page containing all the menu options, click the Advanced menu icon.

Prerequisites

Before performing the steps listed in this section, you must:

  • Install and license each appliance
  • Have an Administrator account to log in to the Web Configuration Utility for each appliance
  • Have SSH or console access to each appliance
  • Know the em7admin console username and password for each appliance
  • Have identical hardware or virtual machine specifications on each appliance
  • Have configured a unique hostname on each appliance
  • If the two appliances are not connected with a crossover cable, you must:
    • use a DRBD proxy license
    • know the maximum link speed, in megabytes per second, between the two appliances
  • Optionally, if the two appliances you are configuring have their primary network adapter connected to the same network subnet, you must have an available IP address to configure as a virtual IP
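Network links are often quoted in megabits per second, while the prerequisite above asks for megabytes per second. The following is a minimal sketch of that conversion; the function name is hypothetical and is not part of SL1, and you may want to enter a lower value than the theoretical maximum to leave headroom.

```python
def megabits_to_megabytes_per_second(megabits_per_second):
    """Convert a link speed from megabits/s to megabytes/s (8 bits per byte)."""
    if megabits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return megabits_per_second / 8
```

For example, a 100 megabit-per-second link corresponds to 12.5 megabytes per second.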

Unique Hostnames

You must ensure that a unique hostname is configured on each SL1 appliance. The hostname of an appliance is configured during the initial installation. To view and change the hostname of an appliance:

  1. Log in to the console of the SL1 appliance as the em7admin user. The current hostname appears in the command prompt. For example, in the following login session, the current hostname is HADB01:

    login as: em7admin
    em7admin@10.64.68.31's password:
    Last login: Wed Apr 27 21:25:26 2016 from silo1651.sciencelogic.local
    [em7admin@HADB01 ~]$

  2. To change the hostname, run the following command:

    sudo hostnamectl set-hostname <new hostname>

  3. When prompted, enter the password for the em7admin user.

Licensing DRBD Proxy

DRBD Proxy buffers all data between the active and redundant appliances to compensate for any bandwidth limitations. In addition, DRBD Proxy compresses and encrypts the data sent from the active appliance to the redundant appliance.

You must use DRBD Proxy if you are:

  • Configuring three appliances for High Availability and Disaster Recovery.
  • Configuring two appliances for Disaster Recovery and will not be configuring a direct connection between your appliances with a crossover cable.

Data sent from the active appliance to the redundant appliance is compressed and encrypted only if you use DRBD Proxy. DRBD without DRBD Proxy does not compress and encrypt this data.

To license DRBD Proxy, copy the drbd-proxy.license file to the /etc directory on all appliances in your system.
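The two conditions above that require DRBD Proxy can be written as a small decision helper. This is only an illustrative sketch: the function name and the architecture labels (borrowed from the coro_install menu) are assumptions, not an SL1 API.

```python
def drbd_proxy_required(architecture, crossover_cable):
    """Return True when the conditions in this section call for DRBD Proxy.

    architecture: "DR" (two appliances) or "HA+DR" (three appliances).
    crossover_cable: True if the Disaster Recovery appliances are directly
    connected with a crossover cable.
    """
    if architecture == "HA+DR":
        # Three appliances for High Availability and Disaster Recovery.
        return True
    if architecture == "DR":
        # Two appliances: the proxy is needed only without a direct connection.
        return not crossover_cable
    raise ValueError("unsupported architecture: %r" % (architecture,))
```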

Using a Virtual IP Address

If the two appliances you are configuring for Disaster Recovery are connected to the same network subnet using their primary network adapters, you can optionally specify a virtual IP address during the configuration. The virtual IP address is associated with the primary appliance and transitions between the appliances during failover and failback.

If you use a virtual IP address, you do not have to reconfigure your Administration Portals after failover and failback. The virtual IP address must be on the same network subnet as the primary network adapters of the appliances.
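Before running the installer, you can sanity-check a candidate virtual IP address against the subnet requirement above. The sketch below uses Python's standard ipaddress module; the function name and the example addresses are hypothetical, and the prefix length corresponds to the CIDR value that coro_install asks for.

```python
import ipaddress


def vip_is_usable(vip, primary_ip, secondary_ip, prefix_len):
    """Check that the virtual IP shares a subnet with both primary network
    adapters and does not collide with either adapter address."""
    # strict=False lets us derive the subnet from a host address.
    network = ipaddress.ip_network("%s/%d" % (primary_ip, prefix_len), strict=False)
    vip_addr = ipaddress.ip_address(vip)
    primary_addr = ipaddress.ip_address(primary_ip)
    secondary_addr = ipaddress.ip_address(secondary_ip)
    if vip_addr in (primary_addr, secondary_addr):
        return False
    return vip_addr in network and secondary_addr in network
```

For example, with adapters at 10.64.68.31 and 10.64.68.32 on a /24 subnet, 10.64.68.50 passes the check and 10.64.69.50 does not. The check does not confirm that the address is actually unused on the network.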

Configuring Disaster Recovery

This section describes how to configure the Primary appliance and the Secondary appliance for Disaster Recovery.

Configuring the Primary Appliance

To configure the Primary appliance for Disaster Recovery, perform the following steps:

  1. Log in to the console of the Primary appliance as the em7admin user.
  2. Run the following command:

    sudo -i

  3. When prompted, enter the password for the em7admin user.
  4. Run the following command:

    coro_install

    The following prompt appears:

    1) HA
    2) DR
    3) HA+DR
    4) Quit
    Please select the architecture you'd like to setup:

  5. Enter "2". The following prompt appears:

    1) Primary
    2) Secondary
    Please choose which node this is:

  6. Enter "1". The following prompt appears:

    Architecture: DR
    Server Role: Primary
    Is this information correct? (y/n)

  7. Enter "y". The following prompt appears:

    The hostname of this server is <hostname of this appliance>, is this right? (y/n)

  8. Enter "y". The following prompt appears:

    Please choose the DRBD IP for this server:
    1) <First IP address of this appliance>
    2) <Second IP address of this appliance>
    .
    .
    Number:

  9. Enter the number for the IP address of the network connection for replication on this Primary appliance. The following prompt appears:

    What is the hostname of the Secondary server:

  10. Enter the hostname of the Secondary appliance. The following prompt appears:

    Please enter the IP used for DRBD traffic for the Secondary server:

  11. Enter the IP address of the network connection for replication on the Secondary appliance. The following prompt appears:

    Is DRBD Proxy being used? (y/n)

  12. If the appliances are not directly connected using a crossover cable, you must use DRBD Proxy. If you are using DRBD Proxy, enter "y". If you are not using DRBD Proxy, enter "n".
  13. If you are using DRBD Proxy, go to step 18. If you are not using DRBD Proxy, the following prompt appears:

    Would you like to use a Virtual IP (VIP)? (y/n)

  14. To add a virtual IP address to the Disaster Recovery configuration, enter "y". Otherwise, enter "n".
  15. If you entered "n", go to step 18. If you entered "y", the following prompt appears:

    Please enter the Virtual IP Address:

  16. Enter the virtual IP address. The following prompt appears:

    Please enter the CIDR for the Virtual IP without the / (example: 24):

  17. Enter the CIDR netmask of the virtual IP address.
  18. If you are not using DRBD Proxy, go to step 20. If you are using DRBD Proxy, the following prompt appears:

    Please enter the max link speed to the DR system in megabytes/second:

  19. Enter the maximum link speed between the two appliances.
  20. The following prompt appears:

    You have selected the following settings, please confirm if they are correct:
    Architecture: DR
    Node: Primary

    Node 1 Hostname: <host name of this appliance>
    Node 1 DRBD IP: <DRBD IP address you entered for this appliance>
    Node 2 Hostname: <host name of the Secondary appliance>
    Node 2 DRBD IP: <DRBD IP address you entered for the Secondary appliance>

    DRBD Disk: <partition to be used by DRBD>
    DRBD Proxy: <whether DRBD Proxy will be used>

    Is this information correct? (y/n)

  21. Enter "y". The following output appears:

    Setting up the environment...
    - Updating firewalld configuration, please be patient...
    Setting up DRBD...
    Editing Corosync config...
    Setting up Corosync...
    Complete, you can monitor the cluster status by typing 'crm_mon' (give it a minute)

    Coro_install completed successfully

    coro_install has exited

Configuring the Secondary Appliance

To configure the Secondary appliance for Disaster Recovery, perform the following steps:

  1. Log in to the console of the Secondary appliance as the em7admin user.
  2. Run the following command to assume root user privileges:

    sudo -s

  3. When prompted, enter the password for the em7admin user.
  4. Run the following command:

    coro_install

    The following prompt appears:

    1) HA
    2) DR
    3) HA+DR
    4) Quit
    Please select the architecture you'd like to setup:

  5. Enter "2". The following prompt appears:

    1) Primary
    2) Secondary
    Please choose which node this is:

  6. Enter "2". The following prompt appears:

    Architecture: DR
    Server Role: Secondary
    Is this information correct? (y/n)

  7. Enter "y". The following prompt appears:

    The hostname of this server is <hostname of this appliance>, is this right? (y/n)

  8. Enter "y". The following prompt appears:

    Please choose the DRBD IP for this server:
    1) <First IP address of this appliance>
    2) <Second IP address of this appliance>
    .
    .
    Number:

  9. Enter the number for the IP address of the network connection for replication on this Secondary appliance. The following prompt appears:

    What is the hostname of the Primary server:

  10. Enter the hostname of the Primary appliance. The following prompt appears:

    Please enter the DRBD IP for the Primary server:

  11. Enter the IP address of the network connection for replication on the Primary appliance. The following prompt appears:

    Is DRBD Proxy being used? (y/n)

  12. If the appliances are not directly connected using a crossover cable, you must use DRBD Proxy. If you are using DRBD Proxy, enter "y". If you are not using DRBD Proxy, enter "n".
  13. If you are adding a virtual IP address to the Disaster Recovery configuration, enter the virtual IP address. The following prompt appears:

    I have detected the partition used for DRBD should be /dev/mapper/em7vg-db, is this correct? (y/n)

  14. Enter "y".
  15. If you are not using DRBD Proxy, go to step 17. If you are using DRBD Proxy, the following prompt appears:

    Please enter the max link speed to the DR system in megabytes/second:

  16. Enter the maximum link speed between the two appliances.
  17. The following prompt appears:

    You have selected the following settings, please confirm if they are correct:
    Architecture: DR
    Node: Secondary

    Node 1 Hostname: <host name of this appliance>
    Node 1 DRBD IP: <DRBD IP address you entered for this appliance>
    Node 2 Hostname: <host name of the Primary appliance>
    Node 2 DRBD IP: <DRBD IP address you entered for the Primary appliance>

    DRBD Disk: <partition to be used by DRBD>
    DRBD Proxy: <whether DRBD Proxy will be used>

    Is this information correct? (y/n)

  18. Enter "y". If DRBD Proxy is not in use, the following output appears:

    Setting up SSH keys...
    You will be prompted to enter the password for <IP address of Primary appliance>

    em7admin@<IP address of Primary appliance>'s password:

  19. Enter the password for the em7admin user on the Primary appliance. The following output appears:

    Setting up the environment...
    - Updating firewalld configuration, please be patient...
    Setting up DRBD...
    Editing Corosync config...
    Setting up Corosync...
    Complete, you can monitor DRBD sync status by using 'cat /proc/drbd' (it can take a sec)

    Please license the appliance at this time WITHOUT failing over
    Failover cannot occur until DRBD is fully synced

    Coro_install completed successfully

Licensing the Secondary Appliance

Perform the following steps to license the Secondary appliance:

  1. Log in to the Web Configuration Utility using any web browser supported by SL1. The address of the Web Configuration Utility is in the following format:

    https://<ip-address-of-appliance>:7700

    Enter the address of the Web Configuration Utility into the address bar of your browser, replacing "ip-address-of-appliance" with the IP address of the Secondary appliance.

  2. When prompted, enter your username and password. Log in as the "em7admin" user with the password you configured using the Setup Wizard.
  3. The Configuration Utilities page appears. Click the Licensing button. The Licensing Step 1 page appears.
  4. Click the Generate a Registration Key button.
  5. When prompted, save the Registration Key file to your local disk.
  6. Log in to the ScienceLogic Support Site at https://support.sciencelogic.com/s/. Click the License Request tab and follow the instructions for requesting a license key. ScienceLogic will provide you with a License Key file that corresponds to the Registration Key file.
  7. Return to the Web Configuration Utility.
  8. On the Licensing Step 2 page, click the Upload button to upload the license file. After navigating to and selecting the license file, click the Submit button to finalize the license. The Success message appears.

Upon login, SL1 will display a warning message if your license is 30 days or less from expiration, or if it has already expired. If you see this message, take action to update your license immediately.

Configuring Data Collection Servers and Message Collection Servers

If you are using a distributed system, you must configure the Data Collectors and Message Collectors to use the new multi-Database Server configuration.

To configure a Data Collector or Message Collector to use the new configuration:

  1. Log in to the Web Configuration Utility using any web browser supported by SL1. The address of the Web Configuration Utility is in the following format:

    https://<ip-address-of-appliance>:7700

    Enter the address of the Web Configuration Utility in the address bar of your browser, replacing "ip-address-of-appliance" with the IP address of the Data Collector or Message Collector.

  2. When prompted, enter your username and password. Log in as the em7admin user with the password you configured using the Setup Wizard.
  3. On the Configuration Utilities page, click the Device Settings button. The Settings page appears.
  4. On the Settings page, enter the following:
    • Database IP Address. Enter the IP addresses of all the Database Servers, separated by commas.
  5. Click the Save button. You may now log out of the Web Configuration Utility for that collector.
  6. Perform steps 1-5 for each Data Collector and Message Collector in your system.
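When several collectors must be updated, it can help to build the Database IP Address value once and paste the same string into each Settings page. The following is a minimal sketch using Python's standard ipaddress module; the function name is hypothetical, and joining the addresses with a comma and no surrounding spaces is an assumption about what the field accepts.

```python
import ipaddress


def database_ip_field(addresses):
    """Validate each Database Server address and join them with commas."""
    cleaned = []
    for raw in addresses:
        # ip_address() raises ValueError for anything that is not a valid IP.
        addr = str(ipaddress.ip_address(raw.strip()))
        if addr not in cleaned:  # drop accidental duplicates, keep order
            cleaned.append(addr)
    if not cleaned:
        raise ValueError("at least one Database Server address is required")
    return ",".join(cleaned)
```

For example, passing the addresses of the Primary and Secondary appliances returns a single string such as "10.64.68.31,10.64.68.32" (example addresses only).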

Failover

If your Primary appliance fails, you can manually fail over to the Secondary appliance. There are two ways to perform failover:

  • If you can access a shell session on both appliances:
    • Because DRBD does not allow two Primary appliances, you must demote the Primary appliance first during failover.
    • After demoting the Primary appliance, your system will recognize two Secondary appliances; DRBD allows two Secondary appliances. You can then promote the original Secondary appliance.
    • After promoting the original Secondary appliance, your system will have one Primary appliance and one Secondary appliance.
    • This process is described in the section Failover When Both Database Appliances are Accessible.
  • If you cannot access a shell session on your Primary appliance:
    • Power down the Primary appliance. This step is required to avoid a split-brain configuration in which you have two Primary appliances.
    • Promote the Secondary appliance.
    • After promoting the Secondary appliance, your system will have one Primary appliance and one "unknown" appliance.
    • Upon reboot, DRBD will automatically set the "unknown" appliance to "secondary".
    • This process is described in the section Failover When the Primary Database Appliance is Inaccessible.
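The ordering rule above (demote first, then promote) can be illustrated with a toy model of the two roles. This sketch does not call DRBD or any SL1 tool; the function name and the hostname HADB02 are hypothetical, and the model only shows why promoting before demoting is rejected.

```python
def change_role(roles, node, new_role):
    """Return a copy of roles with node set to new_role.

    Raises RuntimeError if the change would leave two Primary appliances,
    which is the split-brain condition described above.
    """
    if new_role not in ("Primary", "Secondary"):
        raise ValueError("role must be 'Primary' or 'Secondary'")
    updated = dict(roles)
    updated[node] = new_role
    if list(updated.values()).count("Primary") > 1:
        raise RuntimeError("refusing to create two Primary appliances")
    return updated


# Failover in the supported order: demote the old Primary, then promote.
roles = {"HADB01": "Primary", "HADB02": "Secondary"}
roles_after_demote = change_role(roles, "HADB01", "Secondary")
roles_after_failover = change_role(roles_after_demote, "HADB02", "Primary")
```

Calling change_role(roles, "HADB02", "Primary") on the starting state raises RuntimeError, which mirrors the requirement to demote the Primary appliance first. Two Secondary appliances are allowed as an intermediate state.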

Failover When Both Database Appliances are Accessible

If you need to perform failover, and you can access a shell session on both Database Servers, perform the following steps:

  • Log in to the console of the Primary appliance as the em7admin user.
  • Run the following command to assume root user privileges:

    sudo -s

  • When prompted, enter the password for the em7admin user.
  • Run the following command:

    coro_config

    The following prompt appears:

    1) Enable Maintenance
    2) Option Disabled
    3) Demote DRBD
    4) Stop Pacemaker
    5) Resource Status
    6) Quit
    Please enter the number of your choice:

  • Enter "3". The following prompt appears:

    Node currently Primary, would you like to make it Secondary? (y/n)

  • Enter "y". The following output appears:

    Issuing command: crm_resource --resource ms_drbd_r0 --set-parameter target-role --meta --parameter-value Slave

  • The Primary database must be shut down before you promote the Secondary database. To verify that MariaDB is shut down, check the MariaDB log file: /var/log/mysql/mysql.log. The log file should contain the line: "Shutdown complete".
  • Log in to the console of the Secondary appliance as the em7admin user.
  • Run the following command to assume root user privileges:

    sudo -s

  • When prompted, enter the password for the em7admin user.
  • Run the following command:

    coro_config

    The following prompt appears:

    1) Enable Maintenance
    2) Option Disabled
    3) Promote DRBD
    4) Stop Pacemaker
    5) Resource Status
    6) Quit
    Please enter the number of your choice:

  • Enter "3". The following prompt appears:

    Node currently Secondary, would you like to make it Primary? (y/n)

  • Enter "y". The following output appears:

    Issuing command: crm_resource --resource ms_drbd_r0 --set-parameter target-role --meta --parameter-value Master

  • To verify that an appliance is active after failover, ScienceLogic recommends checking the status of MariaDB, which is one of the primary processes on Database Servers. To verify the status of MariaDB, run the following command on the newly promoted Primary Database Server:

    silo_mysql -e "select 1"

    If MariaDB is running normally, you will see a ‘1’ in the console output.

    TIP: Because larger systems can take more time to start the database, verify that MariaDB has started successfully before running the command in the step above. To verify that MariaDB has started successfully, check the MariaDB log file: /var/log/mysql/mysqld.log. The log file should contain the line: "/usr/sbin/mysqld: ready for connections."

  • If you are using a distributed SL1 system, you must reconfigure all Administration Portals in your system to use the new Database Server. To do this, follow the steps listed in the Reconfiguring Administration Portals section.
  • When the previously Primary Database Server reboots, it will be the Secondary appliance. Upon reboot, DRBD automatically sets all Database Servers to "secondary". This prevents accidental "split-brain" from occurring.

Upon login, if SL1 detects two Primary databases or two Secondary databases, an error message is displayed along with information for how to fix the problem.
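The two log checks in this procedure (looking for "Shutdown complete" before promoting, and for "ready for connections" after promoting) can be scripted. The following is a minimal sketch that works on log text you have already read from the file; the function name is hypothetical, and only the two marker strings come from this section. Real log lines will carry timestamps and other fields that this sketch ignores.

```python
def mariadb_state_from_log(log_text):
    """Return "shutdown", "ready", or "unknown" based on the most recent
    marker line found in the MariaDB log text."""
    state = "unknown"
    for line in log_text.splitlines():
        if "Shutdown complete" in line:
            state = "shutdown"
        elif "ready for connections" in line:
            state = "ready"
    return state
```

Because the function keeps the last marker it sees, a log that shows a start followed by a shutdown is reported as "shutdown", which is the state you want on the demoted appliance before promoting the other one.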

Failover When the Primary Database Appliance is Inaccessible

If you need to perform a failover, and you cannot access a shell session on the current Primary Database Server, perform the steps in this section.

To fail over when the Primary appliance is inaccessible:

  • Power down the inaccessible Primary Database Server. This step is required to avoid a split-brain configuration where you have two Primary appliances. A split-brain configuration will cause your data to become corrupted.
  • The Primary database must be shut down before you promote the Secondary database. To verify that MariaDB is shut down, check the MariaDB log file: /var/log/mysql/mysql.log. The log file should contain the line: "Shutdown complete".
  • Log in to the console of the Secondary appliance as the em7admin user.
  • Run the following command to assume root user privileges:

    sudo -s

  • When prompted, enter the password for the em7admin user.
  • Run the following command:

    coro_config

    The following prompt appears:

    1) Enable Maintenance
    2) Option Disabled
    3) Promote DRBD
    4) Stop Pacemaker
    5) Resource Status
    6) Quit
    Please enter the number of your choice:

  • Enter "3". The following prompt appears:

    Node currently Secondary, would you like to make it Primary? (y/n)

  • Enter "y". The following output appears:

    Issuing command: crm_resource --resource ms_drbd_r0 --set-parameter target-role --meta --parameter-value Master

  • To verify that an appliance is active after failover, ScienceLogic recommends checking the status of MariaDB, which is one of the primary processes on Database Servers. To verify the status of MariaDB, run the following command on the newly promoted Database Server:

    silo_mysql -e "select 1"

    If MariaDB is running normally, you will see a ‘1’ in the console output.

    TIP: Because larger systems can take more time to start the database, verify that MariaDB started successfully before running the above command. To verify that MariaDB started successfully, check the MariaDB log file: /var/log/mysql/mysqld.log. The log file should contain the line: "/usr/sbin/mysqld: ready for connections."

  • If you are using a distributed SL1 system, you must reconfigure all Administration Portals in your system to use the new Database Server. To do this, follow the steps listed in the Reconfiguring Administration Portals section.
  • When the previously Primary Database Server reboots, it will be the Secondary appliance. Upon reboot, DRBD automatically sets all Database Servers to "secondary". This prevents accidental "split-brain" from occurring.

Upon login, if SL1 detects two Primary databases or two Secondary databases, an error message is displayed along with information for how to fix the problem.

Reconfiguring Administration Portals

If you are using a distributed system and you did not configure a virtual IP address, you must configure all Administration Portals in your system to use the new Primary Database Server after performing failover or failback. To configure an Administration Portal to use the new Database Server, perform the following steps in the Web Configuration Utility:

  1. Log in to the Web Configuration Utility using any web browser supported by SL1. The address of the Web Configuration Utility is in the following format:

    https://<ip-address-of-appliance>:7700

    Enter the address of the Web Configuration Utility in the address bar of your browser, replacing "ip-address-of-appliance" with the IP address of the Administration Portal.

  2. Log in as the "em7admin" user with the password you configured using the Setup Wizard. The Configuration Utilities page appears.
  3. Click the Device Settings button. The Settings page appears.
  4. On the Settings page, enter the following:
    • Database IP Address. The IP address of the new Primary ScienceLogic Database Server.
  5. Click the Save button. You may now log out of the Web Configuration Utility.
  6. Repeat these steps for each Administration Portal in your system.

Verifying that a Database Server is Primary

To verify that your network is configured correctly and will allow the newly active Database Server to operate correctly, check the following system functions:

  • If you use Active Directory or LDAP authentication, log in to the user interface using a user account that uses Active Directory or LDAP authentication.
  • In the user interface, verify that new data is being collected.
  • If your system is configured to send notification emails, confirm that emails are being received as expected. To test outbound email, create or update a ticket and ensure that the ticket watchers receive an email.

NOTE: On the Behavior Settings page (System > Settings > Behavior), if the Automatic Ticketing Emails field is set to Disabled, assignees and watchers will not receive automatic email notifications about any tickets. By default, the field is set to Enabled.

  • If your system is configured to receive emails, confirm that emails are being received correctly. To test inbound email, send a test email that will trigger a "tickets from Email" policy or an "events from Email" policy.

To complete the verification process, execute the following command on the newly demoted secondary database:

sudo systemctl start pacemaker

Failback

If you have performed failover and then want to return to the previous configuration, you can perform failback.

Because DRBD does not allow two Primary appliances, you must first demote the Primary appliance during failback. After demoting, your SL1 system will have two Secondary appliances, but DRBD allows two Secondary appliances. You can then promote the Secondary appliance. After promoting the Secondary appliance, your SL1 system will have one Primary appliance and one Secondary appliance.

Upon login, if SL1 detects two Primary databases or two Secondary databases, an error message is displayed along with information for how to fix the problem.

To perform failback:

  • Log in to the console of the current Primary appliance as the em7admin user.

  • Check the status of both appliances. To do this, enter the following at the shell prompt:

    cat /proc/drbd

     

    Your output will look like this:

    1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:17567744 al:0 bm:1072 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:12521012

    To fail back safely, the output should include "ro:Primary/Secondary ds:UpToDate/UpToDate".

If your two appliances cannot communicate, your output will include "ro:Primary/Unknown ds:UpToDate/DUnknown". Before proceeding with failback, troubleshoot and resolve the communication problem.

If your output includes "ro:Primary/Secondary", but does not include "UpToDate/UpToDate", data is being synchronized between the two appliances. You must wait until data synchronization has finished before performing failback.
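The three cases above can be checked programmatically against the text of /proc/drbd. The sketch below is a minimal example that assumes the status line has the "ro:<local>/<peer> ds:<local>/<peer>" layout shown in this section; the function name is hypothetical, and it inspects only the first resource it finds.

```python
import re

# Matches the role and disk-state pairs on a /proc/drbd status line.
STATUS_RE = re.compile(r"ro:(\w+)/(\w+)\s+ds:(\w+)/(\w+)")


def failback_ready(proc_drbd_text):
    """Return True only when the roles are Primary/Secondary and both
    disk states are UpToDate, as required before failback."""
    match = STATUS_RE.search(proc_drbd_text)
    if match is None:
        return False
    local_role, peer_role, local_disk, peer_disk = match.groups()
    roles_ok = (local_role, peer_role) == ("Primary", "Secondary")
    disks_ok = (local_disk, peer_disk) == ("UpToDate", "UpToDate")
    return roles_ok and disks_ok
```

On the current Primary appliance, you could pass in the contents of /proc/drbd, for example with open("/proc/drbd").read(). A result of False covers both the "Unknown" peer case and the case where synchronization is still in progress, so inspect the raw output to tell them apart.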