Using SL1 to Monitor SL1 PowerFlow

This section describes the various ScienceLogic PowerPacks that you can use to monitor the components of the PowerFlow system. This section also describes the suggested settings, metrics, and situations for healthy SL1 and PowerFlow systems.

Use the following menu options to navigate the SL1 user interface:

  • To view a pop-out list of menu options, click the menu icon.
  • To view a page containing all of the menu options, click the Advanced menu icon.

Monitoring PowerFlow

You can use a number of ScienceLogic PowerPacks to help you monitor the health of your PowerFlow system. This section describes those PowerPacks and additional resources and procedures you can use to monitor the components of PowerFlow.

You can also use the PowerFlow Control Tower page in the PowerFlow user interface to monitor the status of the various tasks, workers, and applications that are running on your PowerFlow system. You can use this information to quickly determine if your PowerFlow instance is performing as expected.

You can download the following PowerPacks from the PowerPacks page of the ScienceLogic Support Site at https://support.sciencelogic.com/s/ to help you monitor your PowerFlow system:

  • Linux Base Pack PowerPack: This PowerPack monitors your Linux-based PowerFlow server with SSH (the PowerFlow ISO is built on top of an Oracle Linux Operating System). This PowerPack provides key performance indicators about how your PowerFlow server is performing. The only configuration you need to do with this PowerPack is to install the latest version of it.
  • Docker PowerPack: This PowerPack monitors the various Docker containers, services, and Swarm that manage the PowerFlow containers. This PowerPack also monitors PowerFlow when it is configured for High Availability. Use version 103 or later of the Docker PowerPack to monitor PowerFlow services in SL1. For more information, see Configuring the Docker PowerPack.
  • SL1 PowerFlow PowerPack: This PowerPack monitors the status of the applications in your PowerFlow system. Based on the events generated by this PowerPack, you can diagnose why applications failed in PowerFlow. For more information, see Configuring the SL1 PowerFlow PowerPack.

    Versions 105 and earlier of this PowerPack were named the ScienceLogic: Integration Service PowerPack.

  • Couchbase PowerPack: This PowerPack monitors the Couchbase database that PowerFlow uses for storing the cache and various configuration and application data. This data provides insight into the health of the databases and the Couchbase servers. For more information, see Configuring the Couchbase PowerPack.
  • AMQP: RabbitMQ PowerPack. This PowerPack monitors RabbitMQ configuration data and performance metrics using Dynamic Applications. You can use this PowerPack to monitor the RabbitMQ service used by PowerFlow. For more information, see Configuring the RabbitMQ PowerPack.

You can use each of the PowerPacks listed above to monitor different aspects of PowerFlow. Be sure to download and install the latest version of each PowerPack.

The following sub-topics describe the configuration steps you need to take for each PowerPack. For best results, complete these configuration steps in the given order to set up monitoring of PowerFlow within SL1.

Configuring the Docker PowerPack

The Docker PowerPack monitors the various Docker containers, services, and Swarm that manage the PowerFlow containers. This PowerPack also monitors PowerFlow when it is configured for High Availability. Use version 103 or later of the Docker PowerPack to monitor PowerFlow services in SL1.

To configure the Docker PowerPack to monitor PowerFlow:

  1. Make sure that you have already installed the Linux Base Pack PowerPack and the Docker PowerPack.
  2. In SL1, go to the Credential Management page (Manage > Credentials or System > Manage > Credentials in the classic user interface) and click to edit the Docker Basic - Dev ssh credential. The Edit Credential page appears.
  3. Complete the following fields, and keep the other fields at their default settings:

  • Name. Type a new name for the credential.
  • Hostname/IP. Type the hostname or IP address for the PowerFlow instance, or type "%D".
  • Username. Type the username for the PowerFlow instance.
  • Password. Type the password for the PowerFlow instance.

  4. Click Save & Close.

  5. On the Devices page, click Add Devices to discover your PowerFlow server using the new Docker SSH credential.

    Use the Unguided Network Discovery option and search for the new Docker credential on the Choose credentials page of the Discovery wizard. For more information, see Adding Devices Using Guided Discovery in the Discovery section.

    Select Discover Non-SNMP and Model Devices in the Advanced options section.

    After the discovery is complete, SL1 creates a new Device record for the PowerFlow server and new Device Component records for Docker containers.

  6. Go to the Devices page and select the new device representing your PowerFlow server.
  7. Go to the Collections tab of the Device Investigator page for the new device and make sure that all of the Docker and Linux Dynamic Applications have automatically aligned. This process usually takes a few minutes. A group of Docker and Linux Dynamic Applications should now appear on the Collections tab.

  8. To view your newly discovered device components, navigate to the Device Components page (Devices > Device Components). If you do not see your newly discovered Docker Host, wait for the Dynamic Applications on the Docker host to finish modeling out its component devices. A Docker Swarm virtual root device will also be discovered. After discovery finishes, you should see the following devices representing your PowerFlow system on the Device Components page.

If the Docker Swarm root device is modeled with a different device class, go to the Devices page and select the Docker Swarm root device. Click the Edit button on the Device Investigator page, click the Info drop-down, and edit the Device Class field. From the Select a Device Class window, select ScienceLogic | Integration Service as the Device Class and click Set Class. Click Save on the Device Investigator page to save your changes.

At times, the advertised host IP for a Docker node might display as "0.0.0.0" instead of the actual external address. This is a known issue in Docker. To work around this issue, remove and rejoin the nodes of the swarm one by one, and use the following argument to add them: --advertise-addr <ip-to-show>. For example, docker swarm join --advertise-addr .... Do not remove a leader node unless there are at least two active leaders available to take its place.
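
The following is a minimal sketch of that workaround, run from a shell on the affected node and on an existing manager. The node name, join token, and IP addresses are placeholders that you would replace with values from your own swarm, and the worker token is shown only as an example; use the manager join token if the node should rejoin as a manager:

    # On the affected node: leave the swarm
    # (add --force only if the node is a manager, and only after confirming other leaders are available)
    docker swarm leave

    # On an existing manager: remove the stale node entry and print a join token
    docker node rm <node-name>
    docker swarm join-token worker

    # Back on the affected node: rejoin, advertising the correct external address
    docker swarm join --token <token> --advertise-addr <ip-to-show> <manager-ip>:2377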

Configuring the SL1 PowerFlow PowerPack

Versions 105 and earlier of this PowerPack were named the ScienceLogic: Integration Service PowerPack.

The SL1 PowerFlow PowerPack monitors the status of the applications in your PowerFlow system. Based on the events generated by this PowerPack, you can diagnose why applications failed in PowerFlow.

To configure SL1 to monitor PowerFlow, you must first create a SOAP/XML credential. This credential allows the Dynamic Applications in the PowerFlow PowerPack to communicate with PowerFlow.

In addition, before you can run the Dynamic Applications in the PowerFlow PowerPack, you must manually align the Dynamic Applications from this PowerPack to your PowerFlow device in SL1. These steps are covered in detail below.

Configuring the PowerPack

To configure the PowerFlow PowerPack:

  1. In SL1, make sure that you have already installed the Linux Base Pack PowerPack, the Docker PowerPack, and the SL1 PowerFlow PowerPack on your SL1 system.
  2. In SL1, navigate to the Credentials page (Manage > Credentials or System > Manage > Credentials in the classic user interface) and select the ScienceLogic: PowerFlow Example credential. The Edit Credential modal appears.
  3. Complete the following fields, and keep the other fields at their default settings:
  • Profile Name. Type a new name for the credential.
  • URL. Type the URL for your PowerFlow system.
  • HTTP Auth User. Type the PowerFlow administrator username.
  • HTTP Auth Password. Type the PowerFlow administrator password.
  • Embed Value [%1]. Type "False".
  4. Click the Save & Close button. You will use this new credential to manually align the following Dynamic Applications:
  • REST: Performance Metrics Monitor
  • ScienceLogic: PowerFlow Queue Configuration
  • ScienceLogic: PowerFlow Workers Configuration

  5. Go to the Devices page, select the device representing your PowerFlow server, and click the Collections tab.
  6. Click Edit, click Align Dynamic App, and select Choose Dynamic Application. The Choose Dynamic Application window appears.
  7. In the Search field, type the name of the first of the PowerFlow Dynamic Applications. Select the Dynamic Application and click Select.
  8. Select Choose Credential. The Choose Credential window appears.
  9. In the Search field, type the name of the credential you created in steps 2-4, select the new credential, and click Select. The Align Dynamic Application window appears.
  10. Click Align Dynamic App. The Dynamic Application is added to the Collections tab.
  11. Repeat steps 6-10 for each remaining Dynamic Application for this PowerPack, and click Save when you are done aligning Dynamic Applications.

Events Generated by the PowerPack

The "ScienceLogic: Integration Service Queue Configuration" Dynamic Application generates a Major event in SL1 if an application fails in PowerFlow:

The related Event Policy includes the name of the application, the Task ID, and the traceback of the failure. You can use the application name to identify the application that failed in PowerFlow. You can use the Task ID to determine the exact execution of the application that failed, which you can then use for debugging purposes.

To view more information about the execution of an application in PowerFlow, navigate to the relevant page in PowerFlow by formatting the URL in the following manner:

https://<PowerFlow_hostname>/integrations/<application_name>?runid=<task_id>

For example:

https://192.0.2.0/integrations/sync_credentials?runid=c7e157ae-5644-4161-a241-59516feeadec

Configuring the Couchbase PowerPack

Couchbase stores all cache and configuration data for PowerFlow. Monitoring the performance of Couchbase is critical to ensuring the health of your PowerFlow instance.

After you install the Couchbase PowerPack in SL1, create a new Couchbase SOAP/XML credential. Using that credential, you need to manually align the "Couchbase: Component Count" and "Couchbase: Pool Discovery" Dynamic Applications with the Docker Swarm root device. These steps are covered in detail below.

To configure the Couchbase PowerPack:

  1. In SL1, navigate to the Credentials page (Manage > Credentials or System > Manage > Credentials in the classic user interface) and select the Couchbase Sample credential. The Edit Credential modal appears.
  2. Complete the following fields, and keep the other fields at their default settings:

  • Name. Type a new name for the credential.
  • URL. Type the full URL for PowerFlow. Ensure that port 8091 is appended to the hostname, and use https for this URL (see the example below).
  • HTTP Auth User. Type the username for your PowerFlow instance.
  • HTTP Auth Password. Type the password for your PowerFlow instance.

    For a clustered PowerFlow environment, point the Couchbase credentials at the load balancer for PowerFlow; for a single-node deployment, point them at the PowerFlow host itself.
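
    For example, for a single-node deployment, the credential URL follows this pattern, where pf.example.com is a placeholder for your PowerFlow hostname:

    https://pf.example.com:8091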

  3. Click Save & Close.
  4. Go to the Device Components page (Devices > Device Components) and expand the Docker Swarm root device by clicking the + icon.
  5. Click the wrench icon for the "Stack | Docker Stack" component device (iservices) and click the Collections tab.

  6. Align the "Couchbase: Pool Discovery" Dynamic Application by clicking the Actions button and selecting Add Dynamic Application. The Dynamic Application Alignment modal appears.

  7. Select the "Couchbase: Pool Discovery" Dynamic Application and select the Couchbase credential that you created in steps 1-3. Click Save.
  8. Click the Actions button and select Add Dynamic Application again to align the "Couchbase: Component Count" Dynamic Application. The Dynamic Application Alignment modal appears.
  9. Select the "Couchbase: Component Count" Dynamic Application and select the Couchbase credential that you created in steps 1-3. Click Save. SL1 models out your Couchbase components and provides you with additional information about the usage of the Couchbase service.

  10. Navigate to the Device Components page (Devices > Device Components) to see the Couchbase components.
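
If you want to spot-check the Couchbase REST API outside of SL1 (for example, to confirm that the values in the Couchbase credential are correct), a quick curl call can help. This is a sketch only: the hostname is a placeholder, the username and password are the same values you entered in the Couchbase credential, and /pools/default is the standard Couchbase cluster-details endpoint:

    # Query Couchbase cluster and node details (returns JSON)
    curl -sk -u <username>:<password> https://pf.example.com:8091/pools/default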

Configuring the RabbitMQ PowerPack

You can monitor the RabbitMQ service with the AMQP: RabbitMQ PowerPack. This PowerPack monitors RabbitMQ configuration data and performance metrics using Dynamic Applications, and the PowerPack creates a Major event in SL1 for any applications in PowerFlow that are in a Failed state.

After you install the RabbitMQ PowerPack in SL1, create a new SOAP/XML credential. Using that credential, you need to manually align the following Dynamic Applications:

  • ScienceLogic: PowerFlow Queue Configuration
  • AMQP: RabbitMQ Configuration
  • AMQP: RabbitMQ Performance

To configure the RabbitMQ PowerPack:

  1. In SL1, navigate to the Credentials page (Manage > Credentials or System > Manage > Credentials in the classic user interface) and duplicate the ScienceLogic: PowerFlow Example credential. You can also create a new SOAP/XML credential. The Edit Credential modal appears.

  2. Complete the following fields, and keep the other fields at their default settings:

  • Name. Type a new descriptive name for the credential.
  • URL. Type the full URL for your PowerFlow system, and use https for this URL.
  • HTTP Auth User. Type the username for your PowerFlow instance.
  • HTTP Auth Password. Type the password for your PowerFlow instance.

    For a clustered PowerFlow environment, point the credentials at the load balancer for the PowerFlow system; for a single-node deployment, point them at the PowerFlow host itself.

  3. Click Save & Close.
  4. Go to the Device Components page (Devices > Device Components) and click the wrench icon for the Docker Swarm root device.
  5. Click the Collections tab on the Device Properties window.
  6. Align the "ScienceLogic: Integration Service Queue Configuration" Dynamic Application by clicking the Actions button and selecting Add Dynamic Application. The Dynamic Application Alignment modal appears.
  7. Select the "ScienceLogic: Integration Service Queue Configuration" Dynamic Application and select the credential that you created in steps 1-3. Click Save. This Dynamic Application queries PowerFlow every 15 minutes by default to retrieve information about any failed integrations, which generates a Major event in SL1 (the events auto-expire after 90 minutes).

    The events generated by this Dynamic Application include the Integration ID, which you can use to find the relevant application on your PowerFlow instance. Copy the name in the event message and navigate to https://<PowerFlow>/integrations/<integration_ID>.

  8. To view more information about your failed applications, navigate to the Configurations tab for the device and click the report for the "ScienceLogic: Integration Service Queue Configuration" Dynamic Application. This configuration report shows you more information about the failed integrations on your PowerFlow instance. For example, you can use the Last Run ID field to find the exact logs for a specific execution of the application. To do this, copy the Integration ID and the Last Run ID and navigate to https://<PowerFlow>/integrations/<integration_ID>?runid=<last_run_id>.
  9. Align the "AMQP: RabbitMQ Configuration" and "AMQP: RabbitMQ Performance" Dynamic Applications using the same process as steps 6-7.

Stability of the PowerFlow Platform

This topic defines what a healthy SL1 system and a healthy PowerFlow system look like, based on the following settings, metrics, and situations.

What makes up a healthy SL1 system?

To ensure the stability of your SL1 system, review the following settings in your SL1 environment:

  • The SL1 system has been patched to a version that has been released by ScienceLogic within the last 12 months. ScienceLogic issues a software update at least quarterly. It is important for the security and stability of the system that customers regularly consume these software updates.
  • The user interface and API response times for standard requests are within five seconds:
    • Response time for a specific user interface request.
    • Response time for a specific API request.
  • At least 20% of local storage is free and available for new data. Free space is a combination of unused available space within InnoDB datafiles and the filesystem area into which those files can grow.
  • The central system is keeping up with all collection processing:
    • Performance data stored and available centrally within three minutes of collection.
    • Event data stored and available centrally within 30 seconds of collection.
  • Run book automations are completing normally.
  • Collection is completing normally. Collection tasks are completing without early termination (sigterm).
  • All periodic maintenance tasks are completing successfully:
    • Successfully completing daily maintenance (pruning) on schedule.
    • Successfully completing backup on schedule.
  • High Availability and Disaster Recovery are synchronized (where used):
    • Replication synchronized (except when halted / recovering from DR backup).
    • Configuration matches between nodes.

What makes up a healthy PowerFlow system?

To ensure the stability of the PowerFlow system, review the following settings in your environment:

  • The settings from the previous list are being met in your SL1 system.
  • You are running a supported version of PowerFlow.
  • The memory and CPU usage of the host remains less than 80% on core nodes.
  • Task workloads can be accepted by the API and placed onto the queues for execution.
  • The PowerFlow API is responding to POST calls to run applications within the default timeout of 30 seconds. For standard application triggers, this is usually sub-second (see the curl sketch after this list).
  • The PowerFlow Scheduler is configured correctly. For example, there are no tasks accidentally set to run every minute or every second.
  • Task workloads are actively being pulled from queues for execution by workers. Workers are actively processing tasks, and not just leaving items in queue.
  • Worker nodes are all up and available to process tasks.
  • Couchbase does not frequently read documents from disk. You can check this value with the “Disk Fetches per second” metric in the Couchbase user interface.
  • The Couchbase Data service is not using all of its allocated memory, which would force data writes to disk. You can check this value with the "Data service memory allocation" metric in the main Couchbase dashboard.
  • Container services are not restarting.
  • The RabbitMQ memory usage is not more than 2-3 GB per 10,000 messages in queues. The memory usage might be a little larger if you are running considerably larger tasks.
  • RabbitMQ mirrors are synchronized.
  • RabbitMQ is only mirroring the dedicated queues, not temporary or TTL queues.
  • All Couchbase indexes are populated on all Couchbase nodes.
  • The Couchbase nodes are fully rebalanced and distributed.
  • The Docker Swarm cluster has at least three active managers in a High Availability cluster.
  • For any Swarm node that is also a swarm manager and is running PowerFlow services:
    • At least one CPU with 4 GB of memory is available on the host to actively manage the swarm cluster.
    • PowerFlow services running on this host are not able to consume all of the available resources, which would cause cluster operations to fail.
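
One way to spot-check the API response-time item above is to time a POST call that runs a lightweight application and compare the result against the 30-second timeout. This is a minimal sketch only: the hostname, credentials, and application name are placeholders, and the /api/v1/applications/<application_name>/run path is an assumption, so confirm the exact endpoint against the API documentation for your PowerFlow version.

    # Time a POST that triggers a PowerFlow application run (endpoint path is assumed; verify against your API docs)
    curl -sk -o /dev/null -w 'HTTP %{http_code} in %{time_total}s\n' \
      -u <username>:<password> \
      -H 'Content-Type: application/json' \
      -d '{}' \
      -X POST "https://<powerflow_hostname>/api/v1/applications/<application_name>/run"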

Some of the following PowerFlow settings might vary, based on your configuration:

  • The number of applications sitting in queue is manageable. A large number of applications sitting in queue could indicate either a large spike in workload or that no workers are processing tasks (see the sketch after this list).
  • The number of failed tasks is manageable. A large number of failed tasks could be caused by ServiceNow timeouts, expected failure conditions, and other situations.
  • ServiceNow is not overloaded with custom table transformations that cause long delays when PowerFlow is communicating with ServiceNow.
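
As a rough command-line check for the queue-related items above, you can list queue depths from inside the RabbitMQ container. This is a sketch under the assumption that PowerFlow is deployed as the Docker Swarm stack named iservices shown earlier in this section; adjust the name filter to match your own container names:

    # Find the running RabbitMQ container for the PowerFlow stack (the name filter is an assumption)
    docker ps --filter name=rabbitmq --format '{{.ID}} {{.Names}}'

    # List queue names and message counts inside that container
    docker exec <rabbitmq-container-id> rabbitmqctl list_queues name messages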