Troubleshooting SL1 PowerFlow

This section includes troubleshooting resources, procedures, and frequently asked questions related to working with SL1 PowerFlow.

PowerFlow and SyncPacks content not created by ScienceLogic is not supported by ScienceLogic. This includes custom steps, PowerFlow applications, and SyncPacks.

Initial Troubleshooting Steps

PowerFlow acts as a middle server between data platforms. For this reason, the first steps should always be to ensure that there are no issues with the data platforms with which PowerFlow is talking. There might be additional configurations or actions enabled on ServiceNow or SL1 that result in unexpected behavior. For detailed information about how to perform the steps below, see Resources for Troubleshooting.

SL1 PowerFlow

  1. Run docker service ls on the PowerFlow server:
  • Note the Docker container version.
  • Verify that the Docker services are running.
  2. If a certain service is failing, make a note of the service name and version.
  3. If a certain service is failing, run docker service ps <service_name> to see the historical state of the service and make a note of this information. For example: docker service ps iservices_contentapi.
  4. Make a note of any logs impacting the service by running docker service logs <service_name>. For example: docker service logs iservices_couchbase.

ServiceNow

  1. Make a note of the ServiceNow version and SyncPack version, if applicable.
  2. Make a note if you are running a ServiceNow certified application or a Service Graph SyncPack.
  3. Make a note of the SyncPack application that is failing in PowerFlow.
  4. Make a note of what step is failing in the application, try running the application in debug mode, and capture any traceback or error messages that occur in the step log.

Resources for Troubleshooting

This section contains port information for PowerFlow and troubleshooting commands for Docker, Couchbase, and the PowerFlow API.

Useful PowerFlow Ports

  • https://<IP of PowerFlow>:8091. Provides access to Couchbase, a NoSQL database for storage and data retrieval.
  • https://<IP of PowerFlow>:15672. Provides access to the RabbitMQ Dashboard, which you can use to monitor the service that distributes tasks to be executed by PowerFlow workers.
  • https://<IP of PowerFlow>/flower/dashboard. Provides access to Flower, a tool for monitoring and administrating Celery clusters.
  • https://<IP of PowerFlow>:3141. Provides access to the pypiserver service, which you can use to verify that SyncPacks have been correctly uploaded to the Devpi container.

Port 5556 must be open for both PowerFlow and the client.
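
For example, assuming the nc (netcat) utility is installed on the client, you can verify that port 5556 is reachable from the client with a command similar to the following:

nc -zv <IP of PowerFlow> 5556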

powerflowcontrol healthcheck and autoheal actions

SL1 PowerFlow includes a command-line utility called powerflowcontrol (pfctl) that performs multiple administrator-level actions in a clustered PowerFlow environment. The powerflowcontrol utility contains automatic cluster healthcheck and autoheal actions that you can use to prevent issues with your PowerFlow environment.
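
For example, based on the pfctl syntax used later in this section, a cluster healthcheck might be run with a command similar to the following (host names and credentials are placeholders):

pfctl --host <hostname1> <username>:<password> --host <hostname2> <username>:<password> --host <hostname3> <username>:<password> healthcheck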

For more information, see Using the powerflowcontrol (pfctl) Command-line Utility.

Helpful Docker Commands

PowerFlow is a set of services that are containerized using Docker. For more information about Docker, see the Docker tutorial.

Use the following Docker commands for troubleshooting and diagnosing issues with PowerFlow:

Viewing Container Versions and Status

To view the PowerFlow version, SSH to your instance and run the following command:

rpm -qa | grep powerflow

To view the individual services with their respective image versions, SSH to your PowerFlow instance and run the following command:

docker service ls

In the results, you can see the container ID, name, mode, status (see the replicas column), and version (see the image column) for all the services that make up PowerFlow.

Restarting a Service

Run the following command to restart a single service:

docker service update --force <service_name>

Stopping all PowerFlow Services

Run the following command to stop all PowerFlow services:

docker stack rm iservices

Restarting Docker

Run the following command to restart Docker:

systemctl restart docker

Restarting Docker does not clear the queue.

Viewing Logs for a Specific Service

You can use the Docker command line to view the logs of any current running service in the PowerFlow cluster. To view the logs of any service, run the following command:

docker service logs -f iservices_<service_name>

Some common examples include the following:

docker service logs -f iservices_couchbase

docker service logs -f iservices_steprunner

docker service logs -f iservices_contentapi

Application logs are stored on the central database as well as on all of the Docker hosts in a clustered environment. These logs are stored at /var/log/iservices in both single-node and clustered environments. However, the logs on each Docker host relate only to the services running on that host. For this reason, using the Docker service logs is the best way to get logs from all hosts at once.

The application logs stored locally on individual node filesystems can be collected using the collect_pf_logs action in the powerflowcontrol (pfctl) command-line utility. For more information, see collect_pf_logs.
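
The exact invocation can vary by PowerFlow version, but based on the pfctl syntax shown later in this section, collecting logs from a cluster might look similar to the following (host names and credentials are placeholders):

pfctl --host <hostname1> <username>:<password> --host <hostname2> <username>:<password> --host <hostname3> <username>:<password> collect_pf_logs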

Clearing RabbitMQ Volume

RabbitMQ is a service that distributes tasks to be executed by PowerFlow workers. This section covers how to handle potential issues with RabbitMQ.

The following error message might appear if you try to run a PowerFlow application via the API:

Internal error occurred: Traceback (most recent call last):\n File \"./content_api.py\", line 199,
in kickoff_application\n task_status = ... line 623, in _on_close\n (class_id, method_id),
ConnectionError)\nInternalError: Connection.open: (541) INTERNAL_ERROR - access to vhost '/'
refused for user 'guest': vhost '/' is down

If your PowerFlow system is in a cluster, you should scale down the RabbitMQ services, clear the volumes, and then scale the services back up one by one. If your PowerFlow system is a single node, you have only one node and only the iservices_rabbitmq service.

To clear the RabbitMQ volume:

  1. Run the following command:

    docker service scale iservices_rabbitmq=0 iservices_rabbitmq2=0 iservices_rabbitmq3=0

  2. Find and clear all RabbitMQ-related volumes from a node:

    docker volume rm $(docker volume ls -q --filter name="i*rabbit*")

  3. Scale back up one node at a time:

    docker service scale iservices_rabbitmq=1

  4. Wait for the first RabbitMQ node to come up and connect to workers.

  5. Run the following command:

    docker service scale iservices_rabbitmq2=1

  6. Then run the following command:

    docker service scale iservices_rabbitmq3=1

If you get a message stating that the volume is in use, run the following command:

docker rm <id of container using volume>
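
If you are not sure which container is still using the volume, you can list containers by volume name to find its ID; the volume name below is a placeholder:

docker ps -a --filter volume=<rabbitmq_volume_name>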

Re-deploy PowerFlow by running the following command:

docker stack deploy -c /opt/iservices/scripts/docker-compose.yml iservices

Restarting Docker does not clear the queue, because the queue is persistent. However, clearing the queue with the commands above might result in data loss due to the tasks being removed from the queue.

Viewing the Process Status of All Services

Run the following command:

docker ps

Deploying Services from a Defined Docker Compose File

Run the following command:

docker stack deploy -c <compose-file> iservices

Dynamically Scaling for More Workers

Run the following command:

docker service scale iservices_steprunner=10

Completely Removing Services from Running

Run the following command:

docker stack rm iservices

Helpful Couchbase Commands

Checking the Couchbase Cache to Ensure an SL1 Device ID is Linked to a ServiceNow Sys ID

You can determine how an SL1 device links to a ServiceNow CI record by using the respective device and sys IDs. You can retrieve these IDs from the PowerFlow Couchbase service.

First, locate the correlation ID with the following Couchbase query:

select meta().id from logs where meta().id like "lookup%"

This query returns results similar to the following:

[
  {
    "id": "lookup-ScienceLogicRegion+DEV+16"
  },
  {
    "id": "lookup-ScienceLogicRegion+DEV+17"
  }
]

After you locate the correlation ID, run the following query:

select cache_data from logs where meta().id = "lookup-ScienceLogicRegion+DEV+16"

This query returns the following results:

[
  {
    "cache_data": {
      "company": "d6406d3bdbc72300c40bdec0cf9619c2",
      "domain": null,
      "snow_ci": "u_cmdb_ci_aws_service",
      "sys_id": "0c018f14dbd36300f3ac70adbf9619f7"
    }
  }
]

Clearing the Internal PowerFlow Cache

Some applications pull information from ScienceLogic or ServiceNow and store that data in a cache on PowerFlow. Examples of these applications include "Cache ScienceLogic Devices using GraphQL" and "Cache ServiceNow CIs and ScienceLogic Device Classes". Although it does not happen often, these caches can become stale and might not be auto-cleared or updated in PowerFlow. The steps below cover how to clear the cache to force PowerFlow to rebuild it.

Clearing the internal PowerFlow cache can be done using the command-line interface or the Couchbase user interface.

Clearing the Cache using the Command-Line Interface

  1. To clear the cache using the command-line interface, you will need to first get the ID of the Couchbase container using the docker ps or the docker service ls command. Use this ID for the <container_id> value in the next step.

  2. Run the following commands on the PowerFlow system:

    docker exec -it <container_id> /bin/bash

    cbq -u <username> -p <password> -e "http://<IP_of_PowerFlow>:8091"

    delete from logs where log_type == "cache"

    Alternatively, you can run the following command instead:

    docker exec -i -t $(docker ps --filter name=iservices_couchbase -q) /bin/bash

Clearing the Cache using the Couchbase user interface

  1. Navigate to http://<IP_of_PowerFlow>:8091 and sign in using your regular PowerFlow credentials.

  2. Go to the Query tab and run the following query:

    delete from logs where log_type == "cache"

Accessing Couchbase with the Command-line Interface

If you don't have access to port 8091 on your PowerFlow instance, you can connect to the Couchbase container by using the command-line interface (CLI).

To access Couchbase by using the CLI, run the following commands:

docker exec -it <container_id> /bin/bash

cbq -u <username> -p <password> -e "https://localhost:8091"

Temporarily Exposing Couchbase Ports for Troubleshooting

To troubleshoot issues with Couchbase, you can temporarily expose Couchbase ports without requiring Dex authentication.

  1. Update the docker-compose file by adding the FORWARD_COUCHBASE: 'true' environment variable to the gui service and exposing ports 8102-8104 for the gui service.

  2. Run the following command to remove the gui service:

    docker service rm iservices_gui

  3. Redeploy the stack so the gui service can be recreated. The Couchbase nodes will be exposed by default in ports 8102-8104.

  4. If custom ports need to be configured for exposing Couchbase nodes, the following environment variables can be configured:

    • COUCHBASE_PRIMARY_PORT
    • COUCHBASE_WORKER_PORT
    • COUCHBASE_WORKER_2_PORT

Disable this configuration after you finish the troubleshooting process, as exposing services without going through the PowerFlow login process is not secure.

You can also expose secondary Couchbase user interfaces for PowerFlow versions 2.3 through 2.5, using Dex authentication:

  1. In the isconfig.yml file, update the following line:

    OPEN_SECONDARY_CB_PORTS: true

  2. Update the docker-compose.yml file with the following GUI ports:

    - published: 8100
      target: 8100
      mode: host
    - published: 8101
      target: 8101
      mode: host

  3. If a load balancer is being used, update the load balancer configuration to route to 8100, 8101, and open the port in the load balancer.

  4. Re-deploy the stack, and you will now see that couchbase-worker and couchbase-worker2 are accessible via ports 8100 and 8101, respectively.

To expose secondary Couchbase user interfaces for PowerFlow version 2.6.0 and later, with or without Dex authentication:

  1. Update the isconfig.yml file and set FORWARD_COUCHBASE: true to expose the Couchbase ports without using Dex authentication. The following environment variables control the exposed ports:

    • COUCHBASE_PRIMARY_PORT. Default: 8102.

    • COUCHBASE_WORKER_PORT. Default: 8103.

    • COUCHBASE_WORKER_2_PORT. Default: 8104.

  2. Update the docker-compose.yml file to expose the ports through the GUI service:

    - published: 8102
      target: 8102
      mode: host
    - published: 8103
      target: 8103
      mode: host
    - published: 8104
      target: 8104
      mode: host

  3. Navigate to the PowerFlow host IP address directly, using one of the ports above for direct access to Couchbase without going through Dex authentication.

Temporarily Exposing RabbitMQ Ports for Troubleshooting

You can also temporarily expose RabbitMQ ports without requiring Dex authentication.

  1. Update the docker-compose file by adding the FORWARD_RABBITMQ: 'true' environment variable to the gui service and exposing port 15693 for the gui service.

  2. Run the following command to remove the gui service:

    docker service rm iservices_gui

  3. Redeploy the stack so the gui service can be recreated. The RabbitMQ nodes will be exposed by default on port 15693.

  4. If custom ports need to be configured for exposing RabbitMQ nodes, you can use the RABBITMQ_PORT environment variable.

Disable this configuration after you finish the troubleshooting process, as exposing services without going through the PowerFlow login process is not secure.

Useful API Commands

Getting PowerFlow Applications from the PowerFlow API

You can use the API or cURL to retrieve the application code, which is useful when you are troubleshooting potential code-related issues. You cannot access these API endpoints with a browser, but you can request them by using an application such as Postman:

https://<PowerFlow>/api/v1/applications/<application_name>

If you do not have access to Postman, you can use cURL to get the same information.

curl -iku <username>:<password> -H "Accept: application/json" -H "Content-Type: application/json" -X GET https://<PowerFlow>/api/v1/applications/<application_name>

Creating and Retrieving Schedules with the PowerFlow API

You can define and retrieve schedules using the PowerFlow API. You can define all of these schedules in the PowerFlow user interface as well.

To create a schedule via the API, POST the following payload to the API endpoint:

https://<PowerFlow>/api/v1/schedule

 

{
  "application_id": "APP_ID",
  "entry_id": "SCHEDULE_NAME",
  "params": {"a": "B"},
  "schedule": {
    "schedule_info": {
      "day_of_month": "*",
      "day_of_week": "*",
      "hour": "*",
      "minute": "*",
      "month_of_year": "*"
    },
    "schedule_type": "crontab"
  },
  "total_runs": 0
}

You can also specify the schedule to run on a frequency in seconds by replacing the schedule portion with the following:

"schedule": {

"schedule_info": {

"run_every": FREQUENCY_IN_SECONDS

},

"schedule_type": "frequency"

}
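
If you do not have access to an API client such as Postman, you can POST the same payload with cURL. The following is a sketch that assumes you have saved the payload to a local file named schedule.json:

curl -iku <username>:<password> -H "Content-Type: application/json" -X POST https://<PowerFlow>/api/v1/schedule -d @schedule.json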

Diagnosis Tools

Multiple diagnosis tools exist to assist in troubleshooting issues with the PowerFlow platform:

  • Docker PowerPack. This PowerPack monitors your Linux-based PowerFlow server with SSH (the PowerFlow ISO is built on top of an Oracle Linux Operating System). This PowerPack provides key performance indicators about how your PowerFlow server is performing. For more information on the Docker PowerPack and other PowerPacks that you can use to monitor PowerFlow, see Using SL1 to Monitor SL1 PowerFlow.
  • Flower. This web interface tool can be found at the /flower endpoint. It provides a dashboard displaying the number of tasks in various states as well as an overview of the state of each worker. This tool shows the current number of active, processed, failed, succeeded, and retried tasks on the PowerFlow platform. This tool also shows detailed information about each of the tasks that have been executed on the platform. This data includes the UUID, the state, the arguments that were passed to it, as well as the worker and the time of execution. Flower also provides a performance chart that shows the number of tasks running on each individual worker.

  • Debug Mode. All applications can be run in "debug" mode via the PowerFlow API. Running applications in debug mode might slow down the platform, but doing so results in much more detailed logging information that is helpful for troubleshooting issues. For more information on running applications in Debug Mode, see Retrieving Additional Debug Information.

  • Application Logs. All applications generate a log file specific to that application. These log files can be found at /var/log/iservices, and each log file matches the ID of the application. A log file combines the log messages of all previous runs of an application. These log files roll over and are auto-cleared periodically.

  • Step Logs. Step logs display the log output for a specific step in the application. These step logs can be accessed via the PowerFlow user interface by clicking on a step in an application and bringing up the Step Log tab. These step logs display just the log output for the latest run of that step.

  • Service Logs. Each Docker service has its own log. These can be accessed via SSH by running the following command:

    docker service logs -f <service_name>

Identifying Why a Service or Container Failed

This section outlines the troubleshooting steps necessary to determine the underlying root cause of why a service or container was restarted. For this section, we use the iservices_redis service as an example.

Step 1: Obtain the ID of the failed container for the service

Run the following command for the service that failed previously:

docker service ps --no-trunc <servicename>

For example:

docker service ps --no-trunc iservices_redis

In this example, the command result shows that one container with the ID 3s7s86n45skf failed previously while running on node is-scale-03 with the error "non-zero exit", and another container was restarted in its place.

At this point, we can ask the following questions:

  • When you run docker service ps --no-trunc, is the error something obvious? Does the error say that it cannot mount a volume, or that the image is not found? If so, that is most likely the root cause of the issue and what needs to be addressed.
  • Did the node on which that container was running go down? Or is that node still up?
  • Are the other services running on that node running fine? Was only this single service affected?
  • If other services are running fine on that same node, it is probably a problem with the service itself. If all services on that node are not functional, it could mean a node failure.

At this point, we should be confident that the cause of the issue is not a deployment configuration issue or an entire node failure, and that the problem exists within the service itself. Continue to Step 2 if this is the case.

Step 2: Check for any error messages or logs indicating an error

Using the ID obtained from Step 1, we can collect the logs from the failed container with the following commands:

docker service logs <failed-id>

For example:

docker service logs 3s7s86n45skf

Search the service logs for any explicit errors or warning messages that might indicate why the failure occurred.

Usually, you can find the error message in those logs, but if the container ran out of memory, it may not be seen here. Continue to Step 3 if the logs provide nothing fruitful.

Step 3: Check for out of memory events

If there were no errors in the logs, or anywhere else that can be seen, a possible cause for a container restart could be that the system ran out of memory.

Perform the following steps to identify if this is the case:

  1. Log in to the node where the container failed. In our example from Step 1, the container failed on is-scale-03.

  2. From the node where the container failed, run the following command:

    journalctl -k | grep -i -e memory -e oom

  3. Check the result for any out of memory events that caused the container to stop. Such an event typically looks like the following:

    is-scale-03 kernel: Out of memory: Kill process 5946 (redis-server) score 575 or sacrifice child

Troubleshooting a Cloud Deployment of PowerFlow

After completing the AWS setup instructions, if none of the services start and you see the following error during troubleshooting, you will need to restart Docker after installing the RPM.

sudo docker service ps iservices_couchbase --no-trunc

"error creating external connectivity network: Failed to Setup IP tables: Unable to enable SKIP DNAT rule: (iptables failed: iptables --wait -t nat -I DOCKER -i docker_gwbridge -j RETURN: iptables: No chain/target/match by that name."

Identifying Why a PowerFlow Application Failed

The following topics describe how to determine where a PowerFlow application failed, and how to retrieve more log information related to that failure from the logs.

Determining Where an Application Failed

If a PowerFlow application fails, a red failure icon appears under that application on the Application detail page.

To determine where the application is failing:

  1. Open the application and locate the step that is failing. A failed step is highlighted in red.
  2. Select the step and click the Step Log tab to view the logs for that step.
  3. Review the error message to determine the next steps.

Retrieving Additional Debug Information (Debug Mode)

The logs in PowerFlow use the following loglevel settings, from most verbose to least verbose:

  • 10. Debug Mode.
  • 20. Informational.
  • 30. Warning. This is the default setting if you do not specify a loglevel.
  • 40. Error.

If you run applications in Debug Mode ("loglevel": 10), those applications will take longer to run because of increased I/O requirements. Enabling debug logging using the following process is the only recommended method. ScienceLogic does not recommend setting "loglevel": 10 for the whole stack with the docker-compose file.

To run an application in Debug Mode using the PowerFlow user interface:

  1. Select the PowerFlow application from the Applications page.
  2. Hover over the Run button and select Custom Run from the pop-up menu. The Custom Run window appears.
  3. Select the Logging Level. Debug is the most verbose and will take longer to run.
  4. Specify the configuration object for the custom run in the Configuration field, and add any JSON parameters in the Custom Parameters field, if needed.
  5. Click Run.

To run an application in Debug Mode using the API:

  1. POST the following to the API endpoint:

    https://<PowerFlow_IP>/api/v1/applications/run

  2. Include the following in the request body:

    {
      "name": "<application_name>",
      "params": {
        "loglevel": 10
      }
    }

After running the application in Debug Mode, review the step logs in the PowerFlow user interface to see detailed debug output for each step in the application. This information is especially helpful when trying to understand why an application or step failed.

You can also run an application in Debug Mode using cURL via SSH:

  1. SSH to the PowerFlow instance.

  2. Run the following command:

    curl -v -k -u isadmin:<password> -X POST "https://<your_hostname>/api/v1/applications/run" \
      -H 'Content-Type: application/json' -H 'cache-control: no-cache' \
      -d '{"name": "interface_sync_sciencelogic_to_servicenow", "params": {"loglevel": 10}}'

Troubleshooting Clustering and Node Failover

This section covers how to troubleshoot a clustered environment and what to expect with a node failover in a PowerFlow system.

After a failover, Couchbase or the PowerFlow user interface are not available

In this situation, a failover occurred in a three-node cluster. Docker was stopped on the Swarm Leader, and another node in the cluster took the Swarm Leader role. After that, you could not access the PowerFlow user interface, and Couchbase was not running.

The Couchbase instance on that node was not running because the node itself was down. The other two Couchbase systems in the cluster were still running, and you should have been able to access the system through the PowerFlow user interface.

If you could not access the PowerFlow user interface, check to make sure you have a load balancer in front of the cluster. Each instance of Couchbase is "pinned" to a specific node so that it can persist data to that local node. The locally persisted data is then replicated between all three Couchbase nodes.

After a cluster or node failover, PowerFlow will not start

In this situation, you had a cluster or node failover, and now PowerFlow will not start. For example, you might see 0/3 replicas for contentapi or dexserver, or you cannot log in to the user interface. The following steps cover how to reset the database nodes and reliably re-create indexes to resolve this issue.

These steps are for PowerFlow 2.3.0 and later.

  1. Remove the stack by running the following command:

    docker stack rm iservices

  2. In the docker-compose.yml file, add AUTO_REBALANCE: "true" to the couchbase-worker and couchbase-worker2 services.

  3. Clear the indexes:

    rm -rf /var/data/couchbase/lib/couchbase/data/@2i

  4. On each node, run the following command to re-deploy the stack:

    docker stack deploy -c /opt/iservices/scripts/docker-compose.yml iservices

  5. Wait a few minutes, and then run pfctl --host hostname user:password --host hostname2 user:password --host hostname3 user:password autoheal.

  6. If the autoheal action fails before it can validate and re-create indexes, create them manually on each node by running: pfctl --host <good-node> user:pass --host <good-node2> user:pass node-action --action create_node_indexes.

  7. Log in to the Couchbase user interface and validate the cluster health.

  8. If one of the nodes is not shown in the Couchbase user interface:

    • If the rebalancing process with only 2 nodes is still in progress, wait for that process to finish.

    • If one of the following error messages is displayed in the Docker logs of the missing node, and the rebalancing process is not in progress, try forcing that service to restart so that it can try joining the cluster again, using the following command:

      docker service update --force couchbase_worker_node_name

      Error messages:

      • "Error initializing couchbasedb: rc: 1 error:b'ERROR: Node addition is disallowed while rebalance is in progress"
      • "cluster_check='ERROR: Unable to connect to host at <http://localhost:8091>: HTTPConnectionPool(host='\\''localhost'\\'', port=8091)"

I get a 502 error when I try to log in using the load balancer IP address

Make sure that the HOST_ADDRESS in the /etc/iservices/isconfig.yml file is pointing to that load balancer. If needed, update that value and re-deploy the PowerFlow system.
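
For example, the relevant line in /etc/iservices/isconfig.yml might look like the following, where the value is a placeholder for the IP address or hostname of your load balancer:

HOST_ADDRESS: <load_balancer_IP_or_hostname>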

You should only use the load balancer IP when you want to log into or otherwise access the system.

Also, you should make sure that your contentapi service is properly configured in the docker-compose.yml file. If this service only specifies one of the Couchbase nodes, such as db_host: couchbase.isnet,localhost, you will not get API responses when that Couchbase node is down.

To properly configure the contentapi service in the docker-compose.yml file, use the following example as a template:

db_host: couchbase.isnet,couchbase-worker.isnet,couchbase-worker2.isnet

The healthcheck and autoheal actions in the powerflowcontrol (pfctl) command-line utility will report problems like this to you, and they also create the relevant templates. The healthcheck and autoheal actions are supported in PowerFlow 2.0.0 and later. For more information, see Using the powerflowcontrol (pfctl) Command-line Utility.

After a node goes down, the SyncPacks page does not display the expected content

In this situation, the node that went down was the node that was hosting the PyPi server, which is not an operationally critical service. The PyPi server is only used to install new SyncPacks onto the system.

After a node goes down, I cannot access the database port (:8091) for that instance of Couchbase directly

You will not be able to access the database port (:8091) for that instance of Couchbase directly. Because that node is down, the Couchbase server on it is not available to provide its administrator user interface. By default, only the user interface for that node is exposed, but you can expose the admin user interfaces on the other systems as well, and those admin user interfaces will be available even when the primary node goes down.

If you fail over one of the other nodes instead of that master, you will see that the Couchbase user interface stays up, because the node hosting that Couchbase instance is still up.

Couchbase fails to properly initialize or keeps trying to initialize

Couchbase might fail to properly initialize or keep trying to initialize if the Couchbase database is in a highly virtualized environment, where the CPUs are over-provisioned, and high CPU ready wait and costop times are occurring. In this situation, you will not be able to access the Couchbase user interface at https://<IP of PowerFlow>:8091.

To address this issue, make sure that the VMware environment is optimized and ready waits are minimized. For more information about CPU over-allocation in VMware, see Maximizing VMware Performance and CPU Utilization.

The system will not perform optimally when ready wait times at the vCPU level are high. As a result, you might need to retry initializing Couchbase on the first startup. Also, rebalances might need to be run multiple times before they succeed in over-provisioned environments.

Frequently Asked Questions

This section contains a set of frequently asked questions (FAQs) and the answers to address those situations.

For additional troubleshooting information specific to multi-tenant environments, see Common Problems, Symptoms, and Solutions in the PowerFlow for Multi-tenant Environments appendix.

What is the first thing I should do when I have an issue with PowerFlow?

Ensure that all the services are up and running by running the following command:

docker service ls

If any service is not running, such as displaying 0/1 replicas, run the following commands:

docker service ps <service not running>

docker service logs <service not running>

Can the syncpacks_steprunner service be limited to just workers?

The syncpacks_steprunner service is responsible for keeping all SyncPack virtual environments in sync on all nodes. As you install or upgrade SyncPacks, these steprunners consistently maintain the virtual environment changes throughout the cluster without network storage. Disabling these steprunners will not affect the system operationally unless you need to update, modify, or install new SyncPacks onto the nodes of a system.

What is the difference between the syncpacks_steprunner and the steprunner services?

The syncpacks_steprunners accept no tasks other than create and update tasks for the SyncPack virtual environments.

What is the minimal image required for workers?

At a minimum, worker nodes that will only be used to process tasks will need just the worker image.

If the GUI server is constrained to use only the manager nodes, do the worker nodes need to have their isconfig.yml file updated with the correct HOST value?    

The isconfig.yml file is applied to all nodes in the cluster when the stack is deployed. The config used is created from the config on the system where you actually run Docker stack deploy and applied to all other nodes. In other words, only the managers from which you are deploying the stack need to have the isconfig.yml value correct.

Can I unload unwanted images from a worker node?

Removing unwanted images is part of typical Docker operations, and it uses the following command: docker image rm <image name>

For more information, see https://docs.docker.com/engine/reference/commandline/image_rm/.

If I dedicated workers to one SL1 stack, how are jobs configured to run only on those workers?

You create a worker service with the user_queues environment variable set to a list of the queues that you want that worker to process exclusively. The queues are auto-created. That worker service can also be pinned to run on specific nodes. You can tell an integration to run on a specific queue at runtime, on a schedule, or in the application configuration. The queue will then be processed by the workers that you previously defined. If no workers are listening to the queue, the task will not be processed.
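
The following docker-compose.yml fragment is a minimal sketch of what such a dedicated worker service might look like. It assumes a queue named dedicated_queue and a node label named dedicated; the service name, queue name, node label, and image tag are illustrative only and should be adapted to your environment:

  steprunner_dedicated:
    image: sciencelogic/is-worker:latest
    environment:
      # Only process tasks placed on this queue
      user_queues: 'dedicated_queue'
      logdir: /var/log/iservices
      broker_url: 'pyamqp://guest@rabbit//'
      result_backend: 'redis://redis:6379/0'
      db_host: 'couchbase.isnet,localhost'
    deploy:
      replicas: 2
      placement:
        constraints:
          # Pin this worker to nodes labeled for the dedicated SL1 stack
          - node.labels.dedicated == true
    networks:
      - isnet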

Approximately how much data is sent between distributed PowerFlow nodes?

The amount of data sent between distributed PowerFlow nodes largely depends on the applications that are currently running, the size of the syncs for those applications, and how much of the cache those syncs use. In general, only configuration files, cache, and some logs are stored or replicated between database nodes, and the queue service mirrors messages between its nodes as well. As a result, not much information is being replicated at any given time; only necessary cluster and vital data are replicated.

Why can't I find a SyncPack on the SyncPacks page?

If you have uploaded a SyncPack, but you cannot see it on the SyncPacks page, make sure that the SyncPack filter is showing all SyncPacks. By default, the SyncPacks page only shows SyncPacks that are activated. An activated SyncPack has been installed and is ready to be used.

Click the Filter icon at the top right of the SyncPacks page and enable the Toggle Inactive SyncPacks toggle to show all of the SyncPacks.

Why can't I see or upload a SyncPack?

If your PowerFlow system uses self-signed certificates, you will need to manually accept the certificate before you can upload SyncPacks.

Go to https://<IP address of PowerFlow>:3141/isadmin, accept the certificate, and then exit out of the tab. When you log into PowerFlow again, you will be able to upload SyncPacks.

Also, if a SyncPack failed to load or activate, a red exclamation mark icon appears next to that SyncPack on the SyncPacks page. Click the red icon to view error logs related to the loading or activation process.

The most common error that occurs when installing a SyncPack is that the SyncPack dependencies are not installed. Review the "System Requirements" section of the release notes for that SyncPack to ensure that you have installed all of the required applications for that SyncPack. To view a list of additional PowerFlow and SL1 products that are required by the various SyncPacks, see Dependencies for SL1 PowerFlow SyncPacks.

Why do I get a "Connection error" message when I try to install the System Utils SyncPack?

In this situation, you get the following error message:

Connection error, please verify that the steprunners can request data from pypiserver and that the syncpack system_utils_syncpack==1.0.1 uploaded to the local pypiserver is a valid syncpack.

To address this issue, review the following checklist:

  • Review the logs for the "Activate & Install SyncPacks" PowerFlow application for relevant error details.
  • Make sure that the cluster nodes are not offline.
  • If the error is related to the "Activate & Install SyncPacks" application, make sure that the SyncPack that you uploaded to pypi does not have extra characters at the end, such as a "(1)" or anything else that does not follow the correct name syntax.
  • If the failure is related to the "Activate & Install SyncPacks" application, then the environment might be preventing the connection, such as the no_proxy setting in the docker-compose.yml file.
  • Make sure that the pypiserver has a pypiserver.isnet alias in the docker-compose.yml file.
  • Validate all of the required settings in the docker-compose.yml file, including no_proxy and Couchbase database settings.

The following is an example of a portion of a docker-compose.yml file for this issue:

syncpacks_steprunner:
  ...
  environment:
    ...
    db_host: couchbase.isnet,couchbase-worker.isnet,couchbase-worker2.isnet
    http_proxy: http://is.cms.com:8080/
    https_proxy: https://is.cms.com:8080/
    no_proxy: .isnet, 172.1.1.1/16

How can I optimize workers, queues, and tasks?

The PowerFlow system uses Celery to spawn and manage worker processes and queues. You can define environment variables to optimize these worker processes.

You can use the following environment variables to optimize worker processes:

  • broker_load_from_backend. Lets you send a task ID to the broker and load the task data from Redis. If you set this variable to True, Celery saves the payload it would have sent through the broker (RabbitMQ, SQS) to the backend (Redis) instead, reducing the size of the message. When set to False, Celery works as it normally does. The default value for this setting is False.

    If the broker_load_from_backend variable is set in the docker-compose.yml file, it must be set in contentapi and all steprunner services (including custom steprunners, and syncpacks_steprunner), as well as the scheduler service where relevant.

  • task_soft_time_limit. Enforces a global timeout for all tasks in the PowerFlow system. If a task exceeds the specified timeout limit, the PowerFlow system terminates the task so that the next task in the queue can be processed. Possible values are:
    • An integer that specifies the time in seconds.
    • Default value is "3600" (1 hour).
  • optimization. Determines how tasks are distributed to worker processes. Possible values are:
    • -Ofair. Celery distributes a task only to the worker process that is available for work.
    • " " (double-quotation mark, space, double-quotation mark). Distributes and queues all tasks to all available workers. Although this increases performance, tasks in queues might be delayed waiting for long-running tasks to complete.
    • Default value is -Ofair.
  • task_acks_late. Specifies that if a worker process crashes while executing a task, Celery will redistribute the task to another worker process. Possible values are:
    • True. Enables the environment variable.
    • False. Disables the environment variable.
    • Default value is "False".

NOTE: Because many applications run at regular intervals or are scheduled, the PowerFlow system re-executes tasks even if the task_acks_late environment variable is disabled. In the event of a worker crash, if you want to ensure that tasks are completed, you can enable the task_acks_late variable. However, be aware that if tasks are not idempotent, the task_acks_late variable can cause unpredictable results.

  • max_result_size. Organizes the step result data into chunks of the specified size, in MB. Enabled by default. Possible values are:

    • Any number greater than 0, but less than 512. The recommended value is 250 MB. The value should be specified in bytes, such as "250000000" for 250 MB.

    • 0. Turns off this option.

    • Default value is "250000000" (for 250 MB).

To define these environment variables:

  1. Either go to the console of the PowerFlow system or use SSH to access the server.
  2. Log in as isadmin with the appropriate password.
  3. Use a text editor like vi to edit the file /opt/iservices/scripts/docker-compose.yml.
  4. You can define the environment variables for one or more worker processes. The docker-compose.yml file contains definitions for worker processes. For example, you might see something like this:

services:
  steprunner:
    image: sciencelogic/is-worker:latest
    environment:
      LOGLEVEL: 10
      celery_log_level: 10
      logdir: /var/log/iservices
      broker_url: 'pyamqp://guest@rabbit//'
      result_backend: 'redis://redis:6379/0'
      db_host: 'couchbase,localhost'
    secrets:
      - is_pass
      - encryption_key
    deploy:
      replicas: 2
    networks:
      - isnet
    volumes:
      - "/var/log/iservices:/var/log/iservices"
      - "statedb:/var/run/celery/states/"

  steprunner_1:
    image: sciencelogic/is-worker:latest
    environment:
      LOGLEVEL: 10
      celery_log_level: 10
      task_soft_time_limit: 30
      optimization: ""
      task_acks_late: 'False'
      logdir: /var/log/iservices
      broker_url: 'pyamqp://guest@rabbit//'
      result_backend: 'redis://redis:6379/0'
      db_host: 'couchbase,localhost'
    secrets:
      - is_pass
      - encryption_key
    deploy:
      replicas: 2
    networks:
      - isnet
    volumes:
      - "/var/log/iservices:/var/log/iservices"
      - "statedb:/var/run/celery/states/"

  steprunner_2:
    image: sciencelogic/is-worker:latest
    environment:
      LOGLEVEL: 10
      celery_log_level: 10
      task_soft_time_limit: 30
      optimization: '-Ofair'
      task_acks_late: 'False'
      logdir: /var/log/iservices
      broker_url: 'pyamqp://guest@rabbit//'
      result_backend: 'redis://redis:6379/0'
      db_host: 'couchbase,localhost'
    secrets:
      - is_pass
      - encryption_key
    deploy:
      replicas: 2
    networks:
      - isnet
    volumes:
      - "/var/log/iservices:/var/log/iservices"
      - "statedb:/var/run/celery/states/"

  5. The services with names that start with "steprunner" are the workers for the PowerFlow system. To define the optimization variables, enter the variables and values in the definition of the worker, under the environment section. See the example in step 4 for the syntax for environment variables.

  6. After you have updated the docker-compose file, you can update and re-deploy the PowerFlow system to pick up your changes to the file. To do this, execute the following command:

    docker stack deploy -c /opt/iservices/scripts/docker-compose.yml iservices

Why do I get a "Connection refused" error when trying to communicate with Couchbase?

If you get a "Connection refused to Couchbase:8091" error when you are trying to communicate with Couchbase, check the firewalld service by running the following command:

systemctl status firewalld

Firewalld is responsible for all of the internal communications between the various Docker services on the Docker Swarm. If firewalld is not active, there will be no communications between the services, and you might see an error like "Connection refused to Couchbase:8091".

To start the firewalld service, run the following command:

systemctl start firewalld

Why do I have client-side timeouts when communicating with Couchbase?

If you are running an intensive application, or if you are running in Debug Mode, you might see the following stack trace error:

(generated, catch TimeoutError): <RC=0x17[Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout], HTTP Request failed. Examine 'objextra' for full result, Results=1, C Source=(src/http.c,144), OBJ=ViewResult<rc=0x17[Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout], value=None, http_status=0, tracing_context=0, tracing_output=None>, Tracing Output={":nokey:0": null}>

This error occurs when there is too much load going into the Couchbase database. If you are running in Debug Mode, that mode creates a large number of extra log messages in the database, which can contribute to this error.

To work around this issue, increase the timeout being used by setting the db_host environment variable in the steprunner service:

db_host: 'couchbase.isnet,localhost?n1ql_timeout=100000000.00'

If you increase the timeout, the timeout errors should go away.

Increasing the timeout might not always be the correct action. If you are using an especially large system, you might want to allocate additional resources to the Couchbase services, including more memory for indexing and search. If you are encountering timeouts regularly, and not just temporarily (for example, while running Debug Mode for an application to determine what went wrong), you might want to add more resources instead of increasing the timeout.

What should I do if the Couchbase disk is full, the indexer is failing, and the database is unusable?

If the Couchbase database stops unexpectedly and the disk is full:

  1. Check the size of the /var/data/couchbase/lib/couchbase/data/@2i directory.
  2. If the contents of this directory are causing the disk to fill up, stop the couchbase service and remove the @2i directory, as shown in the example commands after this list.
  3. Restart Couchbase to return the PowerFlow system to normal operations.
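
A minimal sketch of those steps follows, assuming the index directory path used earlier in this section and that temporarily scaling the iservices_couchbase service to zero is an acceptable way to stop it in your environment:

du -sh /var/data/couchbase/lib/couchbase/data/@2i

docker service scale iservices_couchbase=0

rm -rf /var/data/couchbase/lib/couchbase/data/@2i

docker service scale iservices_couchbase=1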

Use the following optimizations to ensure that the indexer directory @2i does not overuse disk space:

  • When running PowerFlow in production, do not run applications in Debug Mode by default.
  • Remove any #primary index that exists in content or logs. Primary indexes are useful for troubleshooting, but they should not be used in production. In prior versions of PowerFlow, the primary index was automatically created, which caused no problems with typical workloads. However, with large workloads, the primary index adds unnecessary overhead that might impact the system. ScienceLogic recommends removing primary indexes that exist on systems in production.
  • Qualys scans running against a whole cluster at once have been known to affect the performance of the database, and these scans have caused crashes resulting in index corruption in the past. If you plan on running Qualys scans on the PowerFlow system, run them at an off-peak time, and only on one node of the cluster.
  • Under heavy workloads, the default auto-compaction settings might occur too frequently, causing issues due to heavy operations and compaction activities running at the same time. To reduce this possibility, select a specific day and time of the week for the database compaction to occur. Ideally compaction is configured to occur during a time of lower workloads, and not at peak. You can configure these settings in the Couchbase user interface (Settings > Auto-Compaction), or through the API: Configure Auto-Compaction with the CLI.

    Needing to change the auto-compaction defaults might be an early indication that the PowerFlow system needs additional resources to handle its load. For more information, see the following list of workload considerations.

Sizing the Couchbase database for workload considerations:

  • Indexer memory. Under heavy workloads, the indexer might be constrained, causing a bottleneck in the system due to its default 500 MB memory allocation. You can monitor the index memory percentage used in real time through the Couchbase Administrator user interface (https://<IP of PowerFlow>:8091). If memory is constantly at 90% or greater, the system might benefit from an increase in the memory allocation to the indexer service (Settings > Service Memory Quotas).
  • VMotion. Make sure that VMotion is disabled in VMware, and make sure that PowerFlow nodes will not be VMotioned while running.
  • Database disk size. Depending on the workload being performed and how much data is being saved into the system via logs, you might need to increase the database disk space. In general, you can expect the database to use at most twice the size of a bucket fully compacted. For more information, see Indexes size and compaction. The disk size available to the Couchbase system should be sufficient for whatever size workloads are expected.

What causes a Task Soft Timeout?

The following error might occur when you see a long-running task fail:

raise SoftTimeLimitExceeded() SoftTimeLimitExceeded: SoftTimeLimitExceeded()

This error message means that the default timeout for a task on your PowerFlow system is too low. By default the task timeout, which is set by the environment variable task_soft_time_limit, is set to 3600 seconds (1 hour).

If you intend to have tasks executing for longer than an hour at a time, you can increase this setting by changing the task_soft_time_limit environment variable in your steprunners. Note that the value is set in seconds.
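
For example, to allow tasks to run for up to two hours (7200 seconds), you could set the following in the environment section of each steprunner service in the docker-compose.yml file, and then re-deploy the stack:

task_soft_time_limit: 7200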

How do I address an "Error when connecting to DB Host" message when access is denied to user "root"?

In this situation, you get an error similar to the following:

Error when connecting to DB Host... (1045, "Access denied for user 'root'@'10.86.21.224' (using password: NO)"

This issue occurs when the encryption_key file from one PowerFlow system does not match the encryption_key file on another system, such as when you take data from a production PowerFlow system to a test PowerFlow system.

The encryption key must be identical between two PowerFlow systems if you plan to migrate from one to another. The encryption key must be identical between High Availability or Disaster Recovery systems as well.

To address this issue:

  1. Copy the encryption_key file from the /etc/iservices/ folder on the production system to the /etc/iservices/ folder on the test system.

  2. Re-upload all applications and configurations from the production system to the test system.

  3. Remove and redeploy the stack on the test PowerFlow after copying the key by running the following command on the test system:

    docker stack deploy -c /opt/iservices/scripts/docker-compose.yml iservices

How do I identify and fix a deadlocked state?

If PowerFlow appears to be running, but it is not processing any new applications or tasks, then PowerFlow could be in a deadlocked state.

A deadlocked state occurs when one or more applications include steps that are either ordered improperly or that contain syntax errors. In this situation, tasks are waiting on subsequent tasks to finish executing, but the worker pool is exhausted. As a result, PowerFlow is not able to execute those subsequent tasks.

To identify a deadlocked state with a PowerFlow system:

  1. Navigate to the Celery Flower interface for PowerFlow by typing the URL or IP address for your PowerFlow and adding /flower/dashboard at the end of the URL, such as https://192.0.2.0/flower/dashboard.
  2. Click the Dashboard tab for Flower. A list of workers appears.

  3. Review the number of active or running tasks for all workers. If all workers have the maximum number of tasks, and no new tasks are being consumed, then you might have a deadlock state.

To fix a deadlocked state in a PowerFlow system, perform the following steps:

  1. Go to the console of the PowerFlow system or use SSH to access the server.

  2. Log in as isadmin with the appropriate password.

  3. Increase the number of workers by either:

    A. Running the following command at the shell prompt:

    docker service scale iservices_steprunner=x

    where x is the number of workers.

    B. Using a text editor like vi to edit the file /opt/iservices/scripts/docker-compose.yml. In the environment: section at the top of the file, add the following:

    worker_threads: number_greater_than_3

    where number_greater_than_3 is an integer greater than 3.

  4. After you have updated the docker-compose file, you can update and re-deploy the PowerFlow system to pick up the changes in the docker-compose.yml file. To do this, execute the following at the shell prompt:

    docker stack deploy -c /opt/iservices/scripts/docker-compose.yml iservices

    The PowerFlow system should now include additional workers.

  5. Navigate to the Celery Flower interface for PowerFlow by typing the URL or IP address for PowerFlow and adding /flower/dashboard at the end of the URL, such as https://192.0.2.0/flower/dashboard.

  6. Click the Dashboard tab for Flower. A list of workers appears.

  7. Review the number of active or running tasks for all workers.

How can I point the "latest" container to my latest available images for PowerFlow?

If you force-upgraded an RPM on top of an existing PowerFlow RPM of the same version, and you have custom worker types pointing to specific images, the latest tag gets created incorrectly.

To address this issue:

  1. Modify the docker-compose.yml file and update all SL1 images to point to the correct version that you expect to be latest.
  2. Change any custom worker or custom services using SL1 containers to point to the latest tag.
  3. Re-install the RPM of the same version via force.

Why does the "latest" tag not exist after the initial ISO installation?

This situation only affects users with custom services that point to the latest tag. To work around this issue, run the tag_latest.py script manually after running the ./pull_start_iservices.sh command:

python /opt/iservices/scripts/system_updates/tag_latest.py /opt/iservices/scripts/docker-compose.yml

How do I address permissions errors with SyncPack virtual environments?

This situation is relevant only for upgrades to PowerFlow version 2.3.0. Permission errors might occur if you restarted Docker before the stack was redeployed, using the old stack services definition. In this situation:

  • The permission that was properly set during the RPM installation for PowerFlow version 2.3.0 was reset to root. This change was not expected for PowerFlow version 2.3.0, because containers were rootless starting with version 2.3.0 for more security.

  • The PowerFlow version 2.3.0 Docker containers (rootless users) cannot recreate the syncpacks virtual environments, because containers in versions of PowerFlow before 2.3.0 used root as user.

To address this issue, you need to force the steprunners to recreate the syncpacks virtual environments:

  1. Run the following commands:

    pfctl --host pf-node-ip '<username>:<password>' node-action --action modify_iservices_volumes_owner

    docker service update --force iservices_syncpacks_steprunner

  2. Verify that the syncpacks virtual environments owner is not root and that the environments were recently recreated:

    sudo ls -la /var/lib/docker/volumes/iservices_syncpacks_virtualenvs/_data/

  3. If the syncpacks virtual environments were not recreated, perform steps 4-5.

    Use caution while performing these next two steps.

  4. Remove the syncpacks virtualenvironments volume on every node:

    sudo rm -r /var/lib/docker/volumes/iservices_syncpacks_virtualenvs/_data/*

  5. Restart the syncpacks_steprunners service using either of the following actions:

    docker service update --force iservices_syncpacks_steprunner

    or

    docker service rm iservices_syncpacks_steprunner

    docker stack deploy -c /opt/iservices/scripts/docker-compose.yml --resolve-image never iservices

How do I keep from losing incidents or events if my PowerFlow system is down?

To keep from losing incidents or events, make sure that the Run Book Automation Retry Queue is enabled. This feature is available in PowerFlow version 2.3.0 or later and SL1 version 11.1.0 or later. For more information, see Enabling Run Book Automation Queue Retries.

For ServiceNow Events, Incidents, and Cases, you can also manually request that tickets be created for any events that did not get picked up. For more information, see Configuring the "ServiceNow: Click to Create" Automation Policy.

How do I restore an offline backup of my PowerFlow system?

To completely restore a PowerFlow system from an existing backup using a fresh PowerFlow installation, copy the following files and make sure that they match what existed on the previous PowerFlow system:

  • /etc/iservices/encryption_key. This file ensures that the restored data can be decrypted as it was on the previous system.
  • /etc/iservices/is_pass. Use this file if you wish to re-use the same password from the old system.
  • /etc/iservices/isconfig.yml. This file contains authentication-related settings, if used, and in a clustered environment it sets the hostname to the load balancer.
  • /opt/iservices/scripts/docker-compose-override.yml. This file contains your PowerFlow container versions and environment settings.
  • /opt/iservices/scripts/docker-compose.yml. This file contains your PowerFlow container versions and environment settings.
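
For example, if these files were preserved in a backup directory, copying them into place on the fresh installation might look like the following sketch, where /backup is a placeholder for wherever your backup copies live:

sudo cp /backup/etc/iservices/encryption_key /etc/iservices/encryption_key
sudo cp /backup/etc/iservices/is_pass /etc/iservices/is_pass
sudo cp /backup/etc/iservices/isconfig.yml /etc/iservices/isconfig.yml
sudo cp /backup/opt/iservices/scripts/docker-compose-override.yml /opt/iservices/scripts/docker-compose-override.yml
sudo cp /backup/opt/iservices/scripts/docker-compose.yml /opt/iservices/scripts/docker-compose.yml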

The swarm cluster should have the same number of nodes and the same node labels applied as in the previous system to ensure identically matching Docker environments. All nodes in the cluster must have the PowerFlow images referenced in your docker-compose.yml file loaded and available (verify with docker image ls).

After the PowerFlow system is started, you can run the "PowerFlow Restore" application to restore all previously used applications, cache, and config data. For more information, see Backing up Data.

If the workers or API are now running on completely new nodes after the restore, you will have to re-install the SyncPacks from the PowerFlow user interface to install the environments on the new nodes. Applications from those SyncPacks will already be configured with the restored settings.

This process is not considered a "Passive Disaster Recovery" failover scenario. Active/Passive relationships between PowerFlow clusters are not supported.

What do I do if I get a Code 500 Error when I try to access the PowerFlow user interface?

To address this issue:

  1. SSH to your PowerFlow instance.

  2. Check your Docker services with the following command:

    docker service ls

  3. Ensure that all of your services are up and running.

  4. If all of your services are up and running, but you are still getting Code 500 Errors, navigate to the Couchbase management portal of your PowerFlow server at port 8091 over HTTPS.

  5. In the Couchbase portal, navigate to the Indexes tab and verify that all of your indexes are in a ready state.

  6. Wait until all status entries are ready and all build progress entries are 100%, and then navigate back to your PowerFlow user interface.

  7. Verify that the Code 500 Error no longer exists.

  8. If an index is stuck in a non-ready state, find the index name, copy that value, and execute the following command in the Couchbase Query Editor:

    BUILD INDEX ON content(INDEX_NAME_HERE)
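
    If you are not sure of the index name, you can list every index and its current state from the Couchbase Query Editor before building. This is standard Couchbase N1QL rather than a PowerFlow-specific query:

    SELECT name, state FROM system:indexes;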

What should I do if I get a 500 Error?

A 500 Internal Server Error will always have some kind of stack trace in the contentapi logs. Run the following command to find the cause of the 500 Error:

docker service logs -t iservices_contentapi
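
Because these logs can be long, one way to narrow the output to the relevant stack trace is to filter for tracebacks, for example:

docker service logs -t iservices_contentapi 2>&1 | grep -i -A 20 "traceback"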

A 502 or 504 Error might mean that the user interface cannot reach an API container on a node, or the API container cannot reach a database node in the cluster. To address this issue:

  • For a cluster, make sure the cluster is healthy, and all database nodes are balanced.

  • For a cluster, make sure that the firewall ports are open on all nodes (an example check follows this list).

  • Run the following commands to check the logs for 502 or 504 Errors:

    docker service logs -t iservices_gui

    docker service logs -t iservices_contentapi

    The logs specify which container caused a timeout when it tried to reach the other container.
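
If you suspect that a firewall is blocking traffic between nodes, one quick check on each node, assuming the node uses firewalld, is to list the ports and services that are currently allowed:

sudo firewall-cmd --list-ports

sudo firewall-cmd --list-services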

What are some common examples of using the iscli tool?

The PowerFlow system includes a command line utility called the iscli tool. You can use the iscli tool to upload components such as steps, configurations, and applications from the local file system onto PowerFlow.

For more information on how to use this tool, SSH to your PowerFlow instance and type the following command:

iscli --help

You can use the iscli tool to add files or other content to the PowerFlow system. You can also use the utility to upload content to a remote host. Examples of common syntax include the following:

iscli -usf <STEP_FILE.PY> -U isadmin -p <password>

iscli -uaf <APPLICATION_FILE.JSON> -U isadmin -p <password>

iscli -ucf <CONFIG_FILE.JSON> -U isadmin -p <password>

iscli -usf <STEP_FILE.PY> -U isadmin -p <password> -H <PF_HOST>

The password for the iscli tool is the same as the PowerFlow Administrator (isadmin) user password. For more information, see Changing the PowerFlow Password.

How do I view a specific run of an application in PowerFlow?

To view the log results of a previous execution or run of an application in PowerFlow:

  1. Use Postman or another API tool to locate the appID and the name of the application.
  2. In PowerFlow, update the URL with the appID, in the following format:

    https://<PowerFlow>/integrations/<application_name>?runid=<App_ID>

For example:

https://<PowerFlow>/integrations/CreateServiceNowCI?runid=isapp-d8d1afad-74f8-42d4-b3ed-4a2ebcaef751

Why am I getting an "ordinal not in range" step error?

If you get an "ordinal not in range" error, check your CI Class Mappings to make sure the mappings do not contain international or "special" characters.

For example:

AWS | Availability Zone - São Paulo

If you find a class mapping with a special character like the above example, remove the class mapping, or rename the device class in SL1 to not include the special characters. Then you can sync the CI classes again.
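
If you have many class mappings to review, a small Python sketch like the following can flag names that contain non-ASCII characters. The list of names here is illustrative only:

# Minimal sketch: flag CI class mapping names that contain non-ASCII ("special") characters
class_names = [
    "AWS | Availability Zone - São Paulo",    # contains a special character
    "AWS | Availability Zone - us-east-1",
]

for name in class_names:
    if not name.isascii():
        print(f"Contains special characters: {name}")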

How do I clear a backlog of Celery tasks in Flower?

To clear a backlog of Celery tasks:

  1. docker exec into a bash shell in a worker process. For example:

    docker exec -it e448db31aaec /bin/bash

    where e448db31aaec is the container ID of the is-worker process on your system.

  2. Run the Python interpreter.

  3. Run the following commands:

    from ipaascommon.celeryapp import app

    app.control.purge()
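
    As a follow-up check from the same Python interpreter, note that purge() only clears queued messages; tasks that a worker has already picked up still appear in Celery's inspect output:

    print(app.control.inspect().active())

    print(app.control.inspect().reserved())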

Why does traffic from specific subnets not get a response from PowerFlow?

In this situation, you can see traffic going into the host and into the Docker network, but the traffic is not routed back out. Responses are lost in the Docker ingress network, and the client times out.

To address this issue:

  1. Remove the Docker service by running the following command:

    docker stack rm iservices

  2. Remove the default ingress network:

    docker network rm ingress

  3. Add a newly addressed ingress network:

    docker network create --driver overlay --ingress --subnet=172.16.0.0/16 --gateway=172.16.0.1 ingress

  4. Redeploy PowerFlow:

    docker stack deploy -c docker-compose.yml iservices
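
    Before moving on, you can confirm that the recreated ingress network uses the new addressing, for example:

    docker network inspect ingress --format '{{json .IPAM.Config}}'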

If the containers have an exposed port and you find the following error in the logs, you might need to remove /var/lib/docker/network/files/local-kv.db:

error="failed to detect service binding for container iservices_gui…"

To address this issue:

  1. Remove the Docker service:

    docker stack rm iservices

  2. Remove the .db file:

    rm /var/lib/docker/network/files/local-kv.db

  3. Restart the docker daemon:

    systemctl restart docker

  4. Redeploy PowerFlow:

    docker stack deploy -c docker-compose.yml iservices

What should I do if the number of tasks listed in the dashboards is not accurate?

To address an issue where the number of tasks listed in the PowerFlow and Flower dashboards does not match the Task List, you can set the FLOWER_MAX_TASKS environment variable in the PowerFlow docker-compose file to 20,000 tasks or higher.

For example:

flower:
  environment:
    ...
    worker_type: flower
    FLOWER_MAX_TASKS: 20000
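
As with the other docker-compose changes described in this section, the new environment variable is picked up only after you re-deploy the stack, for example:

docker stack deploy -c /opt/iservices/scripts/docker-compose.yml iservices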