Configuring NVIDIA GPU Monitoring

This section describes how to configure NVIDIA GPU devices for monitoring by Skylar One using the "NVIDIA GPU" PowerPack.

Prerequisites for Monitoring NVIDIA GPU Devices

Before you configure Skylar One to monitor NVIDIA GPU devices using the "NVIDIA GPU" PowerPack, you must have the following:

  • SSH (Secure Shell) credentials with permissions to run the nvidia-smi command.

  • Physical NVIDIA GPU devices that you can align with the Dynamic Applications included in this PowerPack.
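Before creating the credential, it can help to confirm that the SSH account can actually run nvidia-smi on the target host. The following is a minimal sketch, not part of the PowerPack: it assumes the local `ssh` client is installed and that key-based login already works for the account (BatchMode disables password prompts). The function names, the host and user values, and the chosen query fields are illustrative assumptions.

```python
import subprocess

# Standard nvidia-smi query options; the field list is an illustrative choice.
NVIDIA_SMI_QUERY = (
    "nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader"
)


def parse_gpu_rows(output: str) -> list[dict]:
    """Parse 'index, name, memory.total' CSV rows into dictionaries."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, name, memory = [part.strip() for part in line.split(",", 2)]
        rows.append({"index": int(index), "name": name, "memory_total": memory})
    return rows


def list_remote_gpus(host: str, user: str, timeout: int = 15) -> list[dict]:
    """Run nvidia-smi over SSH; raises CalledProcessError if the command fails."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", NVIDIA_SMI_QUERY],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return parse_gpu_rows(result.stdout)
```

Calling `list_remote_gpus("<host>", "<user>")` with your own values should return one entry per GPU; an exception suggests that the account cannot log in or lacks permission to run nvidia-smi.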

Creating an SSH/Key Credential for NVIDIA GPU

To configure Skylar One to monitor NVIDIA GPU devices, you must first create an SSH/Key credential. This credential allows the Dynamic Applications in the "NVIDIA GPU" PowerPack to communicate with NVIDIA GPU devices.

The PowerPack includes an example SSH/Key credential that you can edit for your own use.

To configure an SSH/Key credential to access an NVIDIA GPU:

  1. Go to the Credentials page (Manage > Credentials).
  2. Locate the Nvidia GPU Monitoring - Example sample credential, click its Actions icon, and select Duplicate. A copy of the credential, called Nvidia GPU Monitoring - Example copy, appears.
  3. Click the Actions icon for the Nvidia GPU Monitoring - Example copy credential and select Edit. The Edit Credential modal page appears.

  4. Supply values in the following fields:
  • Name. Type a new name for the credential.
  • Hostname/IP. Type "%D". Skylar One replaces this variable with the IP address of the device.
  • Timeout (ms). Type the time, in milliseconds, after which Skylar One will stop trying to communicate with the monitored host.
  • Username. Type the username of the SSH account that Skylar One will use to connect to the monitored host.
  • Password. Type the password for the SSH account.
  • Private Key (PEM Format). Type the SSH private key.

Each line of the private key can contain a maximum of 64 characters. Keys in the OpenSSH format use 70 characters per line, so you cannot use them in this field. When you save the credential, Skylar One validates the format of the private key, and the credential is saved only if the key is correctly formatted.

  5. Click Save & Close.
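The line-length rule for the private key described in the procedure above can be checked locally before you paste the key into the credential. The following is a minimal sketch of such a pre-check. It is not the validation that Skylar One performs, and the function name and messages are hypothetical.

```python
MAX_LINE_LENGTH = 64  # per-line limit stated in the private key note
OPENSSH_HEADER = "-----BEGIN OPENSSH PRIVATE KEY-----"


def check_private_key(text: str) -> list[str]:
    """Return a list of problems found in the key text (empty if none)."""
    lines = text.strip().splitlines()
    if not lines:
        return ["key is empty"]

    problems = []
    # OpenSSH-format keys are identified by their header line.
    if lines[0].strip() == OPENSSH_HEADER:
        problems.append("key is in OpenSSH format, not PEM")
    # Report every line that exceeds the 64-character limit.
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip())
        if length > MAX_LINE_LENGTH:
            problems.append(f"line {number} has {length} characters")
    return problems
```

If the check reports an OpenSSH-format key, many OpenSSH versions can rewrite an existing key in PEM format with `ssh-keygen -p -m PEM -f <keyfile>`. Confirm this against the documentation for your installed version, and work on a copy of the key.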

Creating an SSH/Key Credential for NVIDIA GPU in the Skylar One Classic User Interface

To configure Skylar One to monitor NVIDIA GPU devices, you must first create an SSH/Key credential. This credential allows the Dynamic Applications in the "NVIDIA GPU" PowerPack to communicate with NVIDIA GPU devices.

The PowerPack includes an example SSH/Key credential that you can copy and edit for your own use.

To configure an SSH/Key credential to access an NVIDIA GPU:

  1. Go to the Credential Management page (System > Manage > Credentials).
  2. Locate the Nvidia GPU Monitoring - Example credential, then click its wrench icon. The Edit SSH/Key Credential modal page appears.

  3. Complete the following fields:
  • Name. Type a new name for the credential.
  • Hostname/IP. Type "%D". Skylar One replaces this variable with the IP address of the device.
  • Timeout (ms). Type the time, in milliseconds, after which Skylar One will stop trying to communicate with the monitored host.
  • Username. Type the username of the SSH account that Skylar One will use to connect to the monitored host.
  • Password. Type the password for the SSH account.
  • Private Key (PEM Format). Type the SSH private key.

Each line of the private key can contain a maximum of 64 characters. Keys in the OpenSSH format use 70 characters per line, so you cannot use them in this field. When you save the credential, Skylar One validates the format of the private key, and the credential is saved only if the key is correctly formatted.

  4. Click the Save As button.

Discovering an NVIDIA GPU Device

To create and run a discovery session that will discover an NVIDIA GPU root device, perform the following steps:

  1. On the Devices page or the Discovery Sessions page (Devices > Discovery Sessions), click the Add Devices button. The Select page appears.

  2. Click the Unguided Network Discovery button. Additional information about the requirements for discovery appears in the General Information pane to the right.
  3. Click Select. The three-step discovery wizard appears, starting with the Basic Information page.
  4. Complete the following fields:
  • Name. Type a unique name for this discovery session. This name is displayed in the list of discovery sessions on the Discovery Sessions tab.
  • Description. Optional. Type a short description of the discovery session. You can use the text in this description to search for the discovery session on the Discovery Sessions tab.
  • Select the organization to add discovered devices to. Select the name of the organization to which you want to add the discovered devices.

  5. Click Next. The Credential Selection page of the wizard appears.

  6. On the Credential Selection page, locate and select the SSH/Key credential you created.
  7. Click Next. The Discovery Session Details page of the Add Devices wizard appears.

  8. Complete the following fields:
  • List of IPs/Hostnames. Type the IP address for the NVIDIA GPU root device.

  • Which collector will discover these devices? Required. Select an existing collector to monitor the discovered devices.
  • Run after save. Select this option to run this discovery session as soon as you click Save and Close.

  9. Click Save and Close to save the discovery session. The Discovery Sessions page (Devices > Discovery Sessions) displays the new discovery session.
  10. If you selected the Run after save option on this page, the discovery session runs, and the Discovery Logs page displays any relevant log messages. If the discovery session locates and adds any devices, the Discovery Logs page includes a link to the Device Investigator page for the discovered device.
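If a discovery session finds no devices, a common first check is whether the address entered in the List of IPs/Hostnames field is well formed and whether the SSH port on that host is reachable. The following is a minimal sketch of both checks. It assumes SSH listens on the default port 22, and the function names are hypothetical. The validity check accepts IP address literals only, although the field itself also accepts hostnames. Run the reachability check from a machine with a network path similar to that of the collector, because results from elsewhere may differ.

```python
import ipaddress
import socket

SSH_PORT = 22  # assumed default; adjust if the host uses another port


def is_valid_ip(value: str) -> bool:
    """Return True if the value is a well-formed IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value.strip())
        return True
    except ValueError:
        return False


def ssh_port_open(host: str, port: int = SSH_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the SSH port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

A successful TCP connection shows only that the port is open. It does not confirm that the credential is accepted or that nvidia-smi can be run.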

Aligning Dynamic Applications to NVIDIA GPU Devices

A device template allows you to save a device configuration and apply it to multiple devices. The "NVIDIA GPU" PowerPack includes the "Nvidia GPU Monitor Template", which enables Skylar One to align all Dynamic Applications to the root component device.

Configuring the Device Template

Before you can use the "Nvidia GPU Monitor Template", you must configure the template so that each Dynamic Application in the template uses the credential you created earlier.

To configure the device template:

  1. Go to the Configuration Templates page (Devices > Templates).
  2. Locate the "Nvidia GPU Monitor Template" and click its wrench icon. The Device Template Editor modal page appears.
  3. Change the name of the template and click Save As. This will create a copy of the template.
  4. Click the Dyn Apps tab. The Editing Dynamic Application Subtemplates page appears.
  5. In the Credentials drop-down list, select the credential that you created for NVIDIA GPU.
  6. Click the next Dynamic Application listed in the Subtemplate Selection section on the left side of the page, and then select the credential you created in the Credentials field.
  7. Repeat step 6 until you have selected that credential in the Credentials field for all of the Dynamic Applications listed in the Subtemplate Selection section.
  8. Click Save.

Using the Device Template to Align Dynamic Applications to NVIDIA GPU Devices

To align the NVIDIA GPU Dynamic Applications to NVIDIA GPU devices:

  1. Go to the Device Manager page (Devices > Classic Devices, or Registry > Devices > Device Manager in the classic user interface).
  2. On the Device Manager page, select the checkbox for each device to which you want to align the NVIDIA GPU Dynamic Applications.
  3. In the Select Actions field, in the lower right, select the MODIFY by Template option and click the Go button. The Device Template Editor page appears.
  4. Complete the following fields:
  • In the Template drop-down list, select the name of the device template you configured earlier.
  • In the Credentials drop-down list, select the credential you created earlier.
  5. Click the Apply button, and then click Confirm to align the Dynamic Applications to the selected devices.
  6. Confirm that the Dynamic Applications were aligned with the selected devices by clicking a device's wrench icon and selecting the Collections tab. Any aligned Dynamic Applications will be listed.