Configuring vLLM Monitoring

The following sections describe how to collect and analyze metrics from vLLM deployments in SL1 using the "vLLM Monitoring" PowerPack. vLLM is an open-source engine for large language model (LLM) inference and serving.

Prerequisites for Monitoring vLLM Deployments

To configure the SL1 system to monitor vLLM deployments using the vLLM Monitoring PowerPack, you must meet the following requirements and have the following information:

  • You must install version 102 or greater of the "Low-code Tools" PowerPack. This allows you to create the universal credential that you will use when aligning the Dynamic Applications in this PowerPack to the vLLM device.

  • The vLLM metrics endpoint must be enabled and verified to be working. For information about enabling and verifying the vLLM metrics endpoint, see the vLLM documentation on production metrics.

  • The exposed /metrics endpoint must be reachable by SL1 using one of the authentication methods supported by the "Low-code Tools" PowerPack. See https://docs.sciencelogic.com/dev-docs for more information about the supported authentication methods.

  • You must have a device in SL1 with the IP address where the /metrics endpoint is exposed. This can be a physical device or a virtual device that represents the vLLM model server, as long as the device has an IP address and requests to http://<ip>:<port>/metrics at that address succeed.

A virtual device can only be used if an IP address is configured on it.
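
Before aligning the Dynamic Applications, it can help to confirm that the endpoint responds from a machine that has the same network access as the SL1 collector. The following Python sketch is not part of the PowerPack. It fetches http://<ip>:<port>/metrics and reports which expected metrics are absent from the response. The metric names in EXPECTED_METRICS are assumptions based on names that vLLM commonly exposes and may differ between vLLM versions, and the helper names are hypothetical. The sketch also assumes that the endpoint requires no authentication.

```python
import urllib.request

# Assumed metric names; adjust these to match what your vLLM version exposes.
EXPECTED_METRICS = (
    "vllm:num_requests_running",
    "vllm:num_requests_waiting",
)


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        # The metric name ends at the first "{" (labels) or whitespace (value).
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def missing_metrics(exposition_text, expected=EXPECTED_METRICS):
    """Return the expected metric names that do not appear in the output."""
    found = metric_names(exposition_text)
    return [name for name in expected if name not in found]


def fetch_metrics(host, port, timeout=5.0):
    """Fetch the raw text from http://<host>:<port>/metrics (no authentication)."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example usage (hypothetical address and port):
# print(missing_metrics(fetch_metrics("10.0.0.5", 8000)))
```

An empty list means every expected metric was present. A connection error or timeout suggests that the endpoint is not reachable at that IP address and port.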

Creating a Credential for vLLM

To configure SL1 to monitor vLLM deployments, you must first create a universal credential. This credential allows the Dynamic Applications in the vLLM Monitoring PowerPack to communicate with vLLM deployments.

You must install version 102 or greater of the "Low-code Tools" PowerPack to create a universal credential for aligning the Dynamic Applications in this PowerPack to your virtual or physical device.

To configure a universal credential to access a vLLM deployment:

  1. Go to the Credentials page (Manage > Credentials).

To configure the universal credential, you must use the default SL1 user interface, not the classic user interface.

  2. Click Create New and select Create Low-code tools: rest v102 Credential. The Create Credential modal page appears.
  3. Supply values in the following fields:
  • Name. Type a name for your credential.
  • All Organizations. Toggle on (blue) to align the credential to all organizations, or toggle off (gray) and then select one or more specific organizations from the What organization manages this service? drop-down field to align the credential with those specific organizations.
  • Authentication Type. Select the appropriate authentication type. Depending on the authentication type selected, you may need to provide additional information. For more information, see https://docs.sciencelogic.com/dev-docs.
  • URL. Type the URL of your vLLM deployment.
  4. Click Save & Close.

Aligning Dynamic Applications to vLLM Deployments

If you have already discovered the vLLM instance as a physical device, you can align the vLLM Dynamic Applications to that device. If you do not have a physical device for the vLLM instance, you must create a virtual device and then manually align Dynamic Applications to the virtual device.

Manually Aligning vLLM Dynamic Applications to the Physical Device

To manually align the "vLLM Metrics Config" and "vLLM Metrics Performance" Dynamic Applications to the physical device:

  1. Go to the Devices page (Devices > Classic Devices, or Registry > Devices > Device Manager in the classic SL1 user interface).
  2. Locate your vLLM physical device and click its wrench icon.
  3. In the Device Investigator, click the Collections tab.
  4. Click the Actions button at the top of the page, then click the Add Dynamic Application button.
  5. In the Align Dynamic Application modal, locate and select the "vLLM Metrics Config" Dynamic Application.
  6. Under Credentials, select the vLLM credential you created and click Save.
  7. Repeat steps 4-6 for the "vLLM Metrics Performance" Dynamic Application.
  8. Click Save.

Creating a vLLM Virtual Device

If you do not have a physical device to align the "vLLM Metrics Config" and "vLLM Metrics Performance" Dynamic Applications to, you must create a virtual device that represents the vLLM deployment. A virtual device is a user-defined container that represents a device or service that cannot be discovered by SL1. You can use the virtual device to store information gathered by policies or Dynamic Applications.

If you want to monitor more than one vLLM deployment, you must create a virtual device for each deployment.

To create a virtual device that represents your vLLM deployment:

  1. Go to the Device Manager page (Devices > Classic Devices, or Registry > Devices > Device Manager in the classic SL1 user interface).

  2. Click Actions and select Create Virtual Device from the menu. The Virtual Device modal page appears.
  3. Enter values in the following fields:
  • Device Name. Enter a name for the device.
  • Organization. Select the organization for this device. The organization you associate with the device limits the users that will be able to view and edit the device. Typically, only members of the organization will be able to view and edit the device.
  • Device Class. Select Virtual Device | Content Verification.
  • Collector. Select the collector group that will monitor the device.
  4. Click Add to create the virtual device.
  5. Repeat these steps for each vLLM deployment that you want to monitor.

Manually Aligning the vLLM Dynamic Applications to the Virtual Device

After creating the vLLM virtual device, you must manually align the "vLLM Metrics Config" and "vLLM Metrics Performance" Dynamic Applications to the virtual device.

To manually align the "vLLM Metrics Config" and "vLLM Metrics Performance" Dynamic Applications:

  1. Go to the Devices page (Devices > Classic Devices, or Registry > Devices > Device Manager in the classic SL1 user interface).

  2. Locate your vLLM virtual device and click its wrench icon.
  3. In the Device Investigator, click the Collections tab.
  4. Click the Actions button at the top of the page, then click the Add Dynamic Application button.
  5. In the Align Dynamic Application modal, locate and select the "vLLM Metrics Config" Dynamic Application.
  6. Under Credentials, select the vLLM credential you created and click Save.
  7. Repeat steps 4-6 for the "vLLM Metrics Performance" Dynamic Application.
  8. Click Save.

vLLM Dashboard

The "vLLM Monitoring" PowerPack includes the "vLLM Dashboard" that you can use to view various metrics for devices aligned to the "vLLM Metrics Config" and "vLLM Metrics Performance" Dynamic Applications. The "vLLM Dashboard" contains the following widgets:

  • Devices with vLLM DAs aligned. Lists the devices that are aligned to the "vLLM Metrics Config" and "vLLM Metrics Performance" Dynamic Applications.

  • Time Per Output Token (avg) Line Chart. Displays the average time (in seconds) it takes to generate each token of output, providing insight into device processing speed and efficiency.

  • Time to First Token (avg) Line Chart. Displays the average time (in seconds) from receiving a request to generating the first token of output, providing an indication of the initial response latency.

  • Current GPU Requests Line Chart. Displays the number of in-progress requests currently being processed.

  • GPU KV-cache Usage Line Chart. Displays the percentage of the GPU KV cache (key-value memory) that is currently in use.

  • GPU Requests Waiting Gauge. Displays the number of requests currently waiting in the queue to be processed, indicating the level of demand on the system and potential delays in processing.
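
vLLM reports its latency metrics as Prometheus-style histograms, which expose a cumulative sum and a cumulative count of observations. One way to derive an average for a polling interval is to divide the change in the sum by the change in the count between two polls. The Python sketch below illustrates that arithmetic only. It is an assumption about how an "avg" value could be derived, not a description of how the PowerPack calculates the dashboard values; the function name is hypothetical and the sample numbers are invented.

```python
def interval_average(prev_sum, prev_count, curr_sum, curr_count):
    """Average observation value between two polls of a cumulative histogram.

    Returns None when no new observations arrived in the interval or when
    the counters went backwards (for example, after the server restarted).
    """
    delta_count = curr_count - prev_count
    delta_sum = curr_sum - prev_sum
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


# Invented sample: the sum grew by 3.0 seconds over 20 new observations.
avg_seconds = interval_average(10.0, 100, 13.0, 120)
```

Returning None rather than zero for an idle interval keeps a chart from showing a misleading latency of 0 seconds when no requests were served.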