Overview of Concepts for Incident Management

This section introduces the concepts of Incident Management in SL1.

Use the following menu options to navigate the SL1 user interface:

  • To view a pop-out list of menu options, click the menu icon.
  • To view a page containing all the menu options, click the Advanced menu icon.

Super Service Provider

The following examples use the fictional company Super Service Provider.

The Network Operations Center (NOC) for Super Service Provider is responsible for monitoring service delivery, customer satisfaction, SLA compliance, and emergency management.

The NOC acts as the initial point of contact for all customer requests, incidents, and problems. NOC personnel either resolve the problem or escalate the ticket to the appropriate Engineering teams.

Super Service Provider is governed by Service Level Agreements (SLAs) in place between Super Service Provider and its customers.

Organizations

An organization is a group for managing elements and user accounts. All hardware, software, policies, events, tickets, and users in SL1 are associated with an organization.

The basic characteristics of an organization are:

  • A unique name (required)
  • Users who are members of the organization
  • Elements, such as devices, that are associated with the organization

Organizations can be defined by geographic areas, departments, types of devices, or any structure that works best for your needs.

NOTE: For more details on organizations, see the section on Organizations and Users.

For this example, we will create two organizations:

  • NOC. Personnel who work in the NOC will be members of the NOC organization.
  • Engineering. Personnel who work in the Engineering department will be members of the Engineering organization.

Tickets

A ticket is a request for work. This request can be in response to a problem that needs to be fixed, for routine maintenance, or for any type of required work.

In this example, tickets can come from customers, an automated monitoring system, or an internal employee.

In this example, the request in each ticket can be classified into one of the following categories:

  • Request. A minor change or request for information.
  • Incident. A service interruption or degradation.
  • Event. An alert or notification created by a monitoring tool.
  • Project. A temporary endeavor undertaken to meet unique goals and objectives.
  • Emergency. An incident that impacts more than one customer.
  • Other. Any item not classified as one of the types listed above.

Ticket Source

Ticket source describes where the ticket originated.

In this example, tickets originate from one of the following sources:

  • Email. Email ticket created by a customer and received by NOC staff.
  • Portal. Ticket created by a customer through the portal.
  • Phone. Telephone call from a customer to NOC staff. NOC staff creates a ticket and assigns the ticket to an Engineering queue for resolution.
  • Event. Event created by an automated management system. NOC staff creates a ticket and assigns the ticket to an Engineering queue for resolution.
  • Other. Any ticket opened not as a result of one of the sources listed above. An example might be a ticket opened by an internal employee of Super Service Provider suggesting an improvement.

To comply with their Service Level Agreements and to improve operational efficiency or workflow, Super Service Provider has policies in place to handle tickets based on the ticket source.

For example, NOC staff must review and triage tickets that customers generate through email or the online portal. NOC personnel determine the appropriate severity and actions, and the correct Engineering team to engage.

Tickets created by employees of Super Service Provider, or tickets from an automated management system, can be placed directly into the queue that corresponds to the Engineering team responsible for resolving the ticket.
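
The routing policy described above can be sketched in a few lines of Python. This is an illustration only, not SL1 code: the `TicketSource` enum and the `initial_queue` function are hypothetical names, and the queue name "Triage" is taken from the Ticket Queues part of this example.

```python
from enum import Enum


class TicketSource(Enum):
    EMAIL = "Email"
    PORTAL = "Portal"
    PHONE = "Phone"
    EVENT = "Event"
    OTHER = "Other"


# Sources whose tickets NOC staff must review and triage first.
CUSTOMER_GENERATED = {TicketSource.EMAIL, TicketSource.PORTAL}


def initial_queue(source: TicketSource, engineering_queue: str) -> str:
    """Return the first queue for a new ticket under the example policy.

    `engineering_queue` is the queue of the team expected to resolve the
    ticket; it is used only when the ticket can bypass triage.
    """
    if source in CUSTOMER_GENERATED:
        return "Triage"
    return engineering_queue
```

For instance, `initial_queue(TicketSource.PORTAL, "Linux Engineering")` returns `"Triage"`, while the same call with `TicketSource.EVENT` returns `"Linux Engineering"`.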

Ticket Severity

Tickets are assigned a severity based on the importance of the issue that needs to be fixed or worked on. For example, a server failing might require a critical ticket, while a routine maintenance issue might require only a minor ticket. Ticket severities include healthy, notice, minor, major, and critical.

In this example, Super Service Provider uses severities to define the urgency of the problem, required response times, and SLAs. Super Service Provider classifies work by using one of the following severities:

  • Critical. A service outage that requires an immediate response.
  • Major. A service degradation or imminent service outage that requires a response within one hour.
  • Minor. A minor change or request for information that requires a response within four hours.
  • Notice. A minor change or request for information that requires a response within one day.
  • Healthy. A minor change or request for information that requires a response within three days.
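
As a rough illustration, the response times above can be expressed as a lookup table from which a respond-by time is computed. The names `RESPONSE_WINDOW` and `respond_by` are hypothetical, and modeling "immediate" as a zero-length window is an assumption of this sketch, not something SL1 defines.

```python
from datetime import datetime, timedelta

# Response windows from the example severities. "Immediate" is modeled as a
# zero-length window, which is an assumption of this sketch.
RESPONSE_WINDOW = {
    "Critical": timedelta(0),
    "Major": timedelta(hours=1),
    "Minor": timedelta(hours=4),
    "Notice": timedelta(days=1),
    "Healthy": timedelta(days=3),
}


def respond_by(created: datetime, severity: str) -> datetime:
    """Return the latest time by which a response is due."""
    return created + RESPONSE_WINDOW[severity]


created = datetime(2024, 1, 1, 9, 0)
print(respond_by(created, "Minor"))  # 2024-01-01 13:00:00
```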

Ticket Status

Ticket status provides a quick way to identify the current state of the ticket. The possible ticket statuses include:

  • Open. The ticket has been created.
  • Pending. Someone acknowledged the ticket and is awaiting the next action.
  • Working. Someone is working on the ticket.
  • Resolved. The issue has been resolved.

In this example, Super Service Provider examines ticket status to monitor compliance with Service Level Agreements. For the same purpose, Super Service Provider also examines the following information about each ticket:

  • Assigned To. Specifies a user, who must be a member of the current ticket queue, to whom the ticket has been assigned.

Ticket Queues

Ticket queues allow you to organize and filter tickets. Organizing tickets by ticket queues and then assigning users to ticket queues allows each user to see only the tickets that are relevant to that user.

You can also use ticket queues to move tickets through a pre-defined workflow. In our example, Super Service Provider uses the following ticket queues:

  • Triage. All new tickets from customer emails or customer access to the portal are assigned to this queue. The NOC staff manages this queue.
  • Network Engineering. Tickets are assigned to this queue if someone from the Network Engineering team must work on and resolve the ticket. Only employees of Super Service Provider can assign tickets to this queue.
  • Windows Engineering. Tickets are assigned to this queue if someone from the Windows Engineering team must work on and resolve the ticket. Only employees of Super Service Provider can assign tickets to this queue.
  • Linux Engineering. Tickets are assigned to this queue if someone from the Linux Engineering team must work on and resolve the ticket. Only employees of Super Service Provider can assign tickets to this queue.
  • Follow-up. Tickets are assigned to this queue after the request or repair has been completed, but the NOC staff or Engineering staff are waiting for customer confirmation or approval before setting the status of the tickets to Resolved.

Service Level Agreements

A Service Level Agreement or SLA is a written contract between a service provider and its customers. An SLA describes the service that will be provided for each fee, the level of customer support, and the penalty when that service is not provided. For example, many Internet service providers use SLAs that specify percentage of uptime, guaranteed performance benchmarks, help-desk response time, and advance notification of changes that could affect the customer.

In our example, SLAs are based on ticket status (open, pending, working, resolved), the Assigned To value, and the severity (healthy, notice, minor, major, critical) of each ticket.

In our example, Super Service Provider is governed by several Service Level Agreements (SLAs) between Super Service Provider and its customers.

New Ticket SLAs

These SLAs determine the actions that Super Service Provider must perform on each new ticket and how quickly Super Service Provider must perform these actions.

  • New Acknowledgment SLA. All new tickets must be acknowledged within 15 minutes of creation. A ticket is considered acknowledged when its status changes from Open to Pending.
  • New Assignment SLA. All new tickets must be assigned to an engineer within 30 minutes of creation. A ticket is considered assigned when a user is associated with the ticket.
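
The two New Ticket SLAs can be checked with a small amount of date arithmetic. The following is a minimal sketch, not SL1 functionality: the `NewTicket` class and `new_ticket_violations` function are hypothetical, and treating a ticket that reaches the limit exactly as a violation is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

ACKNOWLEDGE_LIMIT = timedelta(minutes=15)
ASSIGN_LIMIT = timedelta(minutes=30)


@dataclass
class NewTicket:
    created: datetime
    acknowledged_at: Optional[datetime] = None  # status changed Open -> Pending
    assigned_at: Optional[datetime] = None      # a user was associated


def new_ticket_violations(ticket: NewTicket, now: datetime) -> List[str]:
    """List the New Ticket SLAs that the ticket has violated as of `now`."""
    violations = []
    # If the action has not happened yet, measure against the current time.
    ack_elapsed = (ticket.acknowledged_at or now) - ticket.created
    assign_elapsed = (ticket.assigned_at or now) - ticket.created
    # Assumption: reaching the limit exactly counts as a violation.
    if ack_elapsed >= ACKNOWLEDGE_LIMIT:
        violations.append("New Acknowledgment SLA")
    if assign_elapsed >= ASSIGN_LIMIT:
        violations.append("New Assignment SLA")
    return violations
```

A ticket acknowledged after five minutes but still unassigned after 40 minutes would be reported as violating only the New Assignment SLA.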

Existing Ticket SLAs

These SLAs determine the actions that Super Service Provider must perform on each existing ticket after it has been assigned to an engineer, and how quickly Super Service Provider must perform these actions.

  • Existing Updated SLA. After a ticket has been assigned to Engineering, the ticket severity determines how quickly the ticket must be updated. To update the ticket, the assigned user can assign the ticket to someone else, change the status of the ticket to Working, or add a note to the ticket. The limits are as follows:
      • Critical. Tickets with a severity of Critical must be updated within 35 minutes of assignment.
      • Major. Tickets with a severity of Major must be updated within one hour of assignment.
      • Minor. Tickets with a severity of Minor must be updated within four hours of assignment.
      • Notice. Tickets with a severity of Notice must be updated within one day of assignment.
      • Healthy. Tickets with a severity of Healthy must be updated within three days of assignment.
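
To make the Existing Updated SLA concrete, the sketch below checks whether any qualifying update happened in time. It is an illustration under stated assumptions: the action labels in `UPDATE_ACTIONS`, the `(timestamp, action)` event format, and the `met_update_sla` function are all hypothetical, and an update that lands exactly on the limit is treated as too late.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# Update limits per severity, measured from the time of assignment.
UPDATE_LIMIT = {
    "Critical": timedelta(minutes=35),
    "Major": timedelta(hours=1),
    "Minor": timedelta(hours=4),
    "Notice": timedelta(days=1),
    "Healthy": timedelta(days=3),
}

# Hypothetical labels for the three actions that count as an update:
# reassigning the ticket, setting its status to Working, or adding a note.
UPDATE_ACTIONS = {"reassigned", "status_working", "note_added"}


def met_update_sla(
    severity: str,
    assigned_at: datetime,
    events: Iterable[Tuple[datetime, str]],
) -> bool:
    """Return True if a qualifying update occurred before the limit expired."""
    deadline = assigned_at + UPDATE_LIMIT[severity]
    return any(
        action in UPDATE_ACTIONS and assigned_at <= when < deadline
        for when, action in events
    )
```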

Escalation

Ticket escalation policies automatically perform actions on a ticket when specified conditions have been met.

For example, the escalation conditions could be "if a ticket has a severity of 'major,' is three days old, and has not yet been assigned to a user." When a ticket meets those conditions, the escalation policy performs the actions "change ticket's severity to 'critical' and assign the ticket to the queue administrator."
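
The condition-and-action structure of that example policy might be sketched as follows. The `Ticket` class and `escalate_stale_major` function are hypothetical and exist only to show the logic; reading "three days old" as "at least three days old" is also an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Ticket:
    severity: str
    age: timedelta
    assigned_to: Optional[str] = None


def escalate_stale_major(ticket: Ticket, queue_admin: str) -> bool:
    """Apply the example policy; return True if the ticket was escalated."""
    conditions_met = (
        ticket.severity == "major"
        and ticket.age >= timedelta(days=3)  # assumption: "at least" three days
        and ticket.assigned_to is None
    )
    if conditions_met:
        # Actions: raise the severity and hand the ticket to the queue administrator.
        ticket.severity = "critical"
        ticket.assigned_to = queue_admin
    return conditions_met
```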

The example uses the following escalation policies:

  • NEW ACKNOWLEDGMENT SLA VIOLATION IMMINENT. When the ticket age is 10 minutes AND the ticket has not been acknowledged (status changed from Open to Pending), SL1 sends a notification to all NOC queue members indicating an SLA violation is imminent.
  • NEW ACKNOWLEDGMENT SLA VIOLATION HAS OCCURRED. When the ticket age is 15 minutes AND the ticket has not been acknowledged (status changed from Open to Pending), SL1 sends a notification to the Director of Operations indicating an SLA violation has occurred.
  • NEW ASSIGNMENT SLA VIOLATION IMMINENT. When the ticket age is 25 minutes AND the ticket has not been assigned to a user, SL1 sends a notification to all NOC queue members indicating an SLA violation is imminent.
  • NEW ASSIGNMENT SLA VIOLATION HAS OCCURRED. When the ticket age is 30 minutes AND the ticket has not been assigned, SL1 sends a notification to the Director of Operations indicating an SLA violation has occurred.
  • CRITICAL UPDATE SLA VIOLATION IS IMMINENT. When the ticket has not been modified in 30 minutes AND the ticket has not been updated, SL1 sends a notification to all Engineering queue members.
  • CRITICAL UPDATE SLA VIOLATION HAS OCCURRED. When the ticket has not been modified in 35 minutes AND the ticket has not been updated, SL1 sends a notification to the Director of Engineering.
  • MAJOR UPDATE SLA VIOLATION IS IMMINENT. When the ticket has not been modified in 55 minutes AND the ticket has not been updated, SL1 sends a notification to all Engineering queue members.
  • MAJOR UPDATE SLA VIOLATION HAS OCCURRED. When the ticket has not been modified in one hour AND the ticket has not been updated, SL1 sends a notification to the Director of Engineering.
  • MINOR UPDATE SLA VIOLATION IS IMMINENT. When the ticket has not been modified in three hours and 55 minutes AND the ticket has not been updated, SL1 sends a notification to all Engineering queue members.
  • MINOR UPDATE SLA VIOLATION HAS OCCURRED. When the ticket has not been modified in four hours AND the ticket has not been updated, SL1 sends a notification to the Director of Engineering.
  • NOTICE UPDATE SLA VIOLATION IS IMMINENT. When the ticket has not been modified in 23 hours and 55 minutes AND the ticket has not been updated, SL1 sends a notification to all Engineering queue members.
  • NOTICE UPDATE SLA VIOLATION HAS OCCURRED. When the ticket has not been modified in one day AND the ticket has not been updated, SL1 sends a notification to the Director of Engineering.
  • HEALTHY UPDATE SLA VIOLATION IS IMMINENT. When the ticket has not been modified in two days, 23 hours, and 55 minutes AND the ticket has not been updated, SL1 sends a notification to all Engineering queue members.
  • HEALTHY UPDATE SLA VIOLATION HAS OCCURRED. When the ticket has not been modified in three days AND the ticket has not been updated, SL1 sends a notification to the Director of Engineering.
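
In the list above, each "imminent" policy fires five minutes before the corresponding "has occurred" policy. Under that observation, the thresholds can be derived from the SLA limits rather than listed by hand. The sketch below is illustrative only; `SLA_LIMITS`, `WARNING_LEAD`, and `escalation_thresholds` are hypothetical names, and the dictionary keys are shortened labels chosen for this example.

```python
from datetime import timedelta
from typing import Tuple

# Lead time between the "imminent" warning and the violation itself,
# as observed in the example escalation policies.
WARNING_LEAD = timedelta(minutes=5)

# SLA limits from the example, keyed by shortened policy labels.
SLA_LIMITS = {
    "NEW ACKNOWLEDGMENT": timedelta(minutes=15),
    "NEW ASSIGNMENT": timedelta(minutes=30),
    "CRITICAL UPDATE": timedelta(minutes=35),
    "MAJOR UPDATE": timedelta(hours=1),
    "MINOR UPDATE": timedelta(hours=4),
    "NOTICE UPDATE": timedelta(days=1),
    "HEALTHY UPDATE": timedelta(days=3),
}


def escalation_thresholds(sla: str) -> Tuple[timedelta, timedelta]:
    """Return (imminent, occurred) thresholds for the named SLA."""
    limit = SLA_LIMITS[sla]
    return limit - WARNING_LEAD, limit


for name in SLA_LIMITS:
    imminent, occurred = escalation_thresholds(name)
    print(f"{name}: warn at {imminent}, violation at {occurred}")
```

Deriving the warning threshold from a single lead time keeps each pair of policies consistent if an SLA limit changes.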