Services

Services in Virtana Service Observability brings service impact monitoring to your cloud environment. It lets you group related infrastructure entities into services, monitor their health, and identify the root cause of issues when problems occur.

Services has three main areas:

Dynamic Services: Groups of related entities that you define for monitoring. For each dynamic service, you can review availability and performance health, examine impact events, and view the dependency graph to perform root cause analysis.
Logical Nodes: Custom nodes that you create and add to dynamic services. Use logical nodes to represent infrastructure components that aren't modeled as standard Collection Zone (CZ) entities, or to capture events from third-party monitoring systems that forward events to CZ.
Metatype Configuration: Settings that control which components are excluded from service impact monitoring. Reducing the number of monitored components improves model performance and makes root cause analysis more efficient.

Services data is sourced from your Collection Zone (CZ) deployment. Dynamic services can include three types of members:

CZ entities: Devices or components monitored in CZ. Events generated in CZ drive the availability and performance states that Services displays.
Logical nodes: Custom nodes you create in Services to represent infrastructure components that aren't modeled as standard CZ entities.
Other dynamic services: Dynamic services can be nested as members of other dynamic services, allowing you to model complex service hierarchies.
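
The three member types, including nesting, can be pictured with a small data model. The following Python sketch is illustrative only: the Member class, the kind values, and the leaf_members function are hypothetical names chosen for this example and are not part of any Virtana API. It shows how nested dynamic services resolve to a flat set of CZ entities and logical nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Member:
    """One member of a dynamic service (hypothetical model, not a product API)."""
    name: str
    kind: str  # assumed values: "cz_entity", "logical_node", "dynamic_service"
    members: list[Member] = field(default_factory=list)  # used only by dynamic services


def leaf_members(service: Member) -> set[str]:
    """Return the names of all non-service members, following nested services."""
    found: set[str] = set()
    seen: set[str] = set()
    stack = [service]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue  # guard against a service that is nested inside itself
        seen.add(node.name)
        for m in node.members:
            if m.kind == "dynamic_service":
                stack.append(m)  # descend into the nested service
            else:
                found.add(m.name)
    return found


# Example hierarchy: "storefront" contains one CZ entity and a nested service.
checkout = Member("checkout", "dynamic_service", [
    Member("app-01", "cz_entity"),
    Member("legacy-gateway", "logical_node"),
])
storefront = Member("storefront", "dynamic_service", [
    Member("web-01", "cz_entity"),
    checkout,
])

print(sorted(leaf_members(storefront)))
```

In this sketch, storefront resolves to web-01 plus everything that checkout contains, which mirrors how a nested service contributes its members to its parent.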

For information about how service states are determined and propagated, see Service states.

About Services

This section provides background information on how Services works, including how root cause analysis and impact analysis are performed, and how service states are determined and propagated.

Root cause analysis and impact analysis

Services provides two closely related capabilities that help you reduce the time it takes to identify and resolve issues in your environment.

Root cause analysis (RCA): Identifies which infrastructure dependencies are affecting a service and traces the origin of a problem through the dependency graph to its source.
Impact analysis: Identifies which services are affected by a specific piece of infrastructure. This helps you understand the blast radius of an issue before taking action, and helps prevent unplanned outages caused by operational changes such as patching or migrating infrastructure.
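
One way to picture the relationship between the two capabilities is as the same dependency graph walked in opposite directions. The following Python sketch assumes a simple "depends on" mapping with made-up entity names. It is not how Services stores or queries its model, only a minimal illustration of the idea.

```python
from collections import defaultdict

# Hypothetical edges: each key depends on the entities listed as its value.
DEPENDS_ON = {
    "web-service": ["app-vm-1", "app-vm-2"],
    "batch-service": ["app-vm-2"],
    "app-vm-1": ["hypervisor-a"],
    "app-vm-2": ["hypervisor-b"],
    "hypervisor-a": ["san-1"],
    "hypervisor-b": ["san-1"],
}
SERVICES = {"web-service", "batch-service"}


def reachable(graph, start):
    """Return every node that can be reached from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def invert(graph):
    """Reverse every edge so the graph answers 'what depends on this?'."""
    reverse = defaultdict(list)
    for node, deps in graph.items():
        for dep in deps:
            reverse[dep].append(node)
    return dict(reverse)


# RCA direction: which dependencies could be behind a problem in web-service?
rca_candidates = reachable(DEPENDS_ON, "web-service")

# Impact direction: which services sit above hypervisor-b?
impacted = reachable(invert(DEPENDS_ON), "hypervisor-b") & SERVICES

print(sorted(rca_candidates))
print(sorted(impacted))
```

Walking down from a service yields the candidate causes. Walking up from an infrastructure entity yields the blast radius.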

How Services builds the dependency model

Services automatically builds and maintains a model of the relationships between your service members and their infrastructure dependencies. When infrastructure changes -- for example, when a virtual machine moves to a new host -- Services updates the model automatically. You don't need to make manual configuration changes as long as the infrastructure is monitored in Collection Zone (CZ).

When an event occurs on a service member, Services traces the path through the dependency graph to determine which services are affected and generates impact events. The Impact events view shows you the events that have changed the state of a service, along with related events ranked by confidence so you can identify the most likely root cause. The Impact view displays the dependency graph visually, letting you follow the chain of state changes from the origin of a problem to the affected service.

Example: Web service degradation

Consider a web service that runs across a fleet of application servers. The application servers run on virtual machines, which run on a cluster of hypervisors that use network storage. The dynamic service is configured to include a synthetic check of the web service and the fleet of application servers. By extension, the appropriate virtual machines, hypervisors, attached storage, and networking are automatically included as members of the service.

When an event occurs anywhere in this stack -- whether on the synthetic check or in any of the underlying infrastructure -- Services generates an impact event and provides a confidence-ranked list of likely root causes in the Related events table. Switching to the Impact view provides further context about the service's deployment architecture and shows how the root cause was identified through the dependency graph.

Resolving the identified root cause, whether it is in the networking, storage, virtualization, or operating system layer, is typically the fastest path to restoring the service to a healthy state.
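
To make the example concrete, the following Python sketch models the stack as a dependency graph and orders entities that have events from deepest to shallowest. The ordering rule is an assumption made for illustration (events lower in the stack are more likely to be the origin). It is not the confidence calculation that Services uses, and all entity names are hypothetical.

```python
from collections import deque

# Hypothetical stack for the web service example; names are illustrative.
DEPENDS_ON = {
    "web-service": ["synthetic-check", "app-server-1", "app-server-2"],
    "app-server-1": ["vm-1"],
    "app-server-2": ["vm-2"],
    "vm-1": ["hypervisor-1"],
    "vm-2": ["hypervisor-1"],
    "hypervisor-1": ["storage-array"],
}


def depths_from(service):
    """Breadth-first search: shortest hop count from the service to each dependency."""
    depth = {service: 0}
    queue = deque([service])
    while queue:
        node = queue.popleft()
        for dep in DEPENDS_ON.get(node, []):
            if dep not in depth:
                depth[dep] = depth[node] + 1
                queue.append(dep)
    return depth


def rank_candidates(service, entities_with_events):
    """Order in-scope entities that have events from deepest to shallowest."""
    depth = depths_from(service)
    # Ignore events on entities that are not part of this service's stack.
    in_scope = [e for e in entities_with_events if e in depth and e != service]
    # Deeper entities first; ties are broken alphabetically for a stable order.
    return sorted(in_scope, key=lambda e: (-depth[e], e))


events = {"synthetic-check", "vm-1", "vm-2", "storage-array", "unrelated-host"}
ranking = rank_candidates("web-service", events)
print(ranking)
```

Here the storage array ranks first because it sits beneath both virtual machines, while the event on the unrelated host is dropped because that host is not in the service's dependency graph.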

Using impact analysis to prevent outages

Once you have defined dynamic services for RCA, impact analysis is available automatically. Before making changes to a piece of infrastructure -- such as taking a host offline for maintenance -- you can check which services depend on it by reviewing the Members view of any service that includes that infrastructure as a member.

This gives your operations team the information they need to plan changes safely, migrate workloads proactively, and avoid unplanned service disruptions.
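
If you keep a record of each service's members, the pre-change check can be sketched as a simple lookup. The following Python example assumes member lists that were gathered manually from each service's Members view. The service names, entity names, and functions are hypothetical, and nothing here calls a Virtana API.

```python
# Hypothetical member lists, as they might be copied from each service's Members view.
SERVICE_MEMBERS = {
    "web-service": {"app-server-1", "vm-1", "hypervisor-1", "storage-array"},
    "reporting": {"report-db", "vm-7", "hypervisor-2", "storage-array"},
    "intranet": {"wiki-vm", "hypervisor-3"},
}


def services_using(entity):
    """Return the sorted names of services that list the entity as a member."""
    return sorted(name for name, members in SERVICE_MEMBERS.items() if entity in members)


def maintenance_report(planned_entities):
    """Map each entity in a change plan to the services that include it."""
    return {entity: services_using(entity) for entity in planned_entities}


report = maintenance_report(["hypervisor-1", "storage-array", "hypervisor-9"])
for entity, services in report.items():
    print(entity, "->", services or "no known services")
```

An entity that maps to an empty list, such as hypervisor-9 in this sketch, is not a member of any recorded service, which suggests it can be changed with less risk.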

Opening Services

You can open Services from the main navigation, which is visible on all Virtana Service Observability pages.

In the main navigation (header or left menu), click SERVICES.

Note

Your account must be assigned the Read Only User role or a more privileged role to access Services. For more information, see User roles.