For years, various types of systems have produced log files to help with monitoring, debugging and performance management. Often, this information was used in forensic analyses of why interruptions in service or other problems occurred. In many cases, log files are still used this way. But systems have grown more complicated, and many more devices are instrumented. Systems have been decomposed into much finer-grained, interdependent services. Infrastructure is now distributed across on-premises environments and multiple cloud providers. In addition, expectations now include 24x7 operation and real-time responsiveness. All of these factors combine to create challenges with the volume and velocity of the data that is collected and analyzed.
Much of this information can be referred to as “machine data.” It has also been called “data exhaust” and was previously not considered very important. However, as systems have become increasingly sophisticated and generate more of this information, organizations need this data for two reasons. First, they need it to observe systems, keep them operating properly and resolve issues when they arise. Second, the information these systems throw off provides a detailed view into many of the operations of an organization, including the way the organization interacts with its customers. In fact, my colleague Matt Aslett asserts that through 2026, more than one-quarter of organizations will combine business event data with machine-generated telemetry data to provide context and generate additional business value from observability.
There are many use cases for observability, the most common being ensuring uptime and meeting service level agreements for various systems. All the information collected, including network traffic, users, systems accessed and the behavior of those systems, can provide the foundation for security information and event management. Given the increasingly digital nature of customer interactions and the growing instrumentation of physical processes, such as the movement of checked baggage through a baggage handling system, many elements of customer experiences can be tracked and managed. My colleague Keith Dawson has written about how IoT data is used to enhance customer experiences with field service.
Observability can also help improve resilience in organizations. Ventana Research’s CEO Mark Smith authored a series of perspectives on considerations for business continuity in general, beginning with this look at some of the investments organizations must make to mitigate the risk of business disruptions. Other use cases include the optimization of IT operations using artificial intelligence (referred to as AIOps) and operationalizing compliance processes and associated auditing.
To deliver these use cases, observability platforms include capabilities to collect and process all the various sources of information that provide insight into the operation of systems and processes in an organization. Critical system capabilities include:
- Collectors and connectors to gather information from various logs, event streams, traces and metrics.
- A scalable data store, such as a data lake, to manage and retain the information collected.
- Search capabilities to find the information related to specific incidents and use cases.
- Interactive visualization tools such as reporting and dashboarding to investigate sets of information, trends over time and comparisons.
- Alerting to raise awareness of issues requiring attention.
- Artificial intelligence and machine learning-based analytics and modeling to predict system and user behavior.
- Schedulers for data collection, analysis, alerting and sharing of data.
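To make a few of these capabilities concrete, here is a minimal sketch in Python of a collector, a search function and a threshold-based alert working over the same set of log events. It is an illustration only, not how any particular observability product works: the log line layout (`TIMESTAMP SERVICE LEVEL MESSAGE`), the `LogEvent` type, the function names and the sample data are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEvent:
    """One parsed log record (hypothetical schema for this sketch)."""
    timestamp: datetime
    service: str
    level: str
    message: str


def collect(lines):
    """Collector: parse 'TIMESTAMP SERVICE LEVEL MESSAGE' lines into events."""
    events = []
    for line in lines:
        parts = line.strip().split(" ", 3)
        if len(parts) < 4:
            continue  # skip malformed lines rather than fail the whole batch
        ts, service, level, message = parts
        events.append(LogEvent(datetime.fromisoformat(ts), service, level, message))
    return events


def search(events, term):
    """Search: return events whose message contains the term (case-insensitive)."""
    needle = term.lower()
    return [e for e in events if needle in e.message.lower()]


def error_alerts(events, threshold):
    """Alerting: list services whose ERROR count reaches the threshold."""
    errors = Counter(e.service for e in events if e.level == "ERROR")
    return sorted(s for s, n in errors.items() if n >= threshold)


# Made-up sample data; the last entry is deliberately malformed.
SAMPLE_LINES = [
    "2024-01-01T10:00:00 checkout INFO order placed",
    "2024-01-01T10:00:05 checkout ERROR payment gateway timeout",
    "2024-01-01T10:00:09 checkout ERROR payment gateway timeout",
    "2024-01-01T10:00:12 search INFO query served",
    "truncated entry",
]

events = collect(SAMPLE_LINES)              # four well-formed events
timeouts = search(events, "timeout")        # the two checkout errors
alerts = error_alerts(events, threshold=2)  # services needing attention
```

In a real platform each of these steps would be backed by a scalable data store and a scheduler rather than an in-memory list, but the division of responsibilities is the same.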
One of the challenges with observability is that much of the data and many of the capabilities overlap with other information systems. For instance, machine data is often a significant portion of the data stored in a data lake. Similarly, business intelligence systems provide interactive visualization, reporting, dashboarding and alerting. Organizations should avoid duplicating effort between observability and their other data and analytics initiatives. Ideally, observability data is available in a common data lake format so that it can be used for other purposes as well.
Given the digital footprint of most businesses today, observability has become a necessity. Organizations may be tempted to build proprietary observability software, since all of the capabilities exist in available data platforms and analytical tools. However, buying an application from one of the observability vendors will quickly provide broad sets of capabilities and eliminate the ongoing maintenance required for homegrown systems.