5 minute read 12 Oct 2023
Observability-driven development (ODD)

Resilient software delivery through observability-driven development

By Shivaprakash Abburu

EY India Cybersecurity Consulting Partner

Cybersecurity optimist, Technology enthusiast

Observability-driven development (ODD) provides visibility and real-time user monitoring, enabling the optimization of application performance.

In brief

  • Observability-driven development (ODD) promotes early integration of observability, consolidating metrics, traces, and logs within a unified platform.
  • It helps in gathering and visualizing metrics, establishing alerts for potential issues, and obtaining insights into system performance.
  • It enhances app performance by providing complete visibility into real requests and code through distributed tracing.

Adapting to digital transformation in response to the pandemic and changing market needs is increasingly essential for organizations that want to remain competitive. Staying up to date with the latest technology trends is critical, and for many organizations, multi-cloud is fast becoming the de facto standard. While cloud-native components offer many advantages, microservices can be complex to manage and operate.

A significant challenge has been achieving real-time visibility of the underlying services to assess their health and identify performance bottlenecks. Additionally, maintaining a contextualized view of the data collected from cloud-native sources can involve petabytes of data, which can significantly delay the analysis of potential threats, bottlenecks, or issues.

Moreover, modern application stacks have presented new challenges, such as Log4Shell, a software vulnerability that complicates the traditional "outside-in" perspective of application security. Instead, engineering teams are better positioned to adopt an "inside-out" view of application security by collecting signals that assist in identifying and preventing "pre-zero-day" attacks, thereby stopping newly discovered vulnerabilities before they can be exploited.

Newer software development techniques promise several benefits over the conventional approach. Two popular ones are test-driven development (TDD) and behavior-driven development (BDD), both of which involve writing tests before writing code.

In TDD, developers write tests before writing code, as opposed to the usual practice of writing functional code and then testing it. BDD, on the other hand, involves writing tests in frameworks capable of processing natural language instructions to describe a set of expected behaviors of a software system. TDD tests can be run repeatedly with consistent results. However, these tests are isolated and cannot reveal how the entire application will work or whether the customer experience will be good or bad. In other words, TDD doesn't provide insights into the application’s performance during its development. BDD helps capture the expected behavior of any given application. However, it does not account for out-of-band events such as distributed denial-of-service attacks, injection attacks, or any of the flaws listed in the OWASP Top 10 or the SANS Top 25 categories.

The current challenges

While both TDD and BDD have their strengths, at the scale and speed of modern systems both fall short in collecting and contextualizing the metrics needed to develop and deploy resilient applications. When teams create a model, they assume that data is divided into two parts: signal and noise. The real pattern, the repeatable process that we hope to capture and describe, is the signal.

Everything else that obscures the signal is referred to as noise. Engineers must move away from the traditional approach of simply monitoring well-understood infrastructure metrics and transition towards actively instrumenting code, so that they can engage in a more constant “conversation” with production systems.
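As a minimal sketch of what "actively instrumenting code" can look like, the Python decorator below records the duration and outcome of every call to a function. The names `TELEMETRY`, `instrumented`, and `checkout` are hypothetical and exist only for illustration; a real service would send these measurements to its observability platform rather than keep them in an in-memory dictionary.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store, used here only to keep the sketch self-contained.
TELEMETRY = defaultdict(list)


def instrumented(name):
    """Record the duration and outcome of every call under the given name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = True
            try:
                return func(*args, **kwargs)
            except Exception:
                ok = False
                raise  # the caller still sees the original error
            finally:
                # Runs on both success and failure, so no call goes unrecorded.
                TELEMETRY[name].append(
                    {"duration_s": time.perf_counter() - start, "ok": ok}
                )
        return wrapper
    return decorator


@instrumented("checkout")
def checkout(cart_total):
    """Hypothetical business function used to demonstrate the decorator."""
    if cart_total < 0:
        raise ValueError("cart total cannot be negative")
    return {"charged": cart_total}
```

Because the measurement lives in the `finally` clause, failed requests are recorded alongside successful ones, which is what later makes an error rate computable.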

The site reliability engineering (SRE) golden signals most relevant to resilient software development, along with their equivalents in the RED (rate, errors, duration) method, are:

  • Latency: The time it takes to service a request, equivalent to RED duration.
  • Traffic: The level of demand being placed on the system, equivalent to RED rate.
  • Errors: The rate of requests that fail, equivalent to RED errors.
  • Saturation: How full the system is. It depends on which resources are constrained and includes a forward-looking component.
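The four signals above can be computed from very little raw data. The following Python sketch summarizes one time window of requests; the `Request` record, the `golden_signals` function, and the choice of in-flight requests divided by capacity as the saturation measure are all assumptions made for illustration, since the constrained resource differs from system to system.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request."""
    duration_ms: float
    failed: bool


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize one time window of requests as the four golden signals."""
    if not requests:
        raise ValueError("need at least one request in the window")
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile: the smallest value with at least
    # 95% of the observations at or below it.
    rank = max(1, math.ceil(0.95 * len(durations)))
    return {
        "latency_p95_ms": durations[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": sum(r.failed for r in requests) / len(requests),
        # Assumption: concurrency is the constrained resource.
        "saturation": in_flight / capacity,
    }
```

Latency is reported as a percentile rather than a mean because a small number of very slow requests can hide behind a healthy-looking average.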

For the past two decades, IT teams have relied on Application Performance Management (APM) as the primary tool to monitor and troubleshoot applications and their networks. APM provides users with dashboards and alerts to troubleshoot an application’s performance in production. These insights are based on known or expected system failures, typically related to SRE golden signals, and provide engineers with alerts when pre-defined issues arise. But what about problems that develop unexpectedly? Today's software environments are increasingly distributed, with teams spread out geographically, creating, deploying, and maintaining programs.

Observability-driven development

Observability-driven development (ODD) is an approach that integrates observability best practices into the early stages of the software development lifecycle. In microservices, observability exposes the health of the production system, enabling developers to detect and fix performance issues. Microservices observability also provides visibility and real-time user monitoring to optimize application performance and availability.

Observability in software engineering plays a crucial role in proactively monitoring security. Data streams from various stages of development can be used to detect unusual activity and trigger actions to mitigate or block the impact of a security issue. Even when a workload is already running on the main platform and starts causing problems, observability can be used to initiate actions that limit or shut down the workload, replacing it with a known working variant if necessary. Engineers in upstream DevOps will also find observability valuable for overseeing outputs across different microservices and virtual containers to ensure these environments are ready for production as they progress through the DevOps pipeline.
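The idea of telemetry triggering an action to limit or replace a misbehaving workload can be sketched as a simple decision rule. In the Python example below, the `Action` enum, the `choose_action` function, and the threshold factors are hypothetical; actual thresholds and remediation steps would depend on the platform and on the organization's risk tolerance.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"          # workload looks healthy
    THROTTLE = "throttle"  # limit the workload
    ROLLBACK = "rollback"  # replace it with a known working variant


def choose_action(error_rate, baseline_error_rate,
                  throttle_factor=2.0, rollback_factor=5.0):
    """Pick a remediation by comparing the observed error rate to a baseline.

    The factors are illustrative assumptions, not recommended values.
    """
    if baseline_error_rate <= 0:
        raise ValueError("baseline error rate must be positive")
    ratio = error_rate / baseline_error_rate
    if ratio >= rollback_factor:
        return Action.ROLLBACK
    if ratio >= throttle_factor:
        return Action.THROTTLE
    return Action.NONE
```

Comparing against a baseline rather than a fixed number lets the same rule apply to services with very different normal error rates.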

Observability benefits

Observability can be scaled automatically. For example, by specifying instrumentation and data aggregation as part of a Kubernetes cluster configuration, you can gather telemetry from the moment it spins up until it spins down. A useful aspect of observability-driven development is tracking the performance of an application or platform over time. Changes can be detected and off-target trends identified, triggering automatic correction or prompting human intervention.
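One simple way to identify an off-target trend is to compare each new measurement with a trailing window of recent ones. The Python sketch below flags samples that deviate from the trailing mean by more than a chosen number of standard deviations; the function name `off_target` and the default window and threshold are assumptions for illustration, and production systems typically use more robust detectors.

```python
from statistics import mean, pstdev


def off_target(samples, window=5, threshold=3.0):
    """Return the indices of samples that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is a deviation.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

For example, with latency samples of `[100, 102, 98, 101, 99, 100, 180, 101]` milliseconds, only the spike at index 6 is flagged; the indices returned could then trigger a correction or an alert for human review.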

  • Encourages a shift left of the activities required for observability to the earliest stages of development, with observability platforms monitoring metrics, traces, and logs in one unified place.
  • Collects and visualizes metrics and sets up alerts for potential issues, giving insight into the performance and health of your systems.
  • Optimizes your application’s performance with end-to-end visibility into real requests and code through distributed tracing.
  • Debugs, audits, and analyzes logs from all your services, applications, and platforms cost-efficiently and at scale.
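Bringing traces and logs into one unified place largely comes down to sharing an identifier across them. The Python sketch below groups spans and log lines by a shared trace ID and finds the slowest request; the field names `trace_id`, `duration_ms`, and `message`, and the functions `correlate` and `slowest_trace`, are hypothetical and do not reflect any particular platform's schema.

```python
from collections import defaultdict


def correlate(spans, logs):
    """Group spans and log records that share the same trace ID."""
    by_trace = defaultdict(lambda: {"spans": [], "logs": []})
    for span in spans:
        by_trace[span["trace_id"]]["spans"].append(span)
    for record in logs:
        by_trace[record["trace_id"]]["logs"].append(record)
    return dict(by_trace)


def slowest_trace(by_trace):
    """Return the trace ID with the largest total span duration."""
    return max(
        by_trace,
        key=lambda tid: sum(s["duration_ms"] for s in by_trace[tid]["spans"]),
    )
```

With the data grouped this way, an engineer investigating a slow request can read its log lines next to its spans instead of searching each source separately.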

System faults can be identified and resolved significantly faster, often within hours or even minutes, once ODD is implemented with the appropriate stack, instrumentation, and visualization. 

According to a survey done by web tracking and analytics company New Relic, 90% of respondents believe that observability is important and strategic to their business, and 94% consider it important and strategic to their role.

Adopting ODD across various life cycle stages

Summary

Observability is not just a buzzword, but also an essential and practical technique for assessing the condition of overall infrastructure. Technologies like the cloud, containerization, and microservices have elevated system complexity to unprecedented levels. Complex systems depend on effective monitoring tools specifically designed for cloud environments. However, utilizing these tools does not guarantee observability, as observability is a holistic term encompassing the entire system. Ultimately, the observability best practice you select must be adaptable and scalable to evolve with your business. ODD has the potential to do for DevOps what TDD did for software development years ago.
