5 minute read 12 Oct 2023

Resilient software delivery through observability-driven development


Observability-driven development (ODD) provides visibility and real-time user monitoring, enabling the optimization of application performance.

Organizations that want to remain competitive increasingly need to pursue digital transformation, a shift accelerated by the pandemic and by changing market needs. Keeping pace with the latest technology trends is critical, and for many organizations multi-cloud is fast becoming the de facto standard. Cloud-native components offer many advantages, but microservices can be complex to manage and operate.

A significant challenge has been achieving real-time visibility into the underlying services to assess their health and identify performance bottlenecks. Maintaining a contextualized view of the data collected from cloud-native sources is also difficult: it can involve petabytes of data, which can significantly delay the analysis of potential threats, bottlenecks, or other issues.

Modern application stacks have also introduced new challenges. Vulnerabilities such as Log4Shell (CVE-2021-44228, in the Apache Log4j 2 logging library) complicate the traditional "outside-in" perspective on application security. Engineering teams are better positioned to adopt an "inside-out" view instead, collecting signals from within their applications that help identify and prevent "pre-zero-day" attacks, stopping newly discovered exploits before they can be used.

New software development techniques promise several benefits over the conventional approach. Two popular ones are test-driven development (TDD) and behavior-driven development (BDD), both of which involve writing tests before writing code.

In TDD, developers write tests before writing the functional code, rather than following the usual practice of writing the code first and testing it afterward. BDD, by contrast, describes the expected behaviors of a software system as natural-language scenarios that a testing framework can execute. TDD tests are repeatable: the same test can be run many times with consistent results. However, they are isolated, so they cannot reveal how the entire application will work or whether the customer experience will be good or bad. In other words, TDD provides no insight into the application’s performance during development. BDD helps capture the expected behavior of an application, but it does not account for out-of-band events such as distributed denial-of-service (DDoS) attacks, injection attacks, or the other classes of flaws catalogued in the OWASP Top 10 or the SANS Top 25.
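As a concrete illustration of the TDD cycle, the following minimal Python sketch uses the standard `unittest` module. The function `apply_discount` and its rules are hypothetical. The tests are written first, then just enough code to make them pass; as with any isolated unit test, they confirm correctness but reveal nothing about latency, load, or user experience.

```python
import unittest


# Step 1: write the tests first. They specify the expected behavior of a
# hypothetical apply_discount() function before it is implemented.
class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_discount(self):
        self.assertAlmostEqual(apply_discount(200.0, 0.10), 180.0)

    def test_rejects_invalid_rate(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 1.5)


# Step 2: write just enough code to make the tests pass.
def apply_discount(price: float, rate: float) -> float:
    """Return price reduced by rate, where rate is a fraction in [0, 1]."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return price * (1.0 - rate)


if __name__ == "__main__":
    # Step 3: run the tests. They are repeatable and isolated, but they
    # say nothing about how the code performs in production.
    unittest.main(argv=["tdd-sketch"], exit=False)
```

The tests pass whether `apply_discount` takes a microsecond or a second, which is exactly the gap the article attributes to TDD.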

The current challenges

While both TDD and BDD have their strengths, at the scale and speed of modern systems they fall short in collecting and contextualizing the metrics needed to develop and deploy resilient applications.

When teams build a model, they assume the data has two parts: signal and noise. The signal is the real pattern, the repeatable process we hope to capture and describe; everything else that obscures it is noise. To separate the two, engineers must move beyond simply monitoring well-understood infrastructure metrics and start actively instrumenting their code, so they can hold a more constant “conversation” with production systems.
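One way to start actively instrumenting code is to wrap functions so that every call records its duration and outcome. The minimal Python sketch below uses only the standard library. The `instrumented` decorator, the in-memory `METRICS` store, and the `checkout` function are hypothetical; a production system would typically export this data through a telemetry library (for example, OpenTelemetry) rather than keep it in memory.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store: operation name -> list of
# (duration_seconds, succeeded) samples.
METRICS = defaultdict(list)


def instrumented(name):
    """Decorator that records the duration and outcome of every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = func(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs on success and on exception, so failures are recorded too.
                METRICS[name].append((time.perf_counter() - start, ok))
        return wrapper
    return decorator


@instrumented("checkout")
def checkout(cart_total: float) -> float:
    # Hypothetical business logic: add 18% tax, reject negative totals.
    if cart_total < 0:
        raise ValueError("negative total")
    return round(cart_total * 1.18, 2)


if __name__ == "__main__":
    checkout(100.0)
    try:
        checkout(-1.0)
    except ValueError:
        pass
    # One success and one failure, each recorded with its duration.
    print([ok for _, ok in METRICS["checkout"]])  # [True, False]
```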

The site reliability engineering (SRE) golden signals relevant to resilient software development, with their counterparts in the RED (Rate, Errors, Duration) method, are:

Latency: the time it takes to service a request, equivalent to RED Duration.
Traffic: the level of demand placed on the system, equivalent to RED Rate.
Errors: the rate of requests that fail, equivalent to RED Errors.
Saturation: how "full" the service is. It depends on which resources are constrained and includes a forward-looking component, such as predicting when a resource will run out. RED has no direct equivalent.
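The four signals can be computed from raw request data. The following Python sketch is a simplified illustration under stated assumptions: the `Request` record shape is hypothetical, latency is summarized as a nearest-rank 95th percentile, and saturation is approximated as the fraction of a fixed worker pool that is busy. A real system would measure whichever resource is actually constrained.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    duration_ms: float
    failed: bool


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def golden_signals(requests, window_s, busy_workers, total_workers):
    """Compute the four golden signals for one observation window."""
    n = len(requests)
    return {
        # Latency (RED Duration): 95th-percentile request time.
        "latency_p95_ms": percentile([r.duration_ms for r in requests], 95) if n else 0.0,
        # Traffic (RED Rate): requests per second over the window.
        "traffic_rps": n / window_s,
        # Errors (RED Errors): fraction of requests that failed.
        "error_ratio": sum(r.failed for r in requests) / n if n else 0.0,
        # Saturation: share of a constrained resource in use (here, workers).
        "saturation": busy_workers / total_workers,
    }


if __name__ == "__main__":
    # Ten hypothetical requests over a 5-second window; two of them failed.
    reqs = [Request(float(d), d in (30, 90)) for d in range(10, 101, 10)]
    print(golden_signals(reqs, window_s=5, busy_workers=6, total_workers=8))
    # {'latency_p95_ms': 100.0, 'traffic_rps': 2.0, 'error_ratio': 0.2, 'saturation': 0.75}
```

Using a percentile rather than an average for latency keeps a few slow requests from being hidden by many fast ones.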

For the past two decades, IT teams have relied on Application Performance Management (APM) as their primary tool for monitoring and troubleshooting applications and their networks. APM gives users dashboards and alerts for troubleshooting an application’s performance in production. These insights are based on known or expected system failures, typically tied to the SRE golden signals, and alert engineers when predefined issues arise. But what about problems that develop unexpectedly? Today's software environments are increasingly distributed, and the teams that create, deploy, and maintain them are often spread across geographies, so failures that no one anticipated, and that no predefined alert covers, become both more likely and harder to diagnose.

Observability-driven development

Observability-driven development (ODD) is an approach that integrates observability best practices into the early stages of the software development lifecycle. In microservices, observability exposes the health of the production system, enabling developers to detect and fix performance issues. Microservices observability also provides visibility and real-time user monitoring to optimize application performance and availability.

Observability also plays a crucial role in proactive security monitoring. Data streams from various stages of development can be used to detect unusual activity and trigger actions that mitigate or block the impact of a security issue. Even after a workload reaches the main platform and starts causing problems, observability can be used to initiate actions that limit or shut down that workload and, if necessary, replace it with a known working version. Engineers in upstream DevOps will also find observability valuable for overseeing outputs across microservices and containers, to confirm that these environments are ready for production as they move through the DevOps pipeline.
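The detect-then-act loop described above can be sketched as a simple statistical check. This Python example is a hypothetical illustration, not a production detector: it compares the current value of a security-relevant metric (for example, failed logins per minute) with a baseline window using a one-sided z-score and maps the result to an action. The function name, the thresholds, and the action labels are all assumptions.

```python
import math
import statistics


def assess(baseline, current, throttle_z=3.0, rollback_z=6.0):
    """Classify the current value of a metric against a baseline window.

    Returns "ok", "throttle" (limit the workload) or "rollback" (replace it
    with the last known-good version). Thresholds are illustrative only.
    """
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # Perfectly flat baseline: any increase is treated as extreme.
        z = math.inf if current > mean else 0.0
    else:
        z = (current - mean) / stdev
    # One-sided check: only increases (e.g. more failed logins) are flagged.
    if z >= rollback_z:
        return "rollback"
    if z >= throttle_z:
        return "throttle"
    return "ok"


if __name__ == "__main__":
    failed_logins_per_min = [10, 12, 8, 11, 9]  # hypothetical baseline
    for value in (11, 15, 30):
        print(value, assess(failed_logins_per_min, value))
    # 11 ok
    # 15 throttle
    # 30 rollback
```

A fixed z-score threshold is the simplest possible detector; real deployments would account for seasonality and tune thresholds against observed false-positive rates.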

Observability benefits

Observability can scale automatically. For example, by specifying instrumentation and data aggregation as part of a Kubernetes cluster configuration, you can gather telemetry from the moment the cluster spins up until it spins down. Another useful aspect of observability-driven development is tracking the performance of an application or platform over time: changes can be detected and off-target trends identified, triggering automated correction or prompting human intervention.
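Detecting an off-target trend can be as simple as fitting a straight line to recent samples and projecting it forward. The Python sketch below assumes evenly spaced samples of a metric where higher is worse (for example, hourly p95 latency) and a hypothetical target value; the function names and the choice of a least-squares linear fit are illustrative assumptions. One possible policy is to trigger automated correction on `"breach"` and prompt human review on `"trending"`.

```python
def trend_slope(samples):
    """Least-squares slope of evenly spaced samples, in units per interval."""
    n = len(samples)
    if n < 2:
        return 0.0  # a single point has no trend
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def check_trend(samples, target, horizon):
    """Compare a non-empty series, projected `horizon` intervals ahead, with a target.

    Returns "breach" if the latest value already exceeds the target,
    "trending" if the linear projection crosses it within the horizon,
    and "ok" otherwise.
    """
    latest = samples[-1]
    if latest > target:
        return "breach"
    projected = latest + trend_slope(samples) * horizon
    return "trending" if projected > target else "ok"


if __name__ == "__main__":
    hourly_p95_ms = [200, 210, 220, 230, 240]  # hypothetical data
    print(check_trend(hourly_p95_ms, target=300, horizon=12))  # trending
    print(check_trend(hourly_p95_ms, target=300, horizon=5))   # ok
```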

Observability-driven development (ODD) encourages shifting observability activities left, to the earliest stages of development. Observability platforms bring metrics, traces, and logs together in one unified place. A typical platform:
Collects and visualizes metrics and raises alerts on potential issues, providing insight into the performance and health of your systems.
Helps optimize application performance through distributed tracing, which gives end-to-end visibility into real requests as they flow through your code and services.
Supports cost-efficient debugging, auditing, and analysis of logs from all your services, applications, and platforms at scale.

System faults can be identified and resolved significantly faster, often within hours or even minutes, once ODD is implemented with the appropriate stack, instrumentation, and visualization.


According to a survey by observability platform provider New Relic, 90% of respondents believe that observability is important and strategic to their business, and 94% consider it important and strategic to their role.

Adopting ODD across various life cycle stages

How EY can help

Cybersecurity, strategy, risk, compliance and resilience

EY Cybersecurity, strategy, risk, compliance and resilience teams can provide organizations with a clear picture of their current cyber risk posture and capabilities, giving them an informed view of how, where and why to invest in managing their cyber risks.

Next generation security operations and response

Our Next generation security operations and response services, along with a deep portfolio of consulting, implementation and managed services, can help organizations build a transformation strategy and roadmap for implementing the next generation of security operations.

Data protection and privacy

EY data protection and privacy services help organizations stay up to date with leading services in data security and data privacy, and comply with regulations in a constantly evolving threat environment and regulatory landscape.

Summary

Observability is not just a buzzword but an essential, practical technique for assessing the condition of the overall infrastructure. Technologies such as the cloud, containerization, and microservices have raised system complexity to unprecedented levels, and complex systems depend on effective monitoring tools designed specifically for cloud environments. Using these tools does not by itself guarantee observability, however, because observability is a holistic property of the entire system. Ultimately, whichever observability practices you select must be adaptable and scalable so they can evolve with your business. ODD has the potential to do for DevOps what TDD did for software development years ago.

About this article


By Shivaprakash Abburu

EY India Cybersecurity Consulting Partner

Cybersecurity optimist, Technology enthusiast


