Cloud DR: How can the public cloud help secure business continuity?

By Przemysław Jankowski

EY Poland, Technology Consulting, Manager

Cloud Solutions Architect and experienced team leader in global implementation projects. Audiophile and endurance sports enthusiast.

5 minute read 13 Mar 2023
Related topics Technology Consulting

Disaster Recovery (DR) is a key element of Business Continuity Plans (BCPs): it gives the organization a way to proceed in the event of a failure or disaster. Organizations periodically analyze possible threats and the likelihood that scenarios affecting business continuity will materialize. Among other things, they assess the risk that such events will occur, the severity of their consequences, and the ways of responding to them. As a result, the BCP is updated to reflect new conditions in the organization's environment that may affect its business continuity risk.

This article is part of the #Drive2Cloud program

How geopolitical risk affects business continuity plans

One of the new conditions that has featured strongly in BCPs in recent quarters, in particular for organizations operating in Eastern European countries, is the geopolitical situation related to the war in Ukraine. Proximity to the armed conflict may lead organizations to raise their assessment of the risk to the physical security of data centers located close to the conflict area, or to revise the level of risk of power outages in energy-intensive data centers, network problems or cyber-attacks.

DR solution in the cloud, i.e. cloud Disaster Recovery

Simply put, cloud Disaster Recovery bases the recovery process on cloud resources. Support for processes previously provided by a specific data center or centers is switched to a defined cloud environment or environments. For a human user (for example, an organization employee, agent, or customer using the systems) or for a technical user (such as an application), the switch to cloud environments can be seamless enough to go unnoticed. System data can be synchronized on an ongoing basis, including temporary data in the cache of the machines on which the processes run. Infrastructure and applications can be up and running and ready to take over the load, and network traffic can be redirected from the affected environment to the cloud solution in an automated or semi-automated way, based on a mechanism that checks environment availability.
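The availability-check mechanism mentioned above can be sketched as a small state machine. This is a minimal illustration, not a description of any provider's service: the class name `FailoverMonitor`, the target labels `primary` and `cloud-dr`, and the threshold of three consecutive failed checks are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Decides where traffic should go, based on a stream of health-check results.

    Assumption: traffic is redirected after `failure_threshold` consecutive
    failed checks, and failback is a manual decision (never automatic).
    """

    failure_threshold: int = 3
    consecutive_failures: int = 0
    active_target: str = "primary"

    def record(self, healthy: bool) -> str:
        """Register one health-check result and return the current target."""
        if self.active_target == "cloud-dr":
            # Already failed over; stay there until an operator fails back.
            return self.active_target
        if healthy:
            # A single successful check resets the failure counter.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active_target = "cloud-dr"
        return self.active_target


if __name__ == "__main__":
    monitor = FailoverMonitor()
    for result in [True, False, False, False]:
        print(monitor.record(result))
```

Requiring several consecutive failures before redirecting is one way to avoid failing over on a single transient network error; a semi-automated variant could raise an alert at the threshold and wait for an operator to confirm.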

The growing capabilities of cloud technology and the growing popularity of cloud services give organizations an additional tried and trusted tool that, compared to on-premise solutions, can be even more effective in addressing the risks identified in the Business Continuity Plan. The competitive edge of the public cloud over on-premise solutions for Disaster Recovery is particularly evident in the ability to change the geolocation of infrastructure easily and relatively quickly and to distribute it over many regions. At the same time, it is generally good practice to build infrastructure that is self-contained within a region. In solutions operating in many regions, the system should be replicated between them and operate independently in each. Unfortunately, no matter how good the method of data synchronization is, some loss of data always has to be taken into account when the infrastructure is damaged. The allowable amount of data loss (RPO, i.e. Recovery Point Objective) is defined by each organization individually and may vary depending on the system or application. As a result, it affects the configuration and the tools or services used in DR.
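To make the RPO concrete, the check below compares a replication schedule against an RPO target. It is a simplified sketch under a stated assumption: in the worst case a failure happens just before the next replication cycle completes, so the data at risk is bounded by the replication interval plus the replication lag. The function names and the numbers in the example are illustrative.

```python
def worst_case_data_loss_s(replication_interval_s: float,
                           replication_lag_s: float = 0.0) -> float:
    """Upper bound (in seconds of changes) on data missing from the replica.

    Assumption: a failure strikes just before the next cycle finishes, so
    everything written since the last completed cycle is lost.
    """
    if replication_interval_s < 0 or replication_lag_s < 0:
        raise ValueError("interval and lag must be non-negative")
    return replication_interval_s + replication_lag_s


def meets_rpo(rpo_s: float,
              replication_interval_s: float,
              replication_lag_s: float = 0.0) -> bool:
    """True if the worst-case loss stays within the RPO target."""
    return worst_case_data_loss_s(replication_interval_s, replication_lag_s) <= rpo_s


if __name__ == "__main__":
    # A 15-minute RPO with replication every 10 minutes and 2 minutes of lag.
    print(meets_rpo(rpo_s=900, replication_interval_s=600, replication_lag_s=120))
```

A check like this can be run per system or application, since each may have its own RPO.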

Backup of cloud data

In the simplest model, the cloud can be a place for the safe storage of data copies, which are saved in dedicated cloud services. Technically, the cloud can accept almost any data: files with a directory structure, databases, images of systems and virtual machines, application data, disk images, code repositories, etc. A backup in the cloud can contain a complete picture of selected systems of the organization and the data on which these systems operate.

The data source can be physical data centers, data stored with other cloud providers, or the same cloud provider where the replicated data will be stored. Data replication can take place continuously, in the data consistency mode the organization expects, or be processed at set intervals for a lower cost. When defining cloud backup processes, it is important to consider the diverse characteristics and needs of the organization's applications, systems and data stores. For data the organization designates as critical, such as databases for applications that change clients' financial balances, the organization can expect data replication in near real-time. "Near", because light, the physical carrier of the data, travels through optical fiber at roughly two-thirds of its speed in a vacuum, and an additional allowance has to be made for infrastructure imperfections and the delays introduced by edge components. For complex solutions that are expected to process data in near real-time, this should be considered when determining the requirements, particularly for services provided and distributed globally.
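The physical limit behind "near" real-time can be estimated with simple arithmetic. The sketch below assumes a velocity factor of 2/3 for optical fiber and a straight fiber path of the given length; real routes are longer, and equipment along the way adds further delay, so the result is a lower bound. The function names are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_VELOCITY_FACTOR = 2 / 3       # assumption: light in fiber travels at ~2/3 c


def one_way_delay_ms(distance_km: float) -> float:
    """Minimum one-way propagation delay over fiber, in milliseconds."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    fiber_speed_km_s = SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR
    return distance_km / fiber_speed_km_s * 1000


def round_trip_delay_ms(distance_km: float) -> float:
    """Minimum round-trip delay, e.g. a write plus its acknowledgement."""
    return 2 * one_way_delay_ms(distance_km)


if __name__ == "__main__":
    # A replica 1,000 km away: about 5 ms each way, about 10 ms round trip.
    print(f"{one_way_delay_ms(1000):.2f} ms one way")
    print(f"{round_trip_delay_ms(1000):.2f} ms round trip")
```

For synchronous replication, every acknowledged write pays at least the round-trip delay, which is why distance matters when choosing where the replica lives.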

A static, point-in-time copy of data most often means accepting the largest loss of data.

Data backup encryption

If we treat the cloud only as a repository for data backups, there are two high-level approaches to encrypting the data:

  • in the first approach, the organization encrypts data using keys generated by the same public cloud provider that also provides the storage where the backups are kept,
  • the second approach separates the provider that generates the encryption keys from the provider of the data backup services.

Organizations may distinguish between these approaches by where the encryption keys are generated for reasons of regulatory compliance: the compliance analysis is simpler in a scenario where data is stored encrypted with a key that is external to the storage provider. At the same time, it should be noted that the second approach may add complexity to the solution, increase RTOs, and raise doubts on purely technological grounds as to whether it brings a real increase in security compared to the first approach.

Ways to replicate data to the cloud

Assuming that the organization has up-to-date BCPs and has specified RPO and RTO requirements per system or application, one of the first technical steps is to review the backup solutions already in use, as an “as-is” analysis of processes and tools. The organization most likely already uses third-party vendor solutions in its physical data centers to back up data and to provide a technological platform for Disaster Recovery processes. The Gartner report [1] published in July 2022 identifies a group of providers of data backup and Disaster Recovery solutions classified as “market leaders”, most of whom offer integrations with the major public cloud providers. In practice, an organization can continue to use a suite of applications from the same backup and Disaster Recovery provider while extending the solution architecture to the public cloud. This, of course, requires first configuring these tools and provisioning the required cloud resources on top of a properly configured 'landing zone'.

In addition, cloud solution providers (CSPs) provide their own dedicated services for data backup and synchronization – both for data stored on-premises (“on-prem storage of data”) and data stored in third-party clouds.

At the same time, this is just one way to provide a copy of the data. Some repositories have their own native ways of replicating data, e.g. SQL databases can ensure continuous synchronization between the on-premise solution and the database made available in the cloud. Another example is user identity management directories that ensure change replication and consistency between different environments.

When creating a backup, especially an initial image, the organization may find that the data sets involved are very large. Transferring terabytes or even petabytes of data over the Internet or a private network to a cloud provider can be costly and time-consuming. In response, some providers offer a physical media delivery service: encrypted data is copied to disks in the customer’s physical data center, the disks are shipped to the provider's server room, and a copy is made in the cloud.
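A back-of-the-envelope estimate shows why network transfer of an initial image can be impractical. The sketch below is only arithmetic; the function name, the decimal definition of a terabyte (10^12 bytes) and the default 80% link utilization are assumptions for illustration.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimated days needed to push `size_tb` terabytes over a network link.

    Assumptions: 1 TB = 10**12 bytes, the link is dedicated to this transfer,
    and only `utilization` (0..1] of its nominal bandwidth is achieved.
    """
    if size_tb < 0:
        raise ValueError("size must be non-negative")
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link speed must be positive and utilization in (0, 1]")
    total_bits = size_tb * 1e12 * 8
    effective_bits_per_s = link_gbps * 1e9 * utilization
    return total_bits / effective_bits_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # 100 TB over a fully utilized 1 Gbps link takes a little over 9 days.
    print(f"{transfer_days(100, 1.0, utilization=1.0):.1f} days")
```

Comparing such an estimate with the shipping time of a physical media service gives a rough basis for choosing between the two.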

Another important decision is the geographical location of the data center from which the services will be provided, regardless of whether the organization performs only data backup or covers broader Disaster Recovery processes. Public cloud providers publish information about their available physical data centers, which makes it possible to select locations that meet the organization's requirements, for example a regulatory requirement to process data only within the EEA, the availability of specific services, or the latency introduced by connection distance. The solution can be based on a single cloud data center or spread across multiple regions, which provides higher availability and thus higher SLA levels.
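The selection criteria listed above (EEA-only processing, service availability, latency) can be expressed as a simple filter. The sketch below is purely illustrative: the `Region` type, the region names and all attribute values are hypothetical and do not describe any real provider's catalogue.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Region:
    """Hypothetical description of one cloud region."""
    name: str
    in_eea: bool
    services: frozenset
    round_trip_ms: float   # measured latency from the organization's data center


def eligible_regions(regions: Iterable[Region],
                     required_services: Iterable[str],
                     max_round_trip_ms: float,
                     require_eea: bool = True) -> List[Region]:
    """Return regions meeting all criteria, lowest latency first."""
    required = frozenset(required_services)
    matches = [
        r for r in regions
        if (r.in_eea or not require_eea)
        and required <= r.services            # every required service is offered
        and r.round_trip_ms <= max_round_trip_ms
    ]
    return sorted(matches, key=lambda r: r.round_trip_ms)


if __name__ == "__main__":
    catalogue = [
        Region("region-a", True, frozenset({"backup", "managed-sql"}), 12.0),
        Region("region-b", True, frozenset({"backup"}), 8.0),
        Region("region-c", False, frozenset({"backup", "managed-sql"}), 5.0),
    ]
    for region in eligible_regions(catalogue, ["backup", "managed-sql"], 30.0):
        print(region.name)
```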

With data backup in the cloud, the creation of images of applications and systems, along with the data on which they work, can be automated. These images serve as building instructions and make it possible to recreate systems in cloud environments or in physical server rooms. Cloud backup is a relatively inexpensive solution compared to full cloud Disaster Recovery.

However, it should be taken into account that this process is time-consuming: to restore the operational readiness of the systems or applications selected in the Business Continuity Plan, the resources first have to be built from the images.

Cloud DR solution vs data backup

Unlike ordinary data backup, Disaster Recovery solutions make it possible to take over at least part of the workload almost immediately when an event covered by the Business Continuity Plan occurs. A DR solution in the cloud comes at a higher cost, however: in addition to the data backup itself, it requires continuously running cloud resources that are ready to take over the planned load as part of the Disaster Recovery process. The larger the share of the planned target DR environment that is kept running continuously, the better (shorter) the RTO, that is, the time needed to restore technical readiness to continue system and business processes.

There are various strategies for DR solutions in the cloud. They differ in cost and in the RTO/RPO targets they can meet, and the two are inversely related: the shorter the target RTO and RPO, the higher the cost. Based on the BCP/DR plans, the organization can determine which DR strategies should apply to which of its systems. Such a mapping involves analyzing IT processes and services in terms of their criticality for maintaining business continuity and ensuring security. As a consequence, different Disaster Recovery strategies can be applied to different systems.

Types of DR cloud strategies

The most expensive Disaster Recovery strategy, but also the one that allows an almost immediate takeover of processes at full scale, is the 'Multi-site' strategy. It is based on an active/active approach, where the cloud environment operates at a scale that matches the target workload planned for the DR solution, with close to zero data loss when an event occurs.

At the other extreme is the DR strategy with the lowest maintenance costs, the 'Backup & Restore' strategy. This is an 'active/passive' configuration: the active data center serves as the primary, and the second data center is built as a Disaster Recovery facility that requires specific preparatory actions before it can take over the target load as part of the DR process.

The trade-off of the 'Backup & Restore' strategy is accepting a long time to restore the full operability of the solution. Only a minimal part of the services runs in the cloud, and if a DR event occurs, almost all of the resources required by the DR solution have to be restored in the cloud, which, despite the automation of the process, translates into a worse RTO. The recovery time depends on the extent of scaling, the services selected, the complexity of the restored and scaled infrastructure, the availability of cloud resources, the provisions of the contract with the cloud provider, etc. It can also be assumed that a lower-cost Disaster Recovery solution comes with a worse RPO, forcing the organization either to accept greater data loss or to change the strategy or the way data replication is configured.

There are intermediate solutions between these strategies, which are also considered 'active/passive' configurations. These intermediate scenarios are usually implemented to balance cost, the speed of recovery after a disaster or other incident, and the acceptable level of data loss:

  • 'Pilot Light': only critical system resources run continuously in the cloud. When a Disaster Recovery event occurs, scaling of resources to the planned level is initiated. It is assumed that scaling for 'Pilot Light' takes several tens of minutes.
  • 'Warm Standby': a fully functional environment is running that can take over some processes. However, scaling up to the planned level is still required. In this strategy, scaling can be assumed to take minutes.
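The mapping of systems to the four strategies described above can be sketched as a simple lookup by RTO target. The thresholds below are illustrative assumptions, loosely following the timings given in the text (near-immediate for 'Multi-site', minutes for 'Warm Standby', tens of minutes for 'Pilot Light'); each organization would set its own boundaries and would also factor in RPO and cost.

```python
from enum import Enum


class Strategy(Enum):
    MULTI_SITE = "Multi-site"
    WARM_STANDBY = "Warm Standby"
    PILOT_LIGHT = "Pilot Light"
    BACKUP_AND_RESTORE = "Backup & Restore"


def choose_strategy(rto_minutes: float) -> Strategy:
    """Pick the cheapest strategy expected to meet the RTO target.

    The cut-off values are assumptions for illustration only.
    """
    if rto_minutes < 0:
        raise ValueError("RTO must be non-negative")
    if rto_minutes <= 1:
        return Strategy.MULTI_SITE          # active/active, near-immediate takeover
    if rto_minutes <= 10:
        return Strategy.WARM_STANDBY        # scaling counted in minutes
    if rto_minutes <= 60:
        return Strategy.PILOT_LIGHT         # scaling takes tens of minutes
    return Strategy.BACKUP_AND_RESTORE      # rebuild from images, longest recovery


if __name__ == "__main__":
    # Hypothetical systems and RTO targets taken from a BCP.
    targets = {"payments-db": 1, "customer-portal": 10, "reporting": 240}
    for system, rto in targets.items():
        print(f"{system}: {choose_strategy(rto).value}")
```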

Support for Disaster Recovery processes by cloud solutions

Whether the organization chooses secure data backup in the cloud or a full cloud Disaster Recovery solution, it is necessary to:

  1. assess the risks of the solution,
  2. create a comprehensive architecture that covers integration, data, applications or services, infrastructure, and updates to the organization's corporate architecture,
  3. develop the operational processes necessary to maintain the solution.

In addition to a Disaster Recovery solution, an organization may also consider providing employees with operating system virtualization services, along with virtual desktops and cloud applications. This provides a secure working environment that is independent of the company's physical machines.

The subject of Disaster Recovery in the cloud can take on a diverse scale and catalogue of services used, depending on the needs of the organization. By addressing the subject of DR in the cloud, the organization opens up discussion and action on many aspects of cloud solutions, including, among others:

  • construction and configuration of the basic cloud environment (including the basic architecture with the necessary services, network settings, settings for identity management, etc.),
  • communication channel between the organization's data centers and the cloud platform,
  • ensuring regulatory compliance for the use of the cloud solution by the organization,
  • testing the Disaster Recovery solution, e.g. switching to the DR solution, verifying the correctness of data backup, etc.
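One of the tests named in the last item, verifying the correctness of a data backup, can be sketched as a checksum comparison between the source data and a test restore. This is a minimal illustration using SHA-256 from the Python standard library; the function names are hypothetical, and a real test would typically also check that the restored system starts and serves requests.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def restore_is_intact(source: BinaryIO, restored: BinaryIO) -> bool:
    """True if the restored copy is byte-for-byte identical to the source."""
    return sha256_of_stream(source) == sha256_of_stream(restored)


if __name__ == "__main__":
    # In-memory stand-ins for a source file and its restored copy.
    original = io.BytesIO(b"account ledger contents")
    restored = io.BytesIO(b"account ledger contents")
    print(restore_is_intact(original, restored))
```

With real files, the same functions can be called on objects returned by `open(path, "rb")`.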

 

Summary

Configuring a cloud Disaster Recovery solution is a task that requires a comprehensive approach, one that ensures cooperation across various areas of the organization during solution development as well as a high level of technological awareness. By leveraging the strengths of cloud platforms, such as scalability, geographic distribution of data centers, security, automation and native services, organizations can effectively respond to the scenarios identified in their business continuity plans. If an organization sees its existing Disaster Recovery solutions as insufficient in the current business environment, it may be the right time to explore the possibility of incorporating cloud solutions.
