Downtime is not an option for modern organizations that must fulfill their customers’ needs and expectations. Many types of incidents can occur and threaten your business revenue or even its existence. Whether it’s a ransomware attack, a power outage, a flood or simple human error, these events are unpredictable, and the best thing you can do is to BE PREPARED.
Preparedness means that you should have a solid business continuity and disaster recovery (BCDR) plan. One that has been tested and that can be put in motion smoothly.
Two of the important parameters that define a BCDR plan are the Recovery Point Objective (RPO) and Recovery Time Objective (RTO). For those of you who are not familiar with these terms, let me give you a brief description:
- RPO limits how far back in time you have to roll back. It defines the maximum acceptable amount of data loss, measured as the time between the last valid backup and the failure.
- RTO relates to downtime. It defines the maximum acceptable time between the incident and the moment normal operations are available to users again.
While RPO and RTO may sound similar, they serve different purposes and, in an ideal world, their values would be as close to zero as possible. However, back in our world, the cost of zero RPO and RTO would be extremely high and might not be worth the effort.
Let’s take a closer look at recovery objectives. RPO is about how much data you can afford to lose before the loss impacts business operations. For example, for a banking system, 1 hour of data loss can be catastrophic because the system processes live transactions. At a personal level, you can also think about RPO as the last time you saved a document you are working on. If your system crashes and your progress is lost, how much of your work are you willing to lose before it affects you?
On the other hand, RTO is the timeframe within which applications and systems must be restored after an outage. It’s good practice to measure the RTO from the moment the outage occurs, instead of the moment when the IT team starts to fix the issue. This is a more realistic approach because it marks the exact point when users start to be impacted.
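To make the two measurements concrete, here is a minimal Python sketch that computes the achieved data-loss window and downtime for a single incident. The function names and the timeline are hypothetical and chosen only for illustration. Note that downtime is counted from the outage itself, not from the moment repair work begins.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_valid_backup: datetime, outage_start: datetime) -> timedelta:
    """Data-loss window: time between the last valid backup and the failure."""
    return outage_start - last_valid_backup


def achieved_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Downtime counted from the moment the outage occurs."""
    return service_restored - outage_start


# Hypothetical incident timeline (illustrative values only).
last_backup = datetime(2024, 1, 10, 8, 0)
outage = datetime(2024, 1, 10, 8, 45)
repair_started = datetime(2024, 1, 10, 9, 5)
restored = datetime(2024, 1, 10, 10, 15)

rpo = achieved_rpo(last_backup, outage)       # 45 minutes of data lost
rto = achieved_rto(outage, restored)          # 1 hour 30 minutes of downtime
rto_from_repair = restored - repair_started   # 1 hour 10 minutes

print(f"Data-loss window: {rpo}")
print(f"Downtime from outage: {rto}")
print(f"Downtime from start of repair: {rto_from_repair}")
```

In this timeline, measuring from the start of the repair would understate the downtime users experienced by 20 minutes.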
How to define RTO and RPO values for your applications
The truth is there is no one-size-fits-all solution for a business continuity plan and its metrics. Companies are different from one vertical to another, have different needs, and therefore they have different requirements for their recovery objectives. However, a common practice is to divide applications and services into different tiers and set recovery time and point objective (RTPO) values according to the service-level agreements (SLAs) the organization committed to.
Data protection classification is important because it determines how to store, access, protect, recover and update data more efficiently, based on criteria specific to each class. It is essential to analyze your applications and determine which of them drive your business, generate revenue and must stay operational. This process, which is essential for a good business continuity plan, is called business impact analysis (BIA), and it establishes protocols and actions for facing a disaster.
For example, you can use a three-tier model to design your business continuity plan:
- Tier-1: Mission-critical applications that require an RTPO of less than 15 minutes
- Tier-2: Business-critical applications that require an RTO of 2 hours and an RPO of 4 hours
- Tier-3: Non-critical applications that require an RTO of 4 hours and an RPO of 24 hours
It’s important to keep in mind that what counts as mission-critical, business-critical or non-critical varies across industries, and each organization defines these tiers based on its own operations and requirements.
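As a rough illustration, the example three-tier model can be written down as data and used to check whether a backup or replication schedule can meet a tier's RPO. This is a sketch under two assumptions: the worst-case data loss equals one full interval between jobs (job duration is ignored), and the Tier-1 "less than 15 minutes" target is treated as a 15-minute upper bound. The names `Tier`, `TIERS` and `meets_rpo` are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rto: timedelta
    rpo: timedelta


# Values taken from the example three-tier model above.
# Assumption: Tier-1 "less than 15 minutes" is modeled as a 15-minute bound.
TIERS = {
    1: Tier("Mission-critical", rto=timedelta(minutes=15), rpo=timedelta(minutes=15)),
    2: Tier("Business-critical", rto=timedelta(hours=2), rpo=timedelta(hours=4)),
    3: Tier("Non-critical", rto=timedelta(hours=4), rpo=timedelta(hours=24)),
}


def worst_case_data_loss(job_interval: timedelta) -> timedelta:
    """A failure just before the next job loses one full interval of data."""
    return job_interval


def meets_rpo(tier: int, job_interval: timedelta) -> bool:
    """True if the schedule's worst-case data loss fits within the tier's RPO."""
    return worst_case_data_loss(job_interval) <= TIERS[tier].rpo


# Hourly backups fit Tier-2; jobs that run twice a day (every 12 hours)
# fit Tier-3 but fall far short of Tier-1.
print(meets_rpo(2, timedelta(hours=1)))
print(meets_rpo(3, timedelta(hours=12)))
print(meets_rpo(1, timedelta(hours=12)))
```

A check like this only covers the RPO side. Whether the RTO is met depends on how quickly the chosen recovery method can bring the workload back, which has to be verified by testing the plan.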
Now that you have ranked your applications and services and you know what the impact will be in case of specific incidents, it's time to find a solution that can help you protect your business data and operations. Veeam Availability Platform is a complete set of tools designed to achieve stringent recovery objectives for virtual, physical and cloud-based workloads.
How RTO and RPO work in practice
Quick application-item recovery
A sales representative deleted an email that needs to be sent to a customer ASAP. Microsoft Exchange is an example of a Tier-2 application. Provided the IT administrator schedules backup jobs to run at least once an hour throughout the day, the company can achieve an RPO of 1 hour. With Veeam Explorer for Microsoft Exchange, which is part of all versions of Veeam Backup & Replication, it’s easy to recover an individual email item within minutes or even seconds, saving the time and resources otherwise spent on staging or restoring an entire application server VM!
Instant recovery of an entire virtualized server directly from a backup
Let’s imagine a bank that operates several ATMs. The ATM system is business-critical for the bank’s operations (Tier-2): if it crashes for a few hours, it will impact the bank’s transactions, but not the integrity of the whole bank. With Veeam Backup & Replication and the Instant VM Recovery feature, you can immediately start up the virtualized ATM server from the deduplicated and compressed Veeam backup file. This results in an RTO of a few minutes! Moreover, with hypervisor migration functionality and Veeam Quick Migration, you can easily migrate the running VM from the backup datastore to the production datastore to complete the recovery process.
Maintenance workers caused an electrical failure in one of your data centers, resulting in a full-site failure and loss of access to all your Tier-1 applications. Let’s say you use Veeam to replicate all your critical VMs off site. The replication interval determines the RPO you can achieve: replicating twice a day limits data loss to roughly 12 hours, so meeting a Tier-1 RPO of minutes requires replication jobs that run every few minutes. From an RTO perspective, Veeam enables you to recover from major incidents with several built-in features: one-click failover, assisted failback, Re-IP to match the network in the DR site, and true cloud-based disaster recovery.
Switching from your virtual infrastructure to the physical world, Veeam also provides backup and recovery solutions for your laptops and desktops. With Veeam Agent for Microsoft Windows, you can restore files from your Recovery Media to your Windows-based computer, or even start your PC backup image as a virtual machine to achieve low RTOs.
Furthermore, with Veeam Agent for Linux, you can protect your Linux workloads, whether they are running on-premises or in the public cloud.
Nobody can predict a disaster. However, you can respond in an organized way by following your business continuity plan when facing such an incident. RPO and RTO values may vary across companies, but they will always be a compromise between the business need for Availability and the required investment in IT. Estimating them should be the result of deliberation between your organization’s business and IT experts. What is not up for deliberation is the need to implement a reliable Availability solution for virtual, physical and cloud workloads to ensure Always-On operations for your business.