Why Recovery Objectives Are Critical in Disaster Recovery
Recovery objectives are the foundational metrics for building your disaster recovery strategy. Putting a quantifiable number on the disruption your business can tolerate helps guide your evaluation of backup and recovery solutions. Building your backup and recovery strategy around your recovery objectives gives you confidence that, when disaster hits, you are ready to recover with minimal data loss and minimal impact on business processes, and to protect your business’s brand.
Why is understanding the difference between RPO and RTO critical for disaster recovery solutions?
Understanding the difference between RPO and RTO is critical when planning for disaster. Knowing the maximum amount of time your business can tolerate being offline (RTO) and how much data loss it can tolerate (RPO) helps shape your backup and recovery strategy. For example, it helps answer what types of backups you should run for certain business-critical applications and how frequently those backups should take place.
What is a recovery point objective?
A recovery point objective, or RPO, is the maximum amount of data that can be lost before the loss causes serious harm to an organization. The RPO indicates the data loss tolerance of a business process or of an organization in general. This data loss is often measured in terms of time, for example, 5 hours’ or 2 days’ worth of data loss. A zero RPO means that no committed data should be lost when media loss occurs, while a 24-hour RPO means the business can tolerate a day’s worth of data loss.
How do you calculate a recovery point objective?
There are five steps to consider when you calculate your recovery point objectives:
- Frequency of file updates: Your RPO should, at minimum, match how often your files are updated. That keeps the delta between new data and backed-up data small, reducing the risk of data loss.
- Align RPOs and Business Continuity Plans (BCP): Different parts of your business may require different RPOs based on the criticality of their data. Highly critical applications that require an “always-on” approach will require more stringent RPOs, while other applications or departments may not need the same recovery objective.
- Consider Industry Standards: RPOs depend on your business-critical applications. As a guideline, however, you can consider the standards for your industry, which typically fall into these ranges:
  - Zero to one hour: Use this shortest time frame for business-critical data or workloads, typically because they’re high volume, dynamic or difficult to recreate.
  - One to four hours: Consider this time range for applications deemed semi-critical, where some data loss is acceptable.
  - Four to 12 hours: A time frame of this length might be used for business units that update daily or less frequently.
  - 13 to 24 hours: Use this longer time frame for data and business units that are important but not critical. RPOs rarely exceed 24 hours.
- Establish and approve each RPO: Once the RPOs are established, they must be approved by the IT department and stakeholders. It is also important to keep clear documentation as a baseline and for your records.
- Review your RPO settings regularly: Evaluate and optimize your RPOs on an ongoing basis. When you test your RPOs and evaluate performance, you can make adjustments as needed, providing even better protection for your data.
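As a rough illustration of the steps above, the following Python sketch maps an RPO to one of the guideline ranges and checks whether a backup schedule can satisfy it. The tier labels, the function names and the worst-case model (a failure just before the next backup completes, so the loss is one interval plus the backup's run time) are assumptions made for this example, not part of any product.

```python
from datetime import timedelta

# Upper bounds follow the guideline ranges in the list above.
# The labels are illustrative shorthand, not an industry taxonomy.
RPO_TIERS = [
    (timedelta(hours=1), "business-critical"),
    (timedelta(hours=4), "semi-critical"),
    (timedelta(hours=12), "updated daily or less often"),
    (timedelta(hours=24), "important but not critical"),
]


def rpo_tier(rpo: timedelta) -> str:
    """Return the label of the first guideline range that contains the RPO."""
    for upper_bound, label in RPO_TIERS:
        if rpo <= upper_bound:
            return label
    return "beyond the usual 24-hour guideline"


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Assumed worst case: failure occurs just before the next backup completes."""
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the assumed worst-case loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=4)
    print(rpo_tier(rpo))
    print(meets_rpo(timedelta(hours=1), rpo))
    print(meets_rpo(timedelta(hours=4), rpo, backup_duration=timedelta(minutes=30)))
```

Under this model, a four-hour backup interval does not satisfy a four-hour RPO once the backup's own run time is counted, which is one reason to set the backup frequency tighter than the RPO itself.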
What is a recovery time objective?
A recovery time objective (RTO) is the maximum tolerable length of time that a computer, system, network or application can be down after a failure or disaster occurs. An RTO is measured in seconds, minutes, hours or days. It is an important consideration in a disaster recovery plan (DRP).
The maximum downtime a company can bear is directly linked to the application and its impact on the business; any downtime or loss of data affects revenue-generating activities. Quantifying the impact of such losses is therefore a key factor in determining how to configure the environment to achieve the desired RTOs.
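One simple way to quantify that impact is to put a cost on each hour of downtime. The sketch below assumes a linear cost model (every hour of outage costs the same), which is a simplification; the function names and the figures in the usage example are hypothetical.

```python
def downtime_cost(cost_per_hour: float, outage_hours: float) -> float:
    """Estimated cost of an outage under an assumed linear cost model."""
    return cost_per_hour * outage_hours


def max_tolerable_outage_hours(cost_per_hour: float, loss_budget: float) -> float:
    """Longest outage whose estimated cost stays within the loss budget."""
    if cost_per_hour <= 0:
        raise ValueError("cost_per_hour must be positive")
    return loss_budget / cost_per_hour


if __name__ == "__main__":
    # Hypothetical figures: 5,000 per hour of downtime, 10,000 loss budget.
    print(downtime_cost(5000, 4))
    print(max_tolerable_outage_hours(5000, 10000))
```

The second function gives a starting point for an RTO discussion; real costs are rarely linear, so treat the result as a first estimate to refine with stakeholders.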
Calculation of risk
Both RPO and RTO are calculations of risk, providing measurements of how much data loss and how much time offline a business can tolerate after a disaster. As previously stated, these recovery objectives are often measured in seconds, minutes, hours or days. Even when you take the appropriate steps to calculate recovery objectives, the amount of risk is complex to quantify because it is unique to each application, dataset and company. Ultimately, it is important that all the stakeholders invested in the availability of your business’s applications and data agree on the amount of risk associated with downtime. After all, there is typically a single IT organization servicing the business, and it will ultimately need to implement, manage and monitor the overall backup and recovery solution.
How to define RTO and RPO values for your applications
When defining your business’s RTO, consider:
- What is the cost per minute, hour or day of an outage?
- Are there recovery SLAs in place with customers?
- Which applications or systems are a priority for being restored?
- What is the ideal order in which critical applications need to be recovered?
When defining your business’s RPO, consider:
- How much data, if any, can you stand to lose?
- What are the potential financial implications?
- What are the potential legal implications?
- How does data loss affect your brand?
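To make the RTO questions about priority and recovery order concrete, here is a minimal sketch that restores applications one after another in priority order and flags any application whose cumulative restore time exceeds its RTO. The `App` class, its fields and the sample values are hypothetical, and the sketch assumes restores run strictly in sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class App:
    name: str
    priority: int         # lower number = restored earlier
    restore_hours: float  # estimated time to restore this application
    rto_hours: float      # agreed recovery time objective


def recovery_schedule(apps: List[App]) -> List[Tuple[str, float, bool]]:
    """Return (name, hours until restored, RTO met) in recovery order.

    Assumes applications are restored sequentially, so each one waits
    for every higher-priority application before it.
    """
    elapsed = 0.0
    schedule = []
    for app in sorted(apps, key=lambda a: a.priority):
        elapsed += app.restore_hours
        schedule.append((app.name, elapsed, elapsed <= app.rto_hours))
    return schedule


if __name__ == "__main__":
    apps = [
        App("reporting", priority=3, restore_hours=2.0, rto_hours=24.0),
        App("payments", priority=1, restore_hours=1.5, rto_hours=2.0),
        App("crm", priority=2, restore_hours=1.0, rto_hours=2.0),
    ]
    for name, done_at, ok in recovery_schedule(apps):
        print(f"{name}: restored after {done_at} h, RTO met: {ok}")
```

In this sample, the second application misses its RTO only because it waits behind the first, which suggests either restoring in parallel or revisiting the order or the objective.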
Testing RPO and RTO
How can you be confident of meeting your objectives if you don’t regularly test your plan? While there are many best practices for testing recovery objectives, the most important practice is to actually perform the testing. In many cases this is neither easy nor cheap, considering the amount of time and storage that testing can require. Some things to consider when planning recovery testing are:
- The best testing schedule to meet SLA requirements
- The time required by your solution to recover the data or workload to an operational state
- The storage requirements for data recovery, and the storage and compute requirements for workload recovery
- Automation to ensure repetition and reduce errors
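An automated test can compare what was actually achieved against the objectives. The sketch below assumes each test run records three timestamps (when the failure was simulated, the restore point that was used, and when service came back); the `RecoveryTest` class and `evaluate` function are hypothetical names for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTest:
    failure_time: datetime        # when the (simulated) failure occurred
    last_restore_point: datetime  # timestamp of the backup that was restored
    service_restored: datetime    # when the workload was operational again


def evaluate(test: RecoveryTest, rpo: timedelta, rto: timedelta) -> dict:
    """Compare achieved data loss and downtime with the RPO and RTO."""
    data_loss = test.failure_time - test.last_restore_point
    downtime = test.service_restored - test.failure_time
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


if __name__ == "__main__":
    test = RecoveryTest(
        failure_time=datetime(2024, 1, 1, 12, 0),
        last_restore_point=datetime(2024, 1, 1, 9, 30),
        service_restored=datetime(2024, 1, 1, 13, 15),
    )
    print(evaluate(test, rpo=timedelta(hours=4), rto=timedelta(hours=1)))
```

Running a check like this on a schedule, and keeping the results, gives you a record of whether objectives are still being met as data volumes grow.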
Ongoing monitoring and analytics
As with any IT solution, ongoing monitoring and analytics help to ensure that the infrastructure and solution are functioning as designed and without failure. Nothing is more important than ensuring you can recover your business’s data. To increase backup success, which leads to reliable recovery, consider adding the following to your process:
- 24/7/365 monitoring to ensure that backups are completed with no errors
- Backup infrastructure monitoring for common issues that could affect backup success
- Analysis of usage trends to prevent future issues with backup storage capacity
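The monitoring and trend analysis above can start very simply. This sketch computes a backup success rate and projects how many days remain before backup storage fills up, assuming one usage reading per day and steady average growth; both function names and the sample numbers are hypothetical.

```python
from typing import Optional, Sequence


def backup_success_rate(results: Sequence[bool]) -> float:
    """Fraction of backup jobs that completed without errors."""
    if not results:
        raise ValueError("no backup results to evaluate")
    return sum(results) / len(results)


def days_until_full(daily_used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Project days until storage is full from average daily growth.

    Assumes one reading per day and roughly linear growth. Returns None
    when usage is flat or shrinking, since no fill date can be projected.
    """
    if len(daily_used_gb) < 2:
        raise ValueError("need at least two daily readings")
    growth_per_day = (daily_used_gb[-1] - daily_used_gb[0]) / (len(daily_used_gb) - 1)
    if growth_per_day <= 0:
        return None
    remaining_gb = max(capacity_gb - daily_used_gb[-1], 0.0)
    return remaining_gb / growth_per_day


if __name__ == "__main__":
    print(backup_success_rate([True, True, False, True]))
    print(days_until_full([100, 110, 120, 130], capacity_gb=200))
```

An alert when the projected number of days drops below a chosen threshold is one way to catch capacity problems before they cause failed backups.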
Recovery objectives are the foundation of your disaster recovery strategy and are critical to align to your SLAs. Ready to dive deeper? Watch this recorded webinar to learn more about how to reliably achieve your recovery objectives with Veeam by:
- Aligning your objectives with supercharged backups and instant recovery
- Avoiding RPO and RTO violations with automatically scheduled tests
- Keeping backups safe from cyberthreats and avoiding reinfection