Love them, hate them, boycott them, hyperscalers still exist and are growing in popularity. The introduction of hyperscale storage solutions has created a whole new selection of options available to the public, with very little entry cost, naturally making them very appealing to organizations of all sizes.
Today we’re going to explore the big three (Microsoft Azure, Amazon Web Services and Google Cloud Platform), how their pricing models are similar and how they differ, and how to make the most of them when designing a backup strategy that incorporates any of them. These three aren’t the only object storage providers that Veeam integrates with, and most of the lessons within this blog will be applicable to any storage vendor.
This will be a three-part series in which we review what to be aware of with cloud storage, how to utilise it effectively within Veeam, and compare benchmarks of real scenarios.
What Is Object Storage in the Cloud?
Before we dive too deep, let’s define what object storage is. At its heart, object storage is a data storage architecture that manages data as objects, each of which carries a unique identifier and metadata describing the object. This differs from storage types such as block storage, which stores data within blocks, or file systems, which manage data in a file hierarchy.
What is the appeal of using object storage? For one, these systems can be scaled effectively without limit – the ability to store trillions of objects, if not more, removes the limitations seen with previous storage models. This makes them ideal for large data sets such as photo or media libraries, or large amounts of unstructured data.
Additionally, security features such as immutability are prevalent across most implementations. Immutability, for example, allows administrators to mark an object as read-only for a specified period. Combined with the fact that objects are highly durable – that is to say, the likelihood of corruption is extremely low – object storage tends to be a popular choice when it comes to secure and resilient storage.
How to Choose a Cloud Provider
When deciding which cloud suits you, there may be organizational requirements or preferences to consider, such as: Do you have staff with public cloud training or experience? Does a particular cloud offer a preferred data region or meet a specific regulatory requirement? Once you’ve got a shortlist of cloud providers you’re allowed to use, it’s time to review their costs.
Cloud Object Storage Criteria for Comparison
Cloud providers offer different pricing options such as storage tiers and costs for data writes and retrievals. It can be very tempting to just look at the cheapest, but all is not always as it first seems. Let’s jump in!
What’s a Storage Tier?
A storage tier is a collection of cloud resources designed to meet specific use cases based on common customer needs. For example, Microsoft Azure’s “Hot” tier, Amazon Web Services’ “S3 Standard” tier and Google Cloud Platform’s “Standard” tier are all considered “hot” tiers of storage. These are classed as “hot” because they’re designed for frequent data access. As a result, they come with higher SLAs and faster-performing storage, and the data is readily accessible via lower-cost API calls (which we’ll talk about more shortly).
These aren’t the only tiers available though. The storage tiers available for each of these clouds are listed below from “hottest” to “coldest”:
- Microsoft Azure
- Hot Tier
- Cool Tier
- Archive Tier
- Amazon Web Services
- S3 Standard
- S3 Glacier
- S3 Glacier Deep Archive
- Google Cloud Platform
- Standard
- Nearline
- Coldline
- Archive
As these storage tiers get colder, a few of their attributes change. The storage backing them can be lower performing, and there may be delays between requesting the data and the data becoming available.
How to Determine API Call Types and Requirements
When interacting with storage, whether via a read or a write, you’re actually making API calls, fetching or placing one block at a time. So, how large is a block? Well, the answer is, of course, it depends! Each storage provider supports its own maximum block size; however, you’ll be guided by your configuration within Veeam. We’ll discuss this further when we look at Veeam configurations, but by default, expect approximately 1MB per API call.
So, why is this important? Because API calls cost money! More so, how much they cost depends on the storage tier you’re using (see previous section). The colder the storage, the more the API calls cost. These API calls are priced either per 10,000 calls (Azure/GCP) or per 1,000 calls (AWS).
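As a back-of-the-envelope illustration of the maths above, here’s a short Python sketch. All prices and the batch size are placeholders I’ve invented for illustration; check your provider’s current price sheet for real figures.

```python
# Rough cost sketch for write (PUT) API calls when uploading a backup.
# PLACEHOLDER prices -- not real provider rates.

def write_api_call_cost(data_gb, block_mb=1.0, price_per_batch=0.05,
                        batch_size=10_000):
    """Estimate the API-call cost of uploading `data_gb` of data.

    block_mb        -- approximate block size per API call (~1MB by
                       default, per the Veeam defaults discussed above)
    price_per_batch -- cost per batch of calls (Azure/GCP price per
                       10,000 calls, AWS per 1,000 -- set batch_size
                       to match your provider)
    """
    calls = (data_gb * 1024) / block_mb   # roughly one call per block
    return calls / batch_size * price_per_batch

# A 10TB upload at a placeholder $0.05 per 10,000 write calls:
cost = write_api_call_cost(10 * 1024)
print(f"${cost:.2f}")   # prints $52.43 at these example prices
```

Note how quickly small per-call prices add up at a ~1MB block size: 10TB is roughly ten million API calls before you’ve stored a single byte.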
Furthermore, when you decide to move data between tiers, this isn’t a “magic” or “free” operation. Each cloud provider handles this slightly differently. In Azure, you can demote data to a cooler tier and then promote it back to a warmer tier, whereas in AWS and GCP this is referred to as a lifecycle transition, and data can only be migrated to colder tiers, not back to warmer ones. Pricing is calculated as follows:
- Microsoft Azure (Full details)
- Demoting Tier: Write Access (GB) and Write API calls are based on the costs of the destination tier.
- Promoting Tier: Read Access (GB) and Read API calls are based on the costs of the source tier.
- Amazon Web Services (Pricing details)
- Lifecycle Transitions: A transition price is defined for the destination tier you wish to demote data to.
- Google Cloud Platform (Pricing details)
- Lifecycle Transitions: These are billed at the Class A operations charge rate of the destination storage tier.
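To make the billing rules above concrete, here’s a hypothetical Python sketch. The tier names, “coldness” ranks and every price in the table are invented placeholders, not real quotes from any provider.

```python
# Sketch of how tier-move charges are assessed under the rules above.
# All prices below are ILLUSTRATIVE placeholders.

def azure_move_cost(objects, gb, tiers, src, dst):
    """Azure bills demotions as writes at the destination tier and
    promotions as reads at the source tier."""
    if tiers[dst]["coldness"] > tiers[src]["coldness"]:   # demotion
        t = tiers[dst]
        return gb * t["write_gb"] + objects / 10_000 * t["write_10k"]
    t = tiers[src]                                        # promotion
    return gb * t["read_gb"] + objects / 10_000 * t["read_10k"]

def lifecycle_transition_cost(objects, price_per_10k_dest):
    """AWS/GCP style: a per-request transition charge at the destination
    tier's rate (GCP bills these as Class A operations)."""
    return objects / 10_000 * price_per_10k_dest

tiers = {  # placeholder pricing table, not real rates
    "hot":     {"coldness": 0, "write_gb": 0.0, "write_10k": 0.05,
                "read_gb": 0.0, "read_10k": 0.005},
    "archive": {"coldness": 2, "write_gb": 0.0, "write_10k": 0.25,
                "read_gb": 0.02, "read_10k": 6.25},
}

# Demoting one million objects (1TB) from hot to archive bills the
# write API calls at the archive tier's (placeholder) rates:
print(azure_move_cost(1_000_000, 1024, tiers, "hot", "archive"))  # 25.0
```

Notice the asymmetry: promoting the same data back out of a cold tier is billed at the cold tier’s read rates, which in practice are far higher than the write rates that got it there.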
In the previous section, I mentioned that there are delays in retrieving archive tier data; these delays can run to hours. This is commonly because the data has to be rehydrated from the archive storage. However, depending on the storage tier you utilise, it’s possible to create expedited/high-priority requests, at a higher cost per API call and per GB read, to reduce the time to retrieve the first byte and beyond. This isn’t available on all platforms for all access tiers, so make sure the option is available before you factor it into your recovery plans.
Minimum Data Retention
Before you rush ahead to calculate your costs to upload and retain data, we should discuss the restrictions on these tiers, particularly the colder ones. Microsoft, Amazon and Google all expect any data uploaded to these tiers to be retained for a minimum period of time, and a colder tier that appears to save costs may no longer do so once you factor these minimums in. The minimum retention periods for the different tiers are:
- Microsoft Azure (Full details)
- Hot Tier: No minimum retention
- Cool Tier: 30 days
- Archive Tier: 180 days
- Amazon Web Services (Full details)
- S3 Standard: No minimum retention
- S3 Glacier: 90 days
- S3 Glacier Deep Archive: 180 days
- Google Cloud Platform (Full details)
- Standard: No minimum retention
- Nearline: 30 days
- Coldline: 90 days
- Archive: 365 days
This creates scenarios whereby additional charges can be incurred. For example:
- Deleting the data
- Migrating the data to another tier
- (Azure Only) Promoting the data
If any of these actions are carried out before the minimum retention period is met, charges will be levied, normally in the form of pro-rated storage charges for the remaining days. Review your specific provider and tier for more information via the links above.
As a final note on this subject, these vendors calculate data retention differently. GCP, for example, calculates retention from when the object was originally created within its storage platform, as opposed to when it was migrated to the tier requiring a minimum retention, as Azure and AWS do.
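The pro-rating described above can be sketched in a few lines of Python. The minimum retention days come from the tables above; the per-GB monthly price and the 30-day billing month are placeholder assumptions for illustration only.

```python
# Early-deletion sketch: removing an object from a tier before its
# minimum retention is met is typically billed as pro-rated storage
# for the remaining days. Prices here are PLACEHOLDERS.

MIN_RETENTION_DAYS = {  # from the provider tables above
    ("azure", "cool"): 30, ("azure", "archive"): 180,
    ("aws", "s3_glacier"): 90, ("aws", "s3_glacier_deep_archive"): 180,
    ("gcp", "nearline"): 30, ("gcp", "coldline"): 90,
    ("gcp", "archive"): 365,
}

def early_deletion_charge(provider, tier, gb, days_stored,
                          price_gb_month=0.004):
    """Pro-rated charge for the unmet part of the minimum retention,
    assuming a ~30-day billing month."""
    minimum = MIN_RETENTION_DAYS.get((provider, tier), 0)
    remaining = max(0, minimum - days_stored)
    return gb * price_gb_month * remaining / 30

# Deleting 500GB from GCP Coldline after only 30 of its 90 minimum days
# still bills the remaining 60 days of storage:
print(early_deletion_charge("gcp", "coldline", 500, 30))
```

Run the same numbers for your own retention schedule before choosing a tier: if your backups rotate out faster than the minimum, the “cheap” cold tier can end up costing more than a warmer one.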
Data Distribution Options
Storage tiering isn’t the only design consideration when planning your usage of cloud storage; you can also choose the level of redundancy your data will have within your platform of choice.
You’ll be able to distribute your data across different data centers within the same region, commonly referred to as availability zones. You’ll also be able to distribute your data between entirely different regions, to protect against a regional failure. When protecting against regional failures, you’re not just protecting yourself from physical disaster at a location, but also from access issues, such as power or networking outages, that temporarily isolate a particular region. Google Cloud Platform differs the most from Microsoft Azure and Amazon Web Services in this context, as the geo-redundancy you choose for your storage is specified by choosing either a region or the name of a multi-region grouping. The options available are:
- Microsoft Azure (More details)
- Locally Redundant Storage (LRS) – This is the cheapest storage option available for Microsoft Azure, storing three copies of the data within a single zone within your selected region.
- Zone Redundant Storage (ZRS) – This storage will store your data within a single Azure region, but it will store the data three times, synchronously, between different Azure zones within the region.
- Geo-Redundant Storage – This will copy the data synchronously to a single zone within the primary region, identically to LRS, and then asynchronously again to a secondary region, where it is also stored as LRS.
- Geo-Zone-Redundant Storage – This will copy the data synchronously to three Azure zones within the primary region, identically to ZRS, whilst asynchronously storing the data again within a secondary region, identically to LRS.
- Amazon Web Services (More details)
- S3 Availability Zones – Unless you decide to specifically utilise a One Availability Zone redundancy class, your data will be split between three different Availability Zones within a region as standard. These are miles apart to prevent damage such as fire or flood from destroying all data within a region.
- S3 Cross-Region Replication – This asynchronous copying of data enables you to store multiple copies of your data within different regions.
- Google Cloud Platform (More details)
- Region – A single region in which your data will be stored.
- Dual-Region – A dual region pre-grouped by Google.
- Multi-Region – Two or more regions pre-grouped by Google.
Cloud Object Storage Pros and Cons
Advantages of Cloud Object Storage
- Scalability: Cloud object storage can be scaled up or down when you need it.
- Durability: Data stored on object storage is highly durable, greatly reducing the risk of long-term corruption.
- Cost effective: When using a hyperscaler as a provider, you pay only for what you require, without the need to set up your own infrastructure.
- Accessibility: With proper access controls, you can add, share, and manage your data from any location with internet access.
- Security: With features such as encryption, immutability, and access controls, you can keep your data safe.
Disadvantages of Cloud Object Storage
- Latency: Because you are storing your data in the cloud, you must ensure that you have an adequate connection to that data. It is not ideal for frequently accessed data that requires low latency.
- Regulatory compliance: Although security features are built into the platform, you still need to ensure that proper account controls are put in place while also adhering to any regulatory compliance requirements such as HIPAA or GDPR.
- Cost: This can be an advantage, but understanding your workloads is key. Tasks such as data egress or frequent API requests can incur unwanted charges.
To recap part one: we’ve looked at the “big three” object storage providers, where they are similar and where they differ. In part two, we’ll look at how changes to your Veeam configuration influence how these providers are used and the associated costs.