Deduplication Ratio Does Not Reflect Deduplicating Storage

KB ID: 2186
Product: Veeam Backup & Replication
Published: 2016-11-07
Last Modified: 2025-07-31

Challenge

In the backup properties, the “Data Size” or “Backup Size” is larger than expected, or the “Deduplication” column in the backup statistics differs from the ratio reported by the deduplicating storage appliance.

Cause

Veeam Backup & Replication does not request information from storage appliances about the size of files as stored on the appliance. All values in this user interface would be the same if the data were written to non-deduplicating storage.

This limitation applies to all storage, whether integrated (HPE StoreOnce, EMC DataDomain, ExaGrid) or not.

Effect of Job Settings on Deduplication Ratio

Virtual disks are stored in each backup file as a combination of data blocks and tables of pointers to those blocks.

When inline data deduplication is enabled in the backup job settings, the deduplication table will contain many pointers to fewer blocks. For example, a full backup of a single virtual disk might contain ten thousand blocks; if many of these blocks are identical, the backup file would contain a table of ten thousand pointers to only a few thousand actual data blocks.
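The pointer-table idea can be illustrated with a minimal sketch. It is not Veeam's actual file format (which this article describes only in simplified form); the function name `build_block_table`, the use of SHA-256 to detect identical blocks, and the tiny block size are all assumptions made for illustration.

```python
import hashlib


def build_block_table(blocks):
    """Return (table, store) for a sequence of equally sized blocks.

    table: one pointer (index into store) per logical block.
    store: each distinct block content, kept only once.
    Hypothetical model; not the real backup file layout.
    """
    store = []   # unique data blocks actually kept in the file
    index = {}   # content digest -> position in store
    table = []   # one pointer per logical block of the virtual disk
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in index:
            index[digest] = len(store)
            store.append(block)
        table.append(index[digest])
    return table, store


# Hypothetical disk: 10,000 blocks drawn from only 3,000 distinct contents.
blocks = [(i % 3000).to_bytes(4, "big") * 256 for i in range(10_000)]
table, store = build_block_table(blocks)
print(len(table), "pointers ->", len(store), "stored blocks")
```

Every logical block can still be recovered by following its pointer, which is why the table keeps ten thousand entries even though far fewer blocks are stored.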

When inline data deduplication is disabled (such as when using the default settings for writing to a deduplication appliance), each entry in the table of blocks for a virtual disk either points to a data block or is ‘sparse’, representing a block containing no data. For example, a full backup of a 40 GB virtual disk containing 30 GB of used space becomes a backup file containing 10 GB of sparse blocks and 30 GB of actual blocks.

Incremental backups of such a VM would usually contain few zero-filled blocks, because incremental backups do not read unchanged data. However, excluding deleted file blocks and VM guest files will result in sparse blocks being stored in an incremental backup file. In the example, over 8 GB of the free space in the VM consists of deleted files or “dirty” blocks, so the incremental data size is 8.27 GB when deduplication is disabled. This occurs even though the zero blocks are not read during the incremental backup: in the example image, the VM was powered off, so no data (0.0 KB) was read from the VM disk during any incremental backup.
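As a rough illustration of the non-deduplicated case, the sketch below models a table whose entries are either a pointer to a stored block or a sparse marker. The function name `build_table_without_dedup`, the use of `None` as the sparse marker, and the scaled-down disk (40 small blocks standing in for 40 GB) are assumptions for illustration only.

```python
def build_table_without_dedup(blocks):
    """Model a block table with inline deduplication disabled.

    Zero-filled blocks become sparse entries (None); every other block
    is stored, even if an identical block was already stored.
    Hypothetical model; not the real backup file layout.
    """
    store = []
    table = []
    for block in blocks:
        if not any(block):           # block contains only zero bytes
            table.append(None)       # sparse entry, no data stored
        else:
            table.append(len(store))
            store.append(block)      # identical blocks are stored again
    return table, store


# Scaled-down stand-in for a 40 GB disk with 30 GB used:
# 30 data blocks (all identical here) followed by 10 zero-filled blocks.
blocks = [b"\x01" * 16] * 30 + [bytes(16)] * 10
table, store = build_table_without_dedup(blocks)
print("sparse entries:", table.count(None), "stored blocks:", len(store))
```

Note that the 30 identical data blocks are all stored in this model; removing that redundancy is left to the storage appliance.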

When inline data deduplication is enabled, these zero blocks are instead handled by the deduplication table so that they do not contribute to the deduplication ratio or the “Data Size” statistic. In the example image above, deduplication was enabled for the most recent incremental backup, so the “Data Size” is negligible.

The deduplication ratio listed in the backup properties is the ratio of blocks in tables in the backup file to actual blocks stored in the file. Therefore, when the backup file contains many sparse blocks, the listed deduplication ratio will be very high or listed as 0.0x.
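The ratio described above can be sketched as a simple division. The function name `dedup_ratio` and the sample numbers are hypothetical, and the sketch does not try to reproduce how the user interface rounds or renders the value (including the 0.0x case); it returns `None` when no blocks are stored and the ratio is undefined.

```python
def dedup_ratio(table_entries, stored_blocks):
    """Ratio of block-table entries to blocks actually stored.

    Returns None when nothing is stored, since the ratio is undefined.
    Hypothetical helper; not the product's own calculation.
    """
    if stored_blocks == 0:
        return None
    return table_entries / stored_blocks


# Ten thousand table entries pointing at three thousand stored blocks.
print(dedup_ratio(10_000, 3_000))

# Mostly sparse file: many table entries, very few stored blocks.
print(dedup_ratio(10_240, 8))

# Nothing stored at all: the ratio is undefined in this sketch.
print(dedup_ratio(10_240, 0))
```

The second call shows why a file dominated by sparse blocks yields a very large ratio: the numerator counts every table entry, while the denominator counts only the few blocks that hold data.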
 

Solution

This user interface correctly reflects the backup file contents and works as designed.

More Information

This article's description of the backup file format is simplified for clarity.

“Backup Size” may be larger than the data stored within the VM with the default settings for storage appliances (4 MB blocks, no inline deduplication, decompress before storing), even after accounting for deleted or hidden files. This is because of the large block size: a 4 MB block containing minimal data will still contribute 4 MB to the Backup Size. The storage appliance will deduplicate this space.
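The block-size effect can be approximated with a little arithmetic. The sketch below assumes a simplified model in which any 4 MB block touched by at least one byte of data is stored in full, with no compression; the function name `backup_size_for_extents` and the sample file layout are hypothetical.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the appliance defaults


def backup_size_for_extents(extents, block_size=BLOCK_SIZE):
    """Bytes stored if every block touched by any extent is kept whole.

    extents: iterable of (offset, length) byte ranges holding data.
    Simplified model; ignores compression and file metadata.
    """
    touched = set()
    for offset, length in extents:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


# Hypothetical layout: 100 small files of 4 KiB each, spaced 8 MiB apart.
extents = [(i * 8 * 1024 * 1024, 4096) for i in range(100)]
used = sum(length for _, length in extents)
stored = backup_size_for_extents(extents)
print("data in VM:", used, "bytes; stored in backup:", stored, "bytes")
```

In this model each small file occupies its own 4 MB block, so the stored size is far larger than the data itself, and the mostly empty remainder of each block is what the appliance is expected to reduce.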
 