by tsightler » Sat Nov 14, 2009 5:36 am
[Updated by Gostev on Nov 22nd]
It has come to my attention that the competition is trying to spread FUD and make a big deal of this issue, when in fact it is not a big deal.
1. Backups are NOT corrupted.
2. You can only run into this issue with a NON-DEFAULT restore mode (1 of the 3 existing restore modes).
3. Despite what the competition may be claiming, there is no actual user data loss or corruption - the VM will still boot and work.
The only real issue is that the OS and file system check tools complain about unexpected content in unused disk blocks. The Linux ext3 file system and disk test tools merely suspect a problem when they see non-zeroed unused blocks, and warn about it. This is specific to certain file systems only; Windows NTFS, for example, considers this situation absolutely normal.
[Updated by Gostev on Nov 18th]
Issue summary:
What IS NOT affected
1. Actual backups are not corrupted.
2. Guest OS file level restore is not affected.
3. VM file level restore (VMX, VMDK) is not affected.
4. Entire VM restore with registration for Windows VMs is not affected.
5. Entire VM restore for Linux VMs in the default (agentless) restore mode is not affected.
What IS affected
Entire VM restore for Linux VMs in agent-based mode (used for ESX hosts for which you have purposely enabled service console agent-based operations in the host settings) is affected in the following way:
• All VMDK blocks containing actual data are restored properly (there is no data corruption/loss).
• All VMDK blocks without data are not zeroed and will contain whatever data was previously stored in the corresponding VMFS blocks. Thus, the issue is not reproducible when restoring to a "clean" VMFS datastore.
• The VM boots up and runs fine, but the OS may complain about file system integrity issues.
• Disk check tools like fsck may complain about file system integrity issues if forced to check the whole disk.
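The behavior described in the bullets above can be pictured with a small toy model. This is only an illustrative sketch, not Veeam's restore code: the function names (restore_sparse, stale_blocks), the 4-byte block size, and the list-of-blocks representation of a datastore are all assumptions made up for the example.

```python
BLOCK_SIZE = 4  # hypothetical tiny block size, for illustration only


def restore_sparse(datastore, backup_blocks, zero_unused):
    """Return a copy of `datastore` with the backed-up blocks written in.

    datastore     -- list of bytes objects: what the target blocks held before
    backup_blocks -- dict {block index: bytes} of blocks that contain real data
    zero_unused   -- if True, blocks absent from the backup are zero-filled;
                     if False, they keep whatever the datastore held before
    """
    restored = list(datastore)
    for index in range(len(restored)):
        if index in backup_blocks:
            restored[index] = backup_blocks[index]
        elif zero_unused:
            restored[index] = bytes(BLOCK_SIZE)
        # otherwise the stale content is left in place
    return restored


def stale_blocks(disk, backup_blocks):
    """List indexes of unused blocks (not in the backup) that are not all zeros."""
    zero = bytes(BLOCK_SIZE)
    return [i for i, block in enumerate(disk)
            if i not in backup_blocks and block != zero]


# Hypothetical sample data: a previously used datastore and a clean one.
dirty = [b"OLD0", b"OLD1", b"OLD2", b"OLD3"]
clean = [bytes(BLOCK_SIZE)] * 4
backup = {0: b"DATA", 2: b"MORE"}

print(stale_blocks(restore_sparse(dirty, backup, zero_unused=False), backup))
print(stale_blocks(restore_sparse(clean, backup, zero_unused=False), backup))
print(stale_blocks(restore_sparse(dirty, backup, zero_unused=True), backup))
```

In this model the data blocks come back intact in all three cases. Only the restore onto the previously used datastore without zeroing leaves leftover content in the unused blocks, which is the kind of content a strict check tool would flag.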
Cause
Unlike Windows NTFS, some Linux file systems and disk test tools expect unused disk blocks to be zeroed, and they treat and report non-zeroed blocks as a potential data corruption issue.
A fix for Veeam Backup 4.0 is available through support. The fix is included in Veeam Backup 4.1 (scheduled for release in Dec 09).
Original post:
OK, I've got a MAJOR issue. Last week we had to restore a couple of RHEL5 VMs. The process seemed to go OK, the systems restored and booted without a problem and the machines seemed perfectly fine. Today, an administrator of one of the systems started getting filesystem errors and reported them to me. At first it looked like some minor corruption so I rebooted to a rescue CD to run an 'fsck' on the filesystems. This was disastrous. Each and every filesystem reported tons and tons of corruption, so bad that it was uncorrectable.
Because this was a development system on which we were doing testing with Oracle cluster services I didn't think too much of it. They had been through panic reboots and were running Oracle modules that weren't supported by Red Hat. That being said, it did cause me some concern so I decided to perform a restore of one of our smaller, mostly inactive Linux systems and run an 'fsck' on the restored volume. Guess what? It showed the massive corruption as well. It appears that something that is part of the Veeam restore process is causing subtle corruption of the restored VMDK.
This is a critical issue since, as cool as Veeam is, its most critical function is to correctly restore data. I have not tested Windows systems yet. I'm planning to perform some additional testing on a small test system and to open a support case but I'm putting this out there to see if anyone else has experienced any problems with completely restored VMs.