Canceled or Failed Failback Results in CID Mismatch

KB ID: 2113
Products: Veeam Backup & Replication
Version: 8.x
Published:
Last Modified: 2016-03-21

Challenge

Using Veeam B&R, you fail over to a VMware replica, then select Failback to Production. After failback fails or is canceled, the original VM is in a ‘consolidation needed’ state and cannot be powered on or consolidated because of errors such as:
 
The parent virtual disk has been modified since the child was created
OR
Content ID mismatch

Cause

Replica failback transfers data from the replica to the original VM in two steps. In the first step, the replica is still running: a snapshot is created on the original VM, and data is transferred from the running replica VM to the working snapshot on the original VM. The replica is then shut down, and a second transfer begins. If this second data transfer fails, or if you click Cancel restore task at this time, the original VM will be put into a consolidation-needed state due to the mechanism of failback data transfer.
 
At certain times during failback, the disk descriptor files contain a CID mismatch; this mismatch is fixed at the end of a successful failback, but is not fixed when the task fails or is canceled at particular times. When the Veeam Backup service attempts to remove the Failback Working Snapshot on the original VM, ESXi is unable to completely process the snapshot removal request, removing the snapshot from the snapshot manager but not consolidating the child disks.
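As a rough illustration of what a CID mismatch looks like, the sketch below reads the CID and parentCID fields from VMDK descriptor text and reports any child disk whose parentCID does not match its parent's CID. The function names and sample descriptor values are hypothetical, and the descriptor text is assumed to have already been copied off the datastore. It only detects a mismatch and does not modify any file.

```python
import re

# Matches the "CID=..." and "parentCID=..." lines of a VMDK text descriptor.
_FIELD = re.compile(r'^(CID|parentCID)\s*=\s*([0-9a-fA-F]{1,8})\s*$', re.MULTILINE)


def read_cids(descriptor_text):
    """Return (cid, parent_cid) as lowercase hex strings, or None if absent."""
    fields = {name: value.lower() for name, value in _FIELD.findall(descriptor_text)}
    return fields.get("CID"), fields.get("parentCID")


def find_cid_mismatches(chain):
    """Check a disk chain for CID mismatches.

    chain: list of (name, descriptor_text), base disk first, newest child last.
    Returns (child_name, parent_name) pairs where the child's parentCID
    differs from the parent's CID.
    """
    mismatches = []
    for (parent_name, parent_text), (child_name, child_text) in zip(chain, chain[1:]):
        parent_cid, _ = read_cids(parent_text)
        _, child_parent_cid = read_cids(child_text)
        if parent_cid != child_parent_cid:
            mismatches.append((child_name, parent_name))
    return mismatches


# Hypothetical descriptor contents: the child still points at an older parent CID.
base = "CID=a1b2c3d4\nparentCID=ffffffff\n"
child = 'CID=0f0f0f0f\nparentCID=deadbeef\nparentFileNameHint="DC01.vmdk"\n'
print(find_cid_mismatches([("DC01.vmdk", base), ("DC01-000001.vmdk", child)]))
```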


User-added image

This image indicates the point after which failback is most likely to create this issue, and the Cancel restore task option.
 

Solution

The design of replica failback assumes that the original VM is already corrupt, or too outdated to be of value. Testing of failback to the original location should only be performed with dedicated test VMs, not with any VMs running production workloads. If failback testing of production VMs is unavoidable, perform a backup before testing.
 
As a workaround, you can fail back to a new location instead of to the original VM, then delete the original VM. Note: this can take considerable time for a large VM or a slow network connection.
 
To resolve the ‘consolidation needed’ status, you must first resolve the CID mismatch. VMware provides the following KB: Resolving the CID mismatch error: The parent virtual disk has been modified since the child was created. If you wish to continue trying to fail back to the original VM and want to keep the data already transferred, in the hope of minimizing the amount of data to be transferred during the next attempt, you can follow the steps in that KB. If keeping the transferred data is not important and the steps in that article do not work or are too complicated, try the alternative steps below.
 
Note that if the production VM is not already corrupt and you follow the steps in that article (to consolidate the snapshot), the partial data from failback will be written into the parent disks, corrupting the state of the VM. The state of the replica will still be good, so after successful failback the production VM’s state will also be good.
 
If you want to try to revert the original VM to its state prior to failback, perform a variation on the “Alternative procedure” from that article, described below. There is no guarantee that these steps will work, and your original VM may already be corrupt as a result of partial failback.
 

If the original VM had snapshots prior to failback:
Because of the CID mismatch, attempting to delete the snapshot from the snapshot manager will result in failure to consolidate that snapshot. No reliable method for recovering the data within the snapshot exists. Careful file editing as described in the VMware KB may allow you to resolve the CID mismatch prior to deleting the snapshot, but this is untested. You can still follow the steps below, but all data within the snapshot files will be lost.
 

If the original VM did not have any snapshots prior to failback:
 
Gather Information

  1. Edit the virtual machine configuration using a vSphere client.
  2. Note what disk files correlate to each SCSI ID, and the datastore location of each file.
          Example:
          [Datastore1] DC01\DC01-000001.vmdk     on SCSI0:0
          [Datastore1] DC01\DC01_1-000001.vmdk on SCSI0:1
          [Datastore2] DC01\DC01-000001.vmdk     on SCSI0:2
Detach Snapshot Disks and Attach Base Disks
  1. Edit the VM, select each disk, and click Remove. The disk name changes to strikethrough text and shows the word (removing). Do not select the option to delete the disk.
  2. After marking all disks for removal, click OK.
  3. Edit the VM again, add a hard disk, choose to use an existing disk, and then navigate to the location of the base disks for the replica. Repeat this step to attach each base disk (such as DC01.vmdk) to the same SCSI node that was noted earlier.
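The base disk to attach at each SCSI node can be worked out from the snapshot disk names noted under Gather Information. The following is a minimal sketch, assuming child disks always follow the vmname-000###.vmdk naming pattern shown in this article; the function names are hypothetical, and the result is only a checklist to follow by hand in the vSphere client.

```python
import re

# Snapshot child disks carry a six-digit "-000001"-style suffix before ".vmdk".
_CHILD = re.compile(r'^(?P<stem>.+)-\d{6}\.vmdk$')


def base_disk_name(path):
    """Map a child-disk path to its base-disk path; other paths are returned unchanged."""
    match = _CHILD.match(path)
    return f"{match.group('stem')}.vmdk" if match else path


def base_disk_plan(attached):
    """attached: dict of SCSI node -> disk path as noted before detaching the disks."""
    return {node: base_disk_name(path) for node, path in attached.items()}


# The example mapping from the Gather Information step.
noted = {
    "SCSI0:0": r"[Datastore1] DC01\DC01-000001.vmdk",
    "SCSI0:1": r"[Datastore1] DC01\DC01_1-000001.vmdk",
    "SCSI0:2": r"[Datastore2] DC01\DC01-000001.vmdk",
}
for node, disk in base_disk_plan(noted).items():
    print(node, "->", disk)
```

A VM whose own name ends in a six-digit suffix would confuse this pattern, so compare the results against the files actually present in the datastore before attaching anything.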
Datastore Cleanup
  1. Using the datastore browser, navigate to the VM’s folder.
  2. There will most likely be many files. The only files that are required are:
       • VMX
       • VMXF
       • NVRAM
       • VMDK for each disk (technically a VMDK descriptor and a -flat.vmdk file, but the flat file is not normally visible)
  3. Delete all child disk files created by the snapshot (vmname-000###.vmdk) and any -ctk.vmdk files. Also delete the VMSN and VMSD files.
  4. It is not important to delete other files, such as log files.
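Before deleting anything, it can help to sort a listing of the VM folder into files to keep and files to remove. The sketch below is one way to do that from a list of file names; it deletes nothing. The function name and sample listing are hypothetical. The -delta and -sesparse suffixes are an added assumption about how child disk extents are named when they are visible; the remaining patterns come from the cleanup steps above.

```python
import re

# Patterns for files the cleanup steps say to delete.
_DELETE_PATTERNS = [
    # Snapshot child disks; the -delta/-sesparse extent suffixes are an assumption.
    re.compile(r'-\d{6}(-delta|-sesparse)?\.vmdk$', re.IGNORECASE),
    # Change tracking files.
    re.compile(r'-ctk\.vmdk$', re.IGNORECASE),
    # Snapshot state (VMSN) and snapshot metadata (VMSD) files.
    re.compile(r'\.(vmsn|vmsd)$', re.IGNORECASE),
]


def classify(filenames):
    """Split file names into (keep, delete) lists; nothing is removed from disk."""
    keep, delete = [], []
    for name in filenames:
        matches = any(pattern.search(name) for pattern in _DELETE_PATTERNS)
        (delete if matches else keep).append(name)
    return keep, delete


# Hypothetical folder listing for a VM named DC01.
listing = [
    "DC01.vmx", "DC01.vmxf", "DC01.nvram", "DC01.vmdk", "DC01-flat.vmdk",
    "DC01-000001.vmdk", "DC01-ctk.vmdk", "DC01.vmsd", "DC01-Snapshot1.vmsn",
    "vmware.log",
]
keep, delete = classify(listing)
print("keep:  ", keep)
print("delete:", delete)
```

Review the delete list by hand before removing any file in the datastore browser, since a base disk whose name happens to end in six digits would be misclassified.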

Test the VM
  1. Create a snapshot.
  2. Open the snapshot manager and click Delete All.
  3. Power on the VM, or try failback again.
If the CID mismatch is resolved at this point but the original VM is corrupted, you should still be able to fail back to the original location, because the failback process overwrites the state of the original VM.
 
Note: it may be necessary to resolve whatever issue caused failback to fail before you can try again.

 
