
VM Loses Connection During Snapshot Removal

KB ID: 1681
Product: Veeam Backup & Replication 11, Veeam Backup & Replication 10, Veeam Backup & Replication 9.5
Published: 2012-10-04
Last Modified: 2021-08-04

Challenge

During the snapshot removal step of a Veeam job, the source VM loses connectivity temporarily.

Cause

Veeam Backup & Replication does not remove the snapshot itself; it sends an API call to VMware, and VMware performs the removal.

The snapshot removal process significantly lowers the total IOPS the VM can deliver because of additional locks on the VMFS storage caused by the increase in metadata updates, as well as the extra I/O load generated by the snapshot removal itself. In most environments, if the target storage is already at 30-40% IOPS load, which is not uncommon with a busy SQL or Exchange server, the snapshot removal process can easily push that past 80%, and often much higher. Most storage arrays incur a significant latency penalty once IOPS utilization exceeds roughly 80%, which in turn degrades application performance.
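
For a rough sense of the arithmetic, consider the sketch below. All numbers are hypothetical, chosen only to illustrate the headroom problem described above; they are not measurements from this article.

    # Hypothetical numbers illustrating how snapshot removal can exhaust IOPS headroom.
    array_max_iops = 10000     # sustainable IOPS of the backing storage array
    baseline_iops = 3500       # the VM's steady-state load: 35% utilization
    removal_overhead = 2.5     # assumed multiplier: delta-disk reads, base-disk writes, metadata locks

    before = baseline_iops / array_max_iops                      # 0.35 -> healthy headroom
    during = baseline_iops * removal_overhead / array_max_iops   # 0.875 -> latency penalty zone

    print(f"before removal: {before:.0%}, during removal: {during:.0%}")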

Isolation Testing

Perform the following test at a time when a brief loss of connectivity to the VM is acceptable, for instance during off-peak hours.

To isolate the VMware snapshot removal event, Veeam suggests the following isolation test:

  1. Create a snapshot on the VM in question.
  2. Leave the snapshot on the VM for the duration of time that a Veeam job runs against that VM.
  3. Remove the snapshot.
  4. Observe the VM during the snapshot removal.

While performing the test above, if you observe the same connectivity issues as during the Veeam job run, the issue likely lies within the VMware environment itself. Review the following troubleshooting steps and known issues. If none of them resolve the issue, contact VMware Support directly regarding the snapshot removal issue.
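
If you prefer to script the test, the sketch below shows one way to run it with pyVmomi, VMware's Python SDK for the vSphere API. The vCenter address, credentials, VM name, and wait time are placeholders, and the sketch assumes the test snapshot is the only snapshot on the VM.

    # Isolation test sketch using pyVmomi (pip install pyvmomi).
    # All connection details and the VM name below are placeholders.
    import ssl
    import time

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab use only; verify certificates in production
    si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                      pwd="********", sslContext=ctx)
    content = si.RetrieveContent()

    def find_vm(name):
        """Locate a VM by display name via a container view."""
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        try:
            return next(v for v in view.view if v.name == name)
        finally:
            view.DestroyView()

    def wait_for(task):
        """Poll a vSphere task until it leaves the queued/running states."""
        while task.info.state in (vim.TaskInfo.State.queued, vim.TaskInfo.State.running):
            time.sleep(2)
        return task.info.state

    vm = find_vm("SQL01")  # placeholder VM name

    # Step 1: create a snapshot (no memory dump, no quiescing).
    wait_for(vm.CreateSnapshot_Task(name="kb1681-isolation-test",
                                    description="Manual snapshot stun test",
                                    memory=False, quiesce=False))

    # Step 2: leave the snapshot in place for roughly one job run's duration.
    time.sleep(3600)  # placeholder; match your actual backup window

    # Steps 3-4: remove the snapshot and observe the VM (ping, application
    # monitoring) while the task runs -- this is where stun would occur.
    snap = vm.snapshot.rootSnapshotList[0].snapshot  # assumes this is the only snapshot
    wait_for(snap.RemoveSnapshot_Task(removeChildren=False))

    Disconnect(si)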

Snapshot Stun Troubleshooting

  • If the environment is using NFS 3.0 datastores, see the Known Issue with NFS 3.0 Datastores section below.
  • Check for snapshots on the VM while no Veeam job is running and remove any that are found.

    Veeam Backup & Replication is able to back up a VM that has snapshots present; however, it has been observed that snapshot stun may occur when VMware attempts to remove the snapshot created for a Veeam job while another snapshot is already present on the VM.
  • Check for orphaned snapshots on the VM. (See: http://kb.vmware.com/kb/1005049)
  • Reduce the number of concurrent tasks within Veeam; this in turn reduces the number of active snapshot tasks on the datastores.
  • Move the VM to a datastore with more available IOPS, or split the VM's disks across multiple datastores to spread the load more evenly.
  • If the VM's CPU usage spikes heavily during snapshot consolidation, consider increasing the CPU reservation for that VM.
  • Ensure you are on the latest build of your current version of vSphere, hypervisors, VMware Tools, and SAN firmware when applicable.
  • Move the VM to a host with more available resources.
  • If possible, schedule the VM's backup or replication for a time of day when storage activity is lowest.
  • Use a workingDir entry to redirect snapshots to a different datastore than the one the VM resides on (see http://kb.vmware.com/kb/1002929 and the example below).
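
As an illustration of the workingDir redirection in the last bullet, an entry like the one below would be added to the VM's .vmx file; the datastore path is hypothetical, and the VMware KB linked above describes the full, supported procedure:

    workingDir = "/vmfs/volumes/alternate-datastore/SQL01"

Per the VMware KB, the VM should be powered off before the .vmx file is edited, and the change takes effect once the VM's configuration is reloaded.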

Known Issue with NFS 3.0 Datastores

This issue presents as multiple minutes of stun, whereas normal snapshot stun lasts mere seconds.

There is a known issue with NFS 3.0 Datastores and Virtual Appliance (HOTADD) transport mode. The issue is documented in this VMware KB article: http://kb.vmware.com/kb/2010953. "This issue occurs when the target virtual machine and the backup appliance [proxy] reside on two different hosts, and the NFSv3 protocol is used to mount NFS datastores. A limitation in the NFSv3 locking method causes a lock timeout, which pauses the virtual machine being backed up [during snapshot removal]."

If this issue occurs, implement one of the following three solutions:

Use Direct NFS Mode (Best Performance Option)

In the Direct NFS access mode, Veeam Backup & Replication bypasses the ESXi host and reads/writes data directly from/to NFS datastores. To do this, Veeam Backup & Replication deploys its native NFS client on the backup proxy and uses it for VM data transport. VM data still travels over LAN but there is no load on the ESXi host.


Configuration Tips:

  • The Backup Proxy must be able to reach the NFS storage backing the production datastores.
    If the proxy is a VM, this may require creating a VM Port Group on the vSwitch where the NFS storage is connected.
  • The Backup Proxy's IP address must be on the NFS export whitelist.
  • The Backup Proxy's Managed Server entry must be rescanned so that Veeam Backup & Replication becomes aware of the NFS access.

Force Veeam Backup & Replication to use the Hotadd Proxy on the same host as the VM

  1. Create a Backup Proxy on every host in the VMware cluster where backups occur.
  2. Create the following registry value on the Veeam Backup & Replication server.

    Key Location:
    HKLM\Software\Veeam\Veeam Backup and Replication\
    Value Name: EnableSameHostHotaddMode
    Value Type: DWORD (32-Bit) Value
    Value Data: 1 or 2 (see below)

    Both values enable a feature that forces Veeam Backup & Replication to first attempt to use, and wait for, the proxy on the same host as the VM being backed up. The difference between the two is as follows:

    1 - If the proxy on the same host as the VM becomes unavailable, Veeam Backup & Replication fails over to any available proxy and uses the available transport mode. This may lead to a situation where a proxy on another host is selected and hotadd is used, which may cause stun. This ensures the highest performance, but risks VM stun.

    2 - If the proxy on the same host as the VM becomes unavailable, Veeam Backup & Replication uses any available proxy, but forces network transport mode. This minimizes all stun risk, but may reduce backup performance when network transport mode is used.


    Veeam Backup & Replication automatically rescans registry values every 15 minutes. Wait 15 minutes for the value to take effect, or stop all jobs and reboot the server to force the value to be re-read.
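
As a sketch, the value could also be created with a short Python script run as Administrator on the Veeam Backup & Replication server. Value data 2 is used here as the lower-stun-risk option; choose 1 or 2 per the descriptions above.

    # Creates the EnableSameHostHotaddMode registry value described above.
    # Run as Administrator on the Veeam Backup & Replication server.
    import winreg

    KEY_PATH = r"Software\Veeam\Veeam Backup and Replication"

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                            winreg.KEY_SET_VALUE) as key:
        # 2 = fall back to network transport mode when the same-host proxy is unavailable
        winreg.SetValueEx(key, "EnableSameHostHotaddMode", 0, winreg.REG_DWORD, 2)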

Force Backup Proxies to use Network Transport Mode

  1. Edit the proxies listed under [Backup Infrastructure] > [Backup Proxies].
  2. Click the [Choose] button next to “Transport mode”.
  3. Select the radio option for “Network” mode.
  4. Click [OK] to close the prompt and then [Finish] to commit the change.

More information

For more information about "snapshot stun," please review the VMware KB article Snapshot removal stops a virtual machine for long time (1002836).
