During the snapshot removal step of a Veeam job, the source VM loses connectivity temporarily.
Veeam does not remove the snapshot itself; it sends an API call to VMware to have the action performed.
The snapshot removal process significantly lowers the total IOPS that the VM can deliver, both because of additional locks on the VMFS storage caused by the increase in metadata updates and because of the added I/O load of the snapshot removal process itself. In most environments, if the target storage is already above 30-40% of its IOPS capacity, which is not uncommon with a busy SQL or Exchange server, the snapshot removal process will easily push the load past 80%, and likely much higher. Most storage arrays incur a significant latency penalty once the IOPS load passes 80%, which degrades application performance.
The following test should be performed when connectivity to the VM is not sensitive, for instance, during off-peak hours.
To isolate the VMware snapshot removal event, Veeam suggests the following isolation test:
If, while performing the test above, you observe the same connectivity issues as during the Veeam job run, the issue likely exists within the VMware environment itself. Review the following list of troubleshooting steps and known issues. If none of them resolves the issue, we advise that you contact VMware support directly regarding the snapshot removal issue.
This issue presents as a stun lasting multiple minutes. A normal snapshot stun lasts only a few seconds.
There is a known issue with NFS 3.0 Datastores and Virtual Appliance (HOTADD) transport mode. The issue is documented in this VMware KB article: http://kb.vmware.com/kb/2010953. "This issue occurs when the target virtual machine and the backup appliance [proxy] reside on two different hosts, and the NFSv3 protocol is used to mount NFS datastores. A limitation in the NFSv3 locking method causes a lock timeout, which pauses the virtual machine being backed up [during snapshot removal]."
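To tell a normal stun of a few seconds from the multi-minute stun described above, it can help to measure how long the VM is actually unreachable while the snapshot is removed. The following Python sketch is one possible way to do that and is not part of any Veeam or VMware tooling. It probes a TCP port on the VM at a fixed interval and reports the longest gap in connectivity. The function names, the probed port, and the 60-second threshold are illustrative assumptions; choose a port that the guest actually listens on.

```python
import socket
import time
from typing import Iterable, List, Tuple

# A sample is (timestamp in seconds, whether the probe succeeded).
Sample = Tuple[float, bool]


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(host: str, port: int, duration_s: float,
            interval_s: float = 1.0) -> List[Sample]:
    """Probe host:port repeatedly for duration_s seconds and collect samples."""
    samples: List[Sample] = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval_s)
    return samples


def longest_outage(samples: Iterable[Sample]) -> float:
    """Longest span, in seconds, from a first failed probe to the next success.

    An outage still in progress at the last sample is measured up to that sample.
    """
    longest = 0.0
    outage_start = None
    last_ts = None
    for ts, ok in samples:
        if not ok and outage_start is None:
            outage_start = ts          # outage begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None        # outage ends
        last_ts = ts
    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)
    return longest


def is_abnormal_stun(outage_s: float, threshold_s: float = 60.0) -> bool:
    """Flag outages longer than threshold_s (60 s is an assumed cutoff)."""
    return outage_s > threshold_s


# Example usage (hypothetical host and port), run while the snapshot is removed:
#   samples = monitor("vm01.example.local", 3389, duration_s=600)
#   print(longest_outage(samples), is_abnormal_stun(longest_outage(samples)))
```

The measured outage is only as precise as the probe interval plus the connection timeout, so treat the result as an approximation of the stun time rather than an exact figure.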
If this issue occurs, implement one of the following three solutions:
In the Direct NFS access mode, Veeam Backup & Replication bypasses the ESXi host and reads/writes data directly from/to NFS datastores. To do this, Veeam Backup & Replication deploys its native NFS client on the backup proxy and uses it for VM data transport. VM data still travels over LAN but there is no load on the ESXi host.
More details are available here.
Veeam automatically scans for registry values every 15 minutes. Wait 15 minutes for the value to take effect, or stop all jobs and reboot to force the value to be checked.