Instant VM recovery of a VMware VM fails after at least 30 minutes with the error:
Failed to publish VM
Veeam Backup & Replication implements timeouts for most operations to protect against hangs. However, even when no process is hung, timeouts may occur due to significant performance problems or an unusual use case.
Typically this error occurs due to slow performance of the vPower NFS datastore. Possible causes of the slow performance:
- Slow repository read performance, especially due to deduplication storage "rehydrating" deduped/compressed backup data.
- Slow network link between the ESXi host and the vPower NFS server due to congestion, a NIC set to 100 Mb/s, or other infrastructure issues.
- Poor performance of the vPower NFS server.
Increase the timeouts below to several times the default value. Although you can increase the timeouts beyond the suggested values, it is usually better to investigate performance first if you are performing a test or low-priority restore. In an emergency, increase the timeouts as high as necessary, or create a severity 1 case with Veeam Support for assistance in finding a workaround.
Open Regedit on the Veeam Backup Server and create the following values if they do not already exist. Create all values in the key HKLM\SOFTWARE\Veeam\Veeam Backup and Replication. All values are in decimal. Make sure no jobs or restores are running, then restart the Veeam Backup Service to apply these changes.
Default value: 900 (seconds)
Suggested value: 7200
Description: Applicable in all cases.
Default value: 30 (minutes)
Suggested value: 90
Description: Applies to mounting VMware virtual disks from backup or storage snapshot for instant VM recovery, except when restoring to vCloud Director.
Default value: 45 (minutes)
Suggested value: 120
Description: Applies to mounting VMware virtual disks from backup for instant VM recovery to vCloud Director.
Note: Additional considerations and timeout values may be applicable to restore from storage snapshots.
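Because the values above are given in decimal while a .reg import file stores DWORD data in hexadecimal, the conversion is easy to get wrong by hand. The following is a minimal sketch, not an official Veeam tool, that builds the text of a .reg file from decimal values. It assumes the values are of type DWORD, and the function name build_reg_file is hypothetical. The example pairs the name remotingTimeout (taken from the note later in this article) with the suggested value of 7200; that pairing is an inference, so verify it before importing anything.

```python
# Sketch only: build the text of a .reg file from decimal timeout values.
# Assumption: the timeouts are stored as DWORD values (not stated in the article).

VEEAM_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication"


def build_reg_file(values, key=VEEAM_KEY):
    """Return .reg file text that sets each name in `values` to a DWORD.

    `values` maps a registry value name to a decimal integer.
    """
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, decimal_value in values.items():
        # A DWORD is an unsigned 32-bit integer.
        if not 0 <= decimal_value <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {decimal_value} does not fit in a DWORD")
        # .reg files write DWORD data as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{decimal_value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Assumed pairing: remotingTimeout with the suggested value of 7200 seconds.
    print(build_reg_file({"remotingTimeout": 7200}))
```

Review the generated text before importing it, and restart the Veeam Backup Service afterwards as described above.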
The most common cause of these errors is slow read performance from the backup repository. Deduplicating storage is not recommended as a back-end for vPower-based restores. Where possible, optimize storage devices for random read I/O of large blocks (typically 256 KB to 512 KB with default settings, or 4 MB for backups on deduplicating storage; your use case may vary). A simple benchmark is described in KB2014.
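For a rough first impression of large-block random read speed, a short script can time reads at random offsets within an existing large file. This is a minimal sketch and not the benchmark from KB2014; the function name random_read_throughput and its parameters are hypothetical. Operating system caching can inflate the result considerably, so treat the number as an upper bound and test against a file much larger than the server's RAM.

```python
# Sketch only: time random reads of large blocks from an existing file.
# Not the KB2014 benchmark; OS caching may make results look better than reality.
import os
import random
import time


def random_read_throughput(path, block_size=512 * 1024, reads=200, seed=None):
    """Read `reads` blocks at random block-aligned offsets.

    Returns (total_bytes_read, MiB_per_second).
    """
    size = os.path.getsize(path)
    block_count = size // block_size  # number of whole blocks in the file
    if block_count < 1:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffer; it does not bypass the OS cache.
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            f.seek(rng.randrange(block_count) * block_size)
            total += len(f.read(block_size))
    elapsed = time.perf_counter() - start
    rate = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

For example, random_read_throughput(r"D:\Backups\test.bin", block_size=4 * 1024 * 1024) would approximate the 4 MB block case; the path shown is a placeholder.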
As a workaround, or to verify that storage performance is the cause of the timeout, try temporarily moving the backup files to faster storage.
Where possible, make sure the vPower NFS server, the repository, and the destination ESXi host for the restore are all located at the same site. That is, avoid creating unnecessary bottlenecks by sending restore traffic over the WAN.
Additional performance troubleshooting:
- Depending on the underlying infrastructure, there can be significant performance differences between running the vPower NFS service from a VM or from a physical machine. For example, try using a VM located on the same ESXi host as the virtual lab.
- Heavily fragmented full backup files can reduce restore performance. Schedule compact operations to reduce fragmentation.
- Where applicable, test throughput of the network connections between the repository and vPower NFS server, and between the vPower NFS server and the ESXi host.
- Investigate CPU and memory usage of the repository and vPower servers.
Overview of vPower NFS Service
Configuring vPower server
The remotingTimeout setting affects all processes and services communicating with the Veeam Backup Service. In some cases, communication failures will be retried, so an operation may not fail until this timeout has occurred multiple times.
Consider that from a networking and vSphere configuration perspective there is little difference between vPower and any other NFS datastore.
VMware Technical Paper: Best Practices for Running vSphere on NFS Storage