Top 7 VMware Management Challenges

Published: October, 2011
Author: Eric Siebert, VMware vExpert

Server virtualization brings significant efficiency to the data center, but it also introduces challenges. Virtualization is a fundamental architectural change compared to traditional server environments, and it affects everything in the data center. The challenges that come with virtualization must be addressed if you want to implement it successfully. If you ignore them, you risk having your virtualization project perceived as a failure and turning into a management nightmare. In this white paper we will cover the top seven management challenges that you must deal with when implementing VMware virtualization.

VM Sprawl

VM sprawl is to virtual environments what urban sprawl is to cities: because virtual machines are so easy to create, a virtual environment invites both uncontrolled growth in the number of VMs and the over-allocation of resources.

Virtualization drastically changes the way we provision servers. In a physical server environment we follow a typical process to provision a new server:

  1. Determine server hardware specifications and requirements
  2. Fill out a purchase requisition
  3. Provide justification to get approval
  4. Order server from hardware vendor
  5. Wait for server to arrive
  6. Unpack server, assemble hardware and install in rack
  7. Connect it to networks and storage devices

This whole process from start to finish can take weeks and cost thousands of dollars.

In a virtual environment the process to provision a new VM is much simpler:

  1. Tell the VMware admin your requirements (CPU, memory, disk)
  2. VMware admin selects a host, selects New Virtual Machine and completes a wizard to create the VM

This whole process from start to finish takes minutes and costs nothing in real dollars, and therein lies the problem. People perceive virtual machines as free because they have no physical presence, so requests for new VMs typically meet little resistance. The reality is that virtual machines are not free; their cost is measured in the resources that they consume on the host and storage devices. Hardware used for virtualization hosts is typically much more expensive than hardware for single-use physical servers because it must scale to support large numbers of virtual machines. Whereas the hardware costs for a single-use physical server may range from $5,000 to $10,000, the hardware costs for a virtualization host can range from $20,000 to $50,000. When you also factor in the expensive shared storage devices that hosts typically connect to, as well as license costs for the hypervisor and management software, the price tag climbs even higher.

So why is VM sprawl bad? Because it makes the management of your virtual environment more difficult and can end up costing you real money. Every VM consumes resources and therefore has a cost associated with it, which is essentially a percentage of the money that you spend on your virtual environment. Your hosts do not have unlimited resources, and once those resources are exhausted, you’re going to have to buy more hardware to add more VMs. So while VMs may appear to be free, they are not, and you need to apply the same justifications and processes that you would normally require when new physical servers are requested.
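To make the "percentage of the money that you spend" idea concrete, here is a minimal Python sketch that splits the cost of one host among its VMs in proportion to allocated memory. The function name, the choice of memory as the allocation key and every figure in it are illustrative assumptions, not numbers taken from a real environment.

```python
def vm_cost_shares(host_cost, vm_memory_gb):
    """Split a host's cost among its VMs in proportion to allocated memory.

    host_cost:    total spend attributed to the host (hypothetical figure).
    vm_memory_gb: dict mapping VM name -> allocated memory in GB.
    Memory is only one possible allocation key; CPU or disk would work
    the same way (this choice is an assumption of the sketch).
    """
    total_gb = sum(vm_memory_gb.values())
    if total_gb == 0:
        return {name: 0.0 for name in vm_memory_gb}
    return {name: host_cost * gb / total_gb for name, gb in vm_memory_gb.items()}


# Hypothetical $30,000 host running three VMs.
shares = vm_cost_shares(30000, {"web01": 4, "db01": 16, "test07": 4})
for name, cost in sorted(shares.items()):
    print(f"{name}: ${cost:,.2f}")
```

Even a rough split like this gives a requester a dollar figure to justify, which is the point of treating VM requests like physical server requests.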

How do you know if VM sprawl is occurring in your environment? It’s not easy to recognize because VM sprawl has no clearly defined symptoms, but some of its characteristics can help you discover it. VM sprawl doesn’t happen overnight. It occurs gradually, and since the growth of your virtual environment is subtle, it is hard to notice. However, if you look at the size of your virtual environment at two distant points in time, say 6 months ago compared to today, it becomes much more apparent. Using applications like Veeam ONE, which provides visibility of your virtual infrastructure and information about utilization of your resources, helps you analyze trends in your environment and identify areas of long-term growth where VM sprawl may be found. You also need to identify VMs that are no longer used, that are powered off or that have over-allocated resources. Veeam ONE can help you with this as well so you can reclaim those wasted resources and redirect them to the VMs that actually need them.
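The two checks described above, comparing the environment at two points in time and flagging VMs that look unused, can be sketched in a few lines of Python. This is only an illustration: the record fields, the 2% idle threshold and the sample inventory are assumptions, and the sketch does not represent how Veeam ONE or vCenter Server collect or evaluate this data.

```python
from dataclasses import dataclass


@dataclass
class VMRecord:
    name: str
    powered_on: bool
    avg_cpu_pct: float  # average CPU usage over the review period


def growth_pct(count_then, count_now):
    """Percentage growth in VM count between two points in time."""
    if count_then <= 0:
        raise ValueError("count_then must be positive")
    return (count_now - count_then) * 100.0 / count_then


def reclaim_candidates(vms, idle_cpu_pct=2.0):
    """Names of VMs that are powered off or show almost no CPU activity.

    The idle threshold is a hypothetical value; tune it to your environment.
    """
    return [vm.name for vm in vms
            if not vm.powered_on or vm.avg_cpu_pct < idle_cpu_pct]


# Hypothetical inventory: 80 VMs six months ago, 100 today.
print(f"Growth over six months: {growth_pct(80, 100):.1f}%")

inventory = [
    VMRecord("web01", True, 35.0),
    VMRecord("old-test", False, 0.0),
    VMRecord("forgotten-poc", True, 0.4),
]
print("Review for reclamation:", reclaim_candidates(inventory))
```

A candidate list is a prompt for a conversation with the VM's owner, not a deletion list; a quiet VM may still be needed.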

There are several ways that you can help prevent VM sprawl. First, you need to implement an approval process that requires justification and tracking of requests to create new VMs. You can also implement chargeback by monitoring metrics on host resource usage so you can help the business understand the real cost of virtual machines. You should also make sure you adequately document any new virtual machines and monitor their lifecycles so VMs can be deleted when they are no longer needed. Finally, by actively monitoring VM resource usage, you will be able to identify VMs that may not be active anymore as well as VMs that may no longer need all the resources that were originally assigned to them. Abandoned VMs can be either powered-off VMs or powered-on VMs that haven’t had substantive activity in a long time. Veeam ONE provides tools that can help you identify and stop VM sprawl so your virtual environment doesn’t become a virtual junkyard. These include reports and dashboards that track resource usage and idle virtual machines, as well as mechanisms for categorizing virtual machines and documenting the virtual infrastructure.

Storage Management

Storage is the most critical and valuable resource in a virtual environment as it serves as the persistent foundation for the virtual machines running on a host. Because storage can make or break a virtual environment, having a properly architected and well-performing storage system is paramount. Storage is typically also the most costly part of your virtual environment. Because of all these factors, you need to ensure that your storage operates at peak efficiency without bottlenecks and that you do not needlessly waste space. Managing your storage resources is a constant challenge: not only do you need to ensure that they perform well, you also need to ensure you have sufficient capacity for your virtual machines.

One of the biggest challenges with virtual disks lies in managing VM snapshots and thin disks. Thin disks allow storage to be over-provisioned: a virtual disk starts with a minimum of space and grows as the guest OS writes disk blocks to it. Virtual machine snapshots can be a nuisance because they are often forgotten and slowly consume space on your datastores. Thus, snapshots and thin disks present a double threat to your storage resources, as they consume growing amounts of disk space and put your datastores at risk of running out of space. Having a datastore run out of space is a situation you want to avoid at all costs, as it can result in the VMs on it being suspended and in data corruption within those VMs.
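The arithmetic behind this risk is simple enough to sketch. The Python below computes how far a datastore is over-provisioned and, given a steady growth rate, roughly how long it has before it fills. The function names and all figures are hypothetical, and real growth from thin disks and snapshots is rarely linear, so treat the result as a rough early warning rather than a forecast.

```python
def overcommit_ratio(provisioned_gb, capacity_gb):
    """Ratio of space promised to thin disks versus real datastore capacity.

    A value above 1.0 means the datastore is over-provisioned.
    """
    return sum(provisioned_gb) / capacity_gb


def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Days of headroom left, assuming linear growth (a simplification).

    Returns None when usage is flat or shrinking.
    """
    if growth_gb_per_day <= 0:
        return None
    return (capacity_gb - used_gb) / growth_gb_per_day


# Hypothetical 1,000 GB datastore holding three 500 GB thin disks.
ratio = overcommit_ratio([500, 500, 500], 1000)
headroom = days_until_full(capacity_gb=1000, used_gb=700, growth_gb_per_day=10)
print(f"Over-commit ratio: {ratio:.2f}")
print(f"Estimated days until full: {headroom:.0f}")
```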

Of all the resources that your host supplies to the virtual machines running on it, storage is the slowest resource because it relies on mechanical hard drives. Storage resources are typically not directly attached to hosts in virtual environments and instead, shared storage is commonly used to take advantage of the advanced features that virtualization provides. The path from your VM to the underlying storage device is complicated with many individual components and queues that I/O must go through. There are also many factors that can impact storage performance and cause I/O bottlenecks such as:

  • Disk alignment
  • Multi-pathing
  • Improper configuration and settings
  • Excessive I/O
  • Improper architecture/design
  • Too many snapshots

When an I/O bottleneck occurs, it can choke the life out of your VMs and slow them to a crawl. The biggest challenge with I/O bottlenecks is detecting them and figuring out their cause so you can resolve them. The key to this is having a good tool that can monitor your storage resources and alert you when performance is degraded and show where that degradation is occurring in your storage subsystem. Trying to rely on vCenter Server for this can be difficult because it has only basic reporting capabilities. Veeam ONE makes storage management much easier by providing full storage monitoring capabilities including disk space, I/O latency, disk issues and datastore monitoring. Veeam ONE also helps you keep an eye on over-provisioned datastores, provides utilization trend analysis and assists in optimizing VM placement on datastores.

Business Views

Virtual environments produce unique challenges that you typically do not encounter in physical server environments. When you virtualize, you have many servers and applications all running on the same physical hardware which causes your hosts to become melting pots of virtual machines. The number of virtual machines that can run on a host (also known as density) is increasing at a rapid pace due to the latest advances in server and storage hardware. Even the smallest of hosts these days can easily hold dozens of virtual machines. This can present a problem when VMs with different functions and from different departments and business groups get lumped together on the same host, making chargeback and reporting a challenge.

You can attempt to segregate VMs on specific hosts but this is often neither efficient nor cost effective. A virtual environment is dynamic with VMs continually moving between hosts for maximum efficiency, and trying to organize VMs and keep them in place is not a practical option. What you need is an organization layer that is applied on top of your virtual environment so you can organize and visualize your virtual environment based on business needs and priorities. This allows you to more effectively manage your VMs and makes monitoring, reporting and chargeback clearer and more intuitive because the views you create are based on a business viewpoint instead of a virtualization or physical viewpoint.

Instead of using the categorization objects built into virtualization, such as datacenters, clusters, resource pools and hosts, you can organize your VMs in more useful ways that align with your business structure or other characteristics, such as:

  • Service level agreements (SLAs)
  • Departments (sales, marketing, R&D, IT)
  • Company or business unit
  • Geographic location
  • Server role (database, email, web, authentication)
  • Operating system (Windows versions, Linux Distros)

You can do this using the basic VM folders that are built into vCenter Server, but setting them up and maintaining them can be very time-consuming. Additionally, VM folders are only one-dimensional: a VM can live in just one folder. You can also use custom attributes, but they are tedious to maintain as well. Veeam ONE, on the other hand, provides an automated, flexible and dynamic way to group the many objects in your virtual environment within different categories, automating the categorization process for you based on pre-defined rules. It also allows attributes to be edited either individually or en masse and is a huge time saver compared to using standard VM folders. Organization in a virtual environment is very important, and Veeam ONE allows you to easily categorize the objects in your virtual environment with minimal effort.
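As a generic illustration of rule-based grouping, the following Python sketch assigns VMs to categories using simple predicates. It is not Veeam ONE's rule engine or syntax; the attribute names, rules and inventory are all hypothetical. What it shows is the difference from folders: one VM can appear in several categories at once.

```python
def categorize(vms, rules):
    """Group VM names by category.

    vms:   list of dicts describing VMs (hypothetical attributes).
    rules: list of (category, predicate) pairs; every matching rule
           applies, so a VM may land in more than one category.
    """
    groups = {category: [] for category, _ in rules}
    for vm in vms:
        for category, predicate in rules:
            if predicate(vm):
                groups[category].append(vm["name"])
    return groups


rules = [
    ("Windows", lambda vm: vm["os"].startswith("Windows")),
    ("Sales", lambda vm: vm["department"] == "sales"),
    ("Databases", lambda vm: vm["role"] == "database"),
]

inventory = [
    {"name": "crm-db", "os": "Windows Server 2008", "department": "sales", "role": "database"},
    {"name": "intranet", "os": "Linux", "department": "IT", "role": "web"},
]

for category, names in categorize(inventory, rules).items():
    print(category, names)
```

Because the rules are evaluated against current attributes each time, the grouping stays correct as VMs move between hosts, which is what makes this kind of layer practical in a dynamic environment.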

Change Management

Managing the changes that occur within the data center is a necessary and critical task that must be done to ensure that you have a record of all the changes to your environment. While this is an important function for any data center, it is especially important for virtual environments because of their architecture. Virtualization is all about the sharing of common infrastructure components and resources between many virtual machines. Because of this type of architecture, seemingly innocent and minor changes that are made can have really big impacts in a virtual environment. In addition, changes that are made can have ripple effects across all the hosts and have large scale consequences. An example of this is a change made to a virtual Distributed Switch (vDS) where the configuration is shared across many hosts. If a change is made to the network configuration that causes a problem, it would apply to all your hosts that use the vDS and it could cause a large number of VMs to all lose network connectivity. This applies to shared storage as well - hosts in a virtual environment all share common storage arrays and if a change is made to a storage array that causes a problem, it could impact all the hosts that have VMs running on that storage array. The bottom line is you have to be extremely careful when making any changes in a virtual environment.

There are many reasons companies implement change management procedures within their data centers, and one particularly beneficial practice is to ensure every change is documented so that a history of all changes is maintained. This is especially useful when problems occur in your environment and you need to troubleshoot their cause. When problems occur, the inevitable first question is “It’s been working fine all this time, what changed?” Change tracking can play a big role in troubleshooting when you can go back and see exactly what changed.

Virtual environments are complicated, with many moving parts and dependencies, and hunting for the cause of a problem can be both complicated and time-consuming. To further complicate change management in a virtual environment, there are typically many people with administrative access, and figuring out who exactly made a change can be challenging. Many problems in virtual environments have big impacts, and time is a luxury you can’t afford when trying to resolve them.

Veeam ONE tracks all the changes that are made in your virtual environment so you have all the information you need at your fingertips without having to hunt it down. Veeam ONE automatically captures all changes that occur across all your hosts and tells you the “who, what, where, when and how” of every change. Further, by providing instant visibility of changes, it helps you monitor and improve change workflows and investigate problematic changes. This allows you to easily isolate the root cause of performance slowdowns and availability issues so you can quickly get your virtual environment healthy again.

Monitoring & Reporting

Virtual environments are like small children: they require constant supervision and monitoring. If you ignore them and don’t keep an eye on them, you could end up with a real mess on your hands.

Monitoring performance in a virtual environment is much more complicated than with traditional physical servers because physical resources are shared by many virtual machines. In a virtualized environment there is more to monitor and interpreting the statistics and results can be difficult.

In traditional environments, performance is monitored inside the guest operating system. This isn’t effective in a virtual environment as the guest OS no longer sees the underlying physical hardware of its host. Instead, the VM sees virtual hardware which is emulated by the hypervisor. This means the guest OS doesn’t know the big picture and is only aware of its own virtual hardware. Because of this, monitoring needs to be performed at the virtualization layer instead of the guest OS layer: the virtualization layer has direct access to the physical hardware and can therefore measure it accurately. There are also many performance statistics that are unique to virtual environments and that have to do with things like how resource access is scheduled by the hypervisor. So not only do you have to monitor at the virtualization layer, you also have to know what to look for and how to interpret the statistics.

vCenter Server does provide performance statistics for your virtual environment, but it has some limitations and does not help you understand what the statistics mean. Also, while monitoring performance at the virtualization layer is important, you still need to monitor performance at the guest OS layer because many performance counters there remain relevant whether or not the server is virtualized. vCenter Server only reports on the virtualization layer and does not extend far into the guest OS layer, so vCenter Server by itself does not provide a complete monitoring solution. Veeam ONE not only monitors at the virtualization layer but also provides visibility into the guest OS and capabilities for managing processes there.

There is an overwhelming number of performance statistics related to a virtual environment, and trying to analyze and understand them all can be very complicated and time-consuming. Many of the performance statistics are not all that useful except in certain situations, and you should focus on the handful of statistics that are key indicators of the performance of your host resources. Some of the key statistics in each resource area are listed below:

  • CPU Ready: A VM statistic which is the amount of time in milliseconds that is spent waiting for a CPU to become available. High CPU Ready times can indicate a CPU bottleneck or too many vSMP VMs on a host.
  • Mem Swapped: A VM/Host statistic which is the amount of memory that is being swapped to/from a VM’s virtual disk swap file by the VMkernel, measured in kilobytes. A large number here points to a lack of physical host memory and is a clear indication that performance is suffering as a result.
  • Disk GAVG: A Host statistic which is the average amount of time in milliseconds (latency) that it takes to process a SCSI command issued by the guest OS. GAVG is the sum of latency in the VMkernel (KAVG) and latency in the storage device (DAVG). High disk latency can really slow down VMs. In general, GAVG should be below 20ms.
  • Disk Commands: A VM/Host statistic which is the number of SCSI commands that have been issued. Disk commands show the number of I/O operations per second (IOPS) that are occurring. For VMs, this is the total commands to the disk target that the VM is located on. For hosts, this is the total commands to all the disk targets. This is a good general disk statistic that shows how much disk activity is happening in the environment.
  • Net Usage: A VM/Host statistic which is the combined transmit and receive rates measured in KBps. For VMs, this is the sum of all network traffic across all the vNICs assigned to a VM. For hosts, this is the sum of all network traffic across all the pNICs in a host. This is a good general indicator of how much network traffic is occurring and lets you see how saturated your hosts’ NICs are and if you are nearing their maximum throughput capacities.
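Two of the statistics above lend themselves to a quick calculation. The Python sketch below adds KAVG and DAVG to obtain GAVG and checks it against the 20ms guideline, and it converts a CPU Ready value in milliseconds into a percentage of the sampling interval, which is often easier to interpret. The function names and sample values are hypothetical, and the 20-second default is an assumption based on vCenter Server's real-time charts; use the interval that matches the data you are reading.

```python
GAVG_LIMIT_MS = 20.0  # guideline from the statistics list above


def gavg_ms(kavg_ms, davg_ms):
    """Guest-observed latency: VMkernel latency plus device latency."""
    return kavg_ms + davg_ms


def disk_latency_ok(kavg_ms, davg_ms, limit_ms=GAVG_LIMIT_MS):
    """True when GAVG stays below the guideline."""
    return gavg_ms(kavg_ms, davg_ms) < limit_ms


def cpu_ready_pct(ready_ms, interval_s=20):
    """CPU Ready as a percentage of the sampling interval.

    interval_s defaults to 20 seconds (assumed real-time chart interval).
    """
    return ready_ms * 100.0 / (interval_s * 1000.0)


# Hypothetical samples.
print("GAVG:", gavg_ms(2.0, 15.0), "ms, ok:", disk_latency_ok(2.0, 15.0))
print("CPU Ready:", cpu_ready_pct(1000), "% of the interval")
```

Separating KAVG from DAVG also hints at where to look: a high KAVG suggests queuing inside the host, while a high DAVG points toward the storage device or the path to it.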

You can’t afford to be reactive when it comes to performance in a virtual environment; you must be proactive to recognize and prevent big problems from occurring. Monitoring shouldn’t be a periodic task. If you don’t do it continuously, how will you know if you have a new problem today or if it’s been there all along? Your virtual environment might be trying to tell you something, and if you’re not listening, you’re not going to hear it. Veeam ONE provides maximum visibility of your virtual environment so you are aware of everything that is happening there.

Capacity Planning

With traditional physical server environments, capacity planning is fairly easy and straightforward as each server is an individual entity much like a single house. If you need more capacity you just upgrade your existing servers or buy more servers. Trying to plan for capacity requirements in a virtual environment is much more difficult and complicated as you have a lot of shared infrastructure components that all work together as a whole unit, much like a city. Therefore, a balance of resources is critical in a virtual environment.

If resources are not balanced on a host, it can lead to wasted resources that cannot be used. For example, if a host runs out of physical memory, it limits the number of VMs that can run on that host despite having plenty of other resources available to it. Trying to keep your resources balanced isn’t all that simple; you need to look at historical resource trends and usage to determine where that balancing point is.

Capacity planning is not just about predicting future resource needs; it’s also about optimizing the configuration and utilization of your environment and making sure you are not needlessly wasting any of your existing resources. To get the most out of your virtual infrastructure, you need to run it as efficiently as possible. Having wasted resources reduces your ROI and decreases efficiency. If you have too much stuff in your house, rather than going out and buying a bigger house, why don’t you first get rid of what you don’t need any more and then see if you still need a bigger house? Eliminating sprawl and optimizing configuration and utilization of the resources in your virtual environment is critical. Too often we apply our mentalities and practices from the physical world to the virtual world, and this can lead to inefficiency.

Trying to calculate your capacity needs is further complicated by the need to have sufficient spare capacity available to be used to support features like VMware High Availability (HA) admission control. In order for HA to work properly, you have to leave enough spare capacity to support the loss of one or more hosts so that VMs can be started on other hosts when needed. Trying to factor spare capacity into your resource calculations can quickly get complicated, and having a tool that can do this for you can make it a much easier exercise.
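A simplified version of that spare-capacity calculation can be sketched in Python. The function below assumes the worst case, that the largest hosts are the ones that fail, and returns the memory you can commit to VMs while still being able to restart them on the surviving hosts. It is not VMware's admission control algorithm: it ignores hypervisor overhead, CPU and slot sizes, and the function name and figures are hypothetical.

```python
def usable_memory_gb(host_memory_gb, host_failures_to_tolerate=1):
    """Memory that can be committed while tolerating host failures.

    host_memory_gb: list with the physical memory of each host in the cluster.
    Assumes the largest hosts fail (worst case) and ignores overhead,
    so the real figure will be somewhat lower.
    """
    if not 0 <= host_failures_to_tolerate < len(host_memory_gb):
        raise ValueError("failures to tolerate must be fewer than the host count")
    surviving = len(host_memory_gb) - host_failures_to_tolerate
    return sum(sorted(host_memory_gb)[:surviving])


# Hypothetical four-host cluster with 128 GB per host, tolerating one failure.
cluster = [128, 128, 128, 128]
print("Total memory:", sum(cluster), "GB")
print("Usable with N+1:", usable_memory_gb(cluster), "GB")
```

In this example a quarter of the cluster's memory is reserved for failover, which is why spare capacity has to be part of any capacity plan rather than an afterthought.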

Veeam ONE can make both planning future capacity needs and identifying resource optimizations a much simpler task. It tracks your virtual environment’s configuration as well as trends based on past and current resource utilization. Furthermore, it is your crystal ball to help predict and plan for future resource needs. It enables you to perform “what-if” analyses to evaluate the impact of adding or removing hosts and VMs in your virtual environment. You can also leverage capabilities that provide you with provisioning recommendations that furnish you with the justification and confidence you need to go to management to request funding to expand your virtual environment.

Enterprise Monitoring

Every data center typically has many separate management silos for all the various technologies, products and applications that reside within it. Trying to manage all these silos can be particularly challenging, and using multiple tools to manage multiple systems can result in multiple headaches. Every layer in the computing stack (hardware, OS, virtualization, applications) is often managed with a separate application, and even within each layer, there are often multiple management tools. Having fewer panes of glass to look through makes management much easier.

Most data centers that use VMware as their virtualization platform also use Microsoft for the operating system and applications running within their VMs. What typically ends up happening is that VMware administrators are also Microsoft administrators and vice versa. The tools used to manage VMware environments, such as vCenter Server, are typically designed to manage only the virtualization layer and not the other layers in the computing stack. On the Microsoft side, System Center Operations Manager (SCOM) and System Center Virtual Machine Manager (SCVMM) are often used to manage the Microsoft products in the data center, including Hyper-V, which often runs side by side with VMware. If you are already using SCOM and SCVMM, why not use these to manage VMware as well? Doing so gives you a single management pane of glass that brings the silos together and allows for centralized management and monitoring of multiple layers within your environment.

The Veeam nworks Management Pack for VMware fully integrates with Microsoft System Center and provides a unified view of your whole environment, allowing you to manage the entire computing stack from one console. The nworks MP extends System Center support to VMware environments by collecting data agentlessly using the APIs that are built into vSphere. It enables System Center functionality for all VMware components including vCenter Servers, clusters, hosts, virtual machines, storage and hardware.

The nworks MP provides end-to-end visibility from the physical server hardware, to the hypervisor, to the virtual machines and the operating system, applications and services running on them.

Summary

We covered seven of the top management challenges in VMware environments in this paper and how Veeam Software can help you overcome them. Virtual environments have unique challenges and issues that must be dealt with using the proper tools. The tools that are designed to manage traditional physical environments typically are not effective in virtual environments because they cannot see the virtualization layer. Veeam Software provides the tools that you need to overcome the challenges you face with virtualization and helps ensure that your virtual environment stays healthy and problem-free. If you are going to make an investment in virtualization technology, you should invest in the proper tools to manage it as well.

About the Author

Eric Siebert is an IT industry veteran, author and blogger with more than 25 years of experience, most recently specializing in server administration and virtualization. He is a very active member of the VMware VMTN support forums, where he's attained the elite Guru status by helping others with their virtualization-related challenges. Siebert has published books including his most recent, "Maximum vSphere" from Pearson Publishing, and has authored training videos in the Train Signal series. He also maintains his own VMware information website, vSphere-land, and is a regular blogger and feature article contributor on TechTarget’s SearchServerVirtualization and SearchVMware websites.