Monitoring VM Performance is a Virtual Environment’s Most Important Activity

This blog is sponsored by VKernel (www.vkernel.com).

– Greg Shields, partner and principal technologist with Concentrated Technology (www.concentratedtech.com), says:

A virtual environment is by nature an invisible environment. You simply can’t crack the case on a vSphere host and expect to “see” the behaviors going on inside. That’s why its counters are so important. They represent your only way to understand the behaviors and quantify potential resolutions.

But counters by themselves are very scary things. A counter is by definition just a number. Put together enough of those numbers, and you’ll create a graph not unlike Figure 1 in the previous article. Divining meaning from the points on that graph, however, is another thing entirely. Studying charts and graphs is an activity that can consume every part of your workday. With those graphs constantly evolving with a virtual environment’s behaviors, just keeping up is a challenge all its own.

Yet monitoring virtual machine performance is a virtual environment’s most important activity. That’s why this series’ previous article suggested that an unaided person can never effectively convert raw data into actionable intelligence. Oh, yes, in a tiny environment with just a few interdependencies, you probably could. But most of our VMware vSphere data centers are large and distributed. Finding the source of a performance issue isn’t easy when you’re staring at its unending integers.

Relating those numbers to the actions you should take is what you really want. Figure 1 shows that line of thinking in relation to the first article’s notion of actionable intelligence. In it, you can see how the raw monitoring data from a vSphere system can flow through some kind of built-by-somebody-else mathematical model that converts raw data into suggested actions.

What does that raw data look like? What top-ten counters might you plug in to such a model to represent virtual machine and virtual host behavior? These ten are the topic of this second article. In the next section, you’ll gain an appreciation for the amount of effort it takes to convert just ten integers into answers.

Ten Behavior-Quantifying Counters
What follows are ten counters considered to be most important in determining the rough behaviors within a vSphere environment. Think of them as an equation with ten variables. With the right equation, plugging in these values will net you an approximation of your vSphere environment’s behaviors.

Counter #1: CPU.ready.summation
CPU ready time is the amount of time a virtual machine was ready to use a physical CPU but could not get scheduled to run on it, and it is often read as a percentage of the sampling interval. A high CPU.ready.summation value means that virtual machines are waiting for physical CPU resources that aren’t available, which tends to indicate that physical CPU is a bottleneck to performance.
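
To make that concrete, here is a minimal sketch of how a raw ready value could be turned into a percentage. It assumes the counter reports milliseconds accumulated over the sampling interval, that the interval is 20 seconds (as in vSphere’s real-time charts), and that dividing by the vCPU count is an acceptable way to average across vCPUs. The function name and parameters are illustrative only and are not part of any VMware API.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Express a CPU.ready.summation sample as a percentage of the interval.

    ready_ms   -- ready time in milliseconds (assumed unit of the counter)
    interval_s -- sampling interval in seconds (assumed: 20 for real-time data)
    vcpus      -- number of vCPUs the sample is averaged over (assumption)
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Total milliseconds in which the vCPUs could have been scheduled.
    available_ms = interval_s * 1000.0 * vcpus
    return ready_ms * 100.0 / available_ms


# A VM with one vCPU that waited 1,000 ms during a 20-second sample.
print(cpu_ready_percent(1000))  # 5.0
```

What counts as a "high" percentage depends on the workload, so any alert threshold you layer on top of this is a judgment call rather than a fixed rule.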

Counter #2: CPU.usagemhz.average
The CPU.usagemhz.average counter measures average CPU usage in megahertz during a configured interval. Measured over all physical processors, this counter is a primary indicator of the amount of CPU load being placed on the host.
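
A megahertz figure is easier to judge once it is set against what the host can deliver. The sketch below is one rough way to do that, assuming you already know the host’s physical core count and per-core clock speed. The function name and the example host are hypothetical, and the calculation ignores effects such as hyperthreading and power management.

```python
def host_cpu_utilization_percent(usage_mhz, cores, core_mhz):
    """Compare a CPU.usagemhz.average sample with nominal host capacity.

    usage_mhz -- the counter value in MHz
    cores     -- physical cores on the host (assumed known)
    core_mhz  -- nominal clock speed of each core in MHz (assumed known)
    """
    if cores <= 0 or core_mhz <= 0:
        raise ValueError("cores and core_mhz must be positive")
    capacity_mhz = cores * core_mhz
    return usage_mhz * 100.0 / capacity_mhz


# Hypothetical host: 16 cores at 2,600 MHz, currently using 10,400 MHz.
print(host_cpu_utilization_percent(10400, cores=16, core_mhz=2600))  # 25.0
```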

Counter #3: mem.active.average
mem.active.average represents an estimation of how much memory is actively being used by virtual machines. This estimation is made by the VMkernel and is based on recently-touched memory pages. This memory counter references the quantity of guest memory the guest is actually using to accomplish its workload requirements.

Counter #4: mem.consumed.average
Slightly different from the mem.active.average counter, mem.consumed.average measures the amount of guest physical memory consumed by a virtual machine. When measured for a virtual machine, this level of memory includes shared memory and memory that is reserved but not used. This counter can also be measured for hosts and clusters. When measured for a host, the counter measures the amount of machine memory used on the host. For a cluster, it measures the amount of memory used by all powered-on virtual machines in the cluster.

Counter #5: mem.swapped.average
Host swapping is a last-ditch approach used by the VMkernel during periods of contention to ensure virtual machines never run out of available memory. Host swapping requires transferring memory from RAM to disk, which significantly reduces the affected virtual machine’s performance. The mem.swapped.average counter references the current amount of guest physical memory that has been swapped out to the virtual machine’s swap file. A value here greater than zero can indicate that memory is a bottleneck to performance.

Counter #6: mem.vmmemctl.average
Memory over-commitment in vSphere environments is handled through a process called ballooning. The process reclaims unused memory from running virtual machines to make it available for others. An excessive value for mem.vmmemctl.average means that virtual machines have been assigned memory that they are not using but that other virtual machines need. A large amount of ballooning can have an impact on overall host performance.

Counter #7: disk.busResets.summation
SCSI bus resets occur when a read or write command cannot be completed within an acceptable amount of time. These resets often indicate an underlying performance issue within storage hardware and are measured using the disk.busResets.summation counter. A non-zero value here can indicate that storage is a bottleneck.

Counter #8: disk.totalLatency.average
Total latency refers to the total time elapsed from submitting a command to storage, through the processing of that command, to receiving the anticipated response. Measured by the disk.totalLatency.average counter, it helps to identify the total amount of virtual machine processing delay caused by storage hardware, and a high value is another indication that storage is a bottleneck.

Counter #9: disk.usage.average
Related to the disk.totalLatency.average counter, disk.usage.average measures the total disk I/O rate. This information is useful for identifying when storage (either the storage itself or the connection to that storage) is a bottleneck to virtual machine performance.

Counter #10: net.usage.average
This final counter measures the combined send and receive rates for network traffic during a configured interval. The net.usage.average counter can be measured against a single virtual machine or the entire host, and is used to identify how much data traffic is passing in and out of the measured host or virtual machine.
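
To give a feel for what the simplest possible "model" over these counters might look like, here is a small sketch. It encodes only the two rules of thumb stated above (any swapped memory hints at a memory bottleneck, and any SCSI bus reset hints at a storage bottleneck), plus an optional CPU ready limit that you would have to choose yourself. The function, the dictionary layout, and the limit parameter are all hypothetical, and a real model would weigh far more than this.

```python
def suggest_bottlenecks(sample, cpu_ready_limit_ms=None):
    """Return a list of suspected bottlenecks for one set of counter values.

    sample             -- dict mapping counter names to numeric values
    cpu_ready_limit_ms -- optional, user-chosen limit for CPU ready time;
                          there is no universal value, so it defaults to off
    """
    findings = []
    if (cpu_ready_limit_ms is not None
            and sample.get("CPU.ready.summation", 0) > cpu_ready_limit_ms):
        findings.append("cpu")
    # Rule of thumb from the text: any host swapping suggests memory pressure.
    if sample.get("mem.swapped.average", 0) > 0:
        findings.append("memory")
    # Rule of thumb from the text: any bus reset suggests a storage problem.
    if sample.get("disk.busResets.summation", 0) > 0:
        findings.append("storage")
    return findings


# Made-up values for illustration only.
sample = {
    "CPU.ready.summation": 400,
    "mem.swapped.average": 2048,
    "disk.busResets.summation": 0,
}
print(suggest_bottlenecks(sample))  # ['memory']
```

Even this toy version shows why the real thing is hard: every rule needs a threshold, and thresholds that make sense for one cluster can be wrong for the next.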

Counters Aren’t Everything
These counters might create that ten-variable equation, but they by no means fully approximate every behavior your vSphere environment experiences. The actual set of counters is far larger, making the model far more complex. That said, knowing these critical ten gets you started down the road of quantifying vSphere’s behaviors. It also gives you a much better appreciation for how necessary assistive support is in resolving VMware vSphere’s biggest performance issues.

That said, counters aren’t everything—nor are mathematical models. Both counters and the models they feed only work when the environment remains predictably stable. Although virtualization is by nature a dynamic architecture, you can gain the right kinds of stability by observing a set of good practices. The third and final article in this series explains twelve practices that will ensure your actionable intelligence is in fact intelligent.
