This blog is sponsored by VKernel (www.vkernel.com).
– Greg Shields, partner and principal technologist with Concentrated Technology (www.concentratedtech.com), says:
Keeping eyes on ten counters for a single virtual machine isn’t easy. Doing the same for dozens or hundreds of virtual machines is functionally impossible for any human being. That’s why assistive tools are necessary to convert those counters’ raw data into actionable intelligence. Answering that all-important question, “What should I do?”, requires aligning what’s going on with the range of possible resolutions.
The previous article was written specifically to highlight how difficult that process is with counters alone. If net.usage.average is high today but so is disk.busResets.summation, what should you do? Is the bottleneck related to network oversubscription or to a problem in your storage layer? Even worse, are both subsystems experiencing a problem, or is one problem causing the other?
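Counters alone can show that two values move together, but not why. As a minimal sketch, the Python below computes the Pearson correlation between two sampled counter series. The sample values are invented for illustration, and collecting real samples from vCenter is out of scope here. A coefficient near 1.0 means the counters rise and fall together. It cannot tell you whether network load causes the storage resets, the reverse, or whether a third activity drives both.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sample series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples for one VM over six collection intervals.
net_usage_average = [120, 135, 410, 420, 390, 140]  # KBps, illustrative values
disk_bus_resets_summation = [0, 0, 3, 4, 3, 0]      # resets per interval, illustrative

r = pearson(net_usage_average, disk_bus_resets_summation)
print(f"correlation: {r:.2f}")  # strongly positive, yet silent on cause
```

A high coefficient here is exactly the ambiguous situation described above: a strong signal that something links the two subsystems, with no answer about which one to fix.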
Even more insidious is the situation where the issue isn’t a problem at all. Instead of stemming from some hardware shortcoming, perhaps the behavior relates to another administrator’s storage or networking activities. Maybe they’ve just begun a large, unthrottled migration of data over the network. Numbers lie, particularly when no governance exists over the activities those numbers are measuring.
Thus, this final article aims to bring stability to your vSphere environment. That environment is, indeed, highly dynamic; that’s the nature of virtualization and of technologies that aggregate IT workloads. But taming its dynamics requires a set of stabilizing practices that ensure counter data retains meaning (see Figure 1). Governing your vSphere environment’s activities goes a long way toward ensuring its behaviors can be predictably categorized.
The eleven processes described in this article assist with this task. Each deals more with the “people” side of virtualization than with its technology, but all are necessary to instill the predictable stability the mathematical models require.
Along with the first eleven is a twelfth and final process that involves the model itself. You could absolutely create your own model, one that accounts for each behavior’s range of possibilities and its impacts on counters, but why do that? Leveraging one built by experts means immediately incorporating their experience into your vSphere environment. In a way, it’s like having the world’s greatest performance and capacity management experts right at your fingertips. Consider these twelve processes the final piece in resolving VMware vSphere’s biggest performance issues.
The Daily Practices
VMware’s activities represent an always-on service. Unlike a file server that can go down once in a while and not harm the bottom line too much, VMware’s services are the foundation upon which all other data center activities reside. When VMware goes down, everything goes down.
That’s why vSphere’s daily practices deal mostly with monitoring. Discovering inappropriate behaviors early, before they impact users, is the top priority of these daily practices. Doing so via a dashboard that brings hardware and software behaviors together in a single pane of glass should be the goal. Get there by incorporating the first three practices:
Practice #1: SNMP Monitoring
VMware by default doesn’t do a terrifically good job with its SNMP exposure. Enabling and tuning such monitoring requires extra steps that aren’t immediately obvious within its interface. But SNMP monitoring is critically important when that single-pane-of-glass management is your desired end state. For the first practice, brush up on your SNMP technologies, or find a solution that’ll automate their implementation (and, more importantly, their tuning once those technologies are in place!).
Practice #2: Resource Utilization Monitoring
Resources are constantly in flux inside vSphere. Virtual machines use more CPU for a while, then use less, depending on the needs of their processes and users. Monitoring that resource utilization across virtual machines, hosts, and clusters is fundamentally important, even in a fully HA/DRS-automated environment, to get a handle on capacity issues before they impact users.
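Rolling utilization up from virtual machines to hosts to clusters is what makes that cross-level view possible. The sketch below assumes a hypothetical inventory (host names, cluster names, and MHz figures are made up) and an arbitrary 75% threshold; it illustrates the rollup idea and is not how vCenter computes its own statistics.

```python
from collections import defaultdict

# Hypothetical inventory: host -> (cluster, CPU capacity in MHz).
HOSTS = {
    "esx01": ("prod", 8000),
    "esx02": ("prod", 8000),
    "esx03": ("lab", 8000),
}
# Hypothetical current CPU demand: VM -> (host, MHz).
VMS = {
    "web01": ("esx01", 2400),
    "web02": ("esx01", 3100),
    "db01": ("esx02", 6800),
    "test01": ("esx03", 900),
}


def rollup(vms, hosts):
    """Return (host utilization, cluster utilization) as fractions of capacity."""
    host_used = {host: 0.0 for host in hosts}
    for host, mhz in vms.values():
        host_used[host] += mhz

    cluster_used = defaultdict(float)
    cluster_capacity = defaultdict(float)
    for host, (cluster, capacity) in hosts.items():
        cluster_used[cluster] += host_used[host]
        cluster_capacity[cluster] += capacity

    host_util = {h: host_used[h] / hosts[h][1] for h in hosts}
    cluster_util = {c: cluster_used[c] / cluster_capacity[c] for c in cluster_capacity}
    return host_util, cluster_util


def hot_spots(utilization, threshold=0.75):
    """Names whose utilization meets or exceeds the (assumed) threshold."""
    return sorted(name for name, u in utilization.items() if u >= threshold)


host_util, cluster_util = rollup(VMS, HOSTS)
print("hot hosts:", hot_spots(host_util))
print("hot clusters:", hot_spots(cluster_util))
```

With these made-up numbers, esx02 runs hot at 85% while esx01 sits near 69%, and the prod cluster as a whole is near 77%. That cluster-level figure is the one automation can’t fix: once the cluster itself is busy, DRS has little headroom left to rebalance into.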
Practice #3: Alert Monitoring
vSphere by default will alert you when preconfigured conditions occur. But when was the last time you looked through its alerts? Do you know which are enabled and which are properly tuned? Have you integrated them into your broader alert management system? If not, you’ll miss alerts that appear only in the vCenter Client, or you’ll never receive some at all because they were never enabled. Checking vCenter alerts daily is your first line of defense against a vSphere environment that isn’t meeting the needs of its virtual machines.
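Part of tuning an alert is deciding how long a condition must persist before it fires, so that brief spikes don’t bury the alerts that matter. The sketch below shows that idea in isolation: it flags only runs of consecutive samples above a threshold. The sample values, the 90% threshold, and the three-sample window are assumptions for illustration, not vCenter defaults, and the function is hypothetical rather than part of any VMware API.

```python
def sustained_breaches(samples, threshold, min_consecutive):
    """Return the start index of each run of samples that stays above
    `threshold` for at least `min_consecutive` consecutive samples."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    alerts = []
    run_start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if run_start is None:
                run_start = i  # a new run above the threshold begins
            if i - run_start + 1 == min_consecutive:
                alerts.append(run_start)  # fire once per qualifying run
        else:
            run_start = None  # run broken; reset
    return alerts


# Hypothetical host CPU usage (%) sampled at fixed intervals.
cpu_usage_pct = [62, 95, 70, 92, 97, 94, 91, 65]

print(sustained_breaches(cpu_usage_pct, threshold=90, min_consecutive=1))
print(sustained_breaches(cpu_usage_pct, threshold=90, min_consecutive=3))
```

The untuned rule (one sample) fires on the momentary spike at index 1 as well as on the sustained run; requiring three consecutive samples fires only once, for the run starting at index 3.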
The Monthly Practices
In addition to the daily practices is a set that requires less frequent attention. This reduction in frequency does not imply that these tasks are less meaningful. Arguably, the monthly tasks are the more important group because they are more likely to be forgotten over time. Set up a recurring activity on your calendar, or incorporate a solution that reminds you of these five in-depth practices.
Practice #4: Disk Space Utilization
Another area where VMware vSphere has never done a terrifically good job is alerting when available disk space is low. Yet VMware itself warns that a datastore that fills completely is one of the worst situations any environment can experience. You never want disk space to run out, particularly in thin-provisioned environments where virtual machines think they have more disk space than they really do. That’s why Practice #4 reminds you to verify your disk space at least monthly, if not daily during times when space is low.
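A monthly disk check needs two numbers per datastore: how much space is actually free, and how much has been promised to thin-provisioned disks relative to real capacity. The sketch below computes both. The `Datastore` record, its field names, and the 20% free-space and 1.5x overcommit limits are assumptions for illustration; set thresholds that fit your own environment.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    name: str
    capacity_gb: float
    free_gb: float
    provisioned_gb: float  # total virtual-disk size promised to VMs (thin disks at full size)


def check_datastore(ds, min_free_pct=20.0, max_overcommit=1.5):
    """Return human-readable warnings for low free space or heavy overcommitment."""
    warnings = []
    free_pct = 100.0 * ds.free_gb / ds.capacity_gb
    overcommit = ds.provisioned_gb / ds.capacity_gb
    if free_pct < min_free_pct:
        warnings.append(f"{ds.name}: only {free_pct:.1f}% free")
    if overcommit > max_overcommit:
        warnings.append(f"{ds.name}: provisioned at {overcommit:.2f}x capacity")
    return warnings


# Hypothetical datastores.
prod = Datastore("ds-prod-01", capacity_gb=2000, free_gb=300, provisioned_gb=3400)
lab = Datastore("ds-lab-01", capacity_gb=1000, free_gb=600, provisioned_gb=900)

for ds in (prod, lab):
    for warning in check_datastore(ds):
        print(warning)
```

The overcommit ratio matters because thin disks can keep growing toward their provisioned size: at 1.70x, ds-prod-01 could not satisfy every disk it has promised even if it were empty today.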
Practice #5: Application Restarts
The virtual administrator spends so much time worrying about resources and hardware that it’s easy to forget vSphere is really about the applications. Those applications sometimes misbehave, for example by restarting at inappropriate times. Others need restarting from time to time to return them to health. Taking a monthly look at application histories and behaviors helps keep your applications in line.
Practice #6: Server Reboots
Virtual servers, too, can have odd reboot requirements and behaviors. Some reboot spontaneously, giving little warning before they incur a service outage. Others need regular reboots to clear memory and accumulated processes. Maintaining a server reboot log, reviewing it monthly, and watching for resource oddities helps keep these servers healthy.
Practice #7: Offline Maintenance
One class of reboots that no one likes but everyone performs is the one surrounding updates. The monthly update cycle has become de rigueur in most data centers, with patches themselves often released monthly. Use that downtime as your opportunity to right-size assigned resource levels to the values you’ve determined over the past month. Also use that time to handle any special backups, snapshots, and other maintenance activities that work best while the virtual machine is offline.
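Right-sizing during the maintenance window starts from the month’s observed usage. One common approach, used here as an assumption rather than a VMware recommendation, is to take a high percentile of observed demand and add headroom. The sketch applies that to memory; the 95th percentile, 20% headroom, and sample values are all hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty sequence."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def recommend_memory_gb(samples_gb, pct=95, headroom=0.20, minimum_gb=1):
    """Suggested allocation: the usage percentile plus headroom, rounded up to whole GB."""
    target = percentile(samples_gb, pct) * (1 + headroom)
    return max(minimum_gb, math.ceil(target))


# Hypothetical memory-demand samples (GB) from the past month for a VM assigned 16 GB.
samples = [5.1, 5.4, 6.0, 5.8, 6.3, 5.5, 7.2, 6.1, 5.9, 6.4]
print(recommend_memory_gb(samples))
```

With these samples the sketch suggests 9 GB for a VM currently assigned 16 GB. Note that a nearest-rank percentile only trims rare peaks when there are enough samples; with ten samples, the 95th percentile is simply the maximum.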
Practice #8: Overall Health
Last, an overall health check is in order on a monthly basis. This health check isn’t limited to the virtual machines running atop vSphere; it also covers the vSphere environment itself. Review logs, validate load-balancing effectiveness (including moving virtual machines between clusters), verify data center health, and perform the care-and-feeding activities you keep promising to do. Leaning on assistive technology that reminds you of such health checks can be useful, particularly when that technology gives you specific advice on what steps to take.
The Yearly Practices
Your VMware vSphere yearly practices don’t come around that often, but they’re no less important. They center on evaluating future initiatives, incorporating feedback from monitoring solutions and users alike, and reflecting on and optimizing the processes you’ve put in place. Your yearly checks are important opportunities to improve upon the governance activities you incorporated in the past year.
Practice #9: Budgeting for Replacement or Augmentation
The first task of capacity management is resource assurance: ensuring that virtual machines have the physical resources they need to do their jobs. The second task is planning: analyzing resource utilization over the long term to look for trends. A well-managed environment should be able to draw a straight line from its historical usage to the date it will need more resources. With the right daily and monthly practices in place, Practice #9 becomes easy when your annual budget numbers are due.
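That straight line can be an ordinary least-squares fit. The sketch below fits a line to monthly usage figures and solves for the month where the line reaches capacity. The usage series and capacity are hypothetical, and a straight line assumes steady growth; seasonal or step-change growth (such as a large business initiative) needs a different model or explicit adjustment.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def month_capacity_reached(monthly_usage, capacity):
    """Month index (0 = first sample, may be fractional) at which the fitted
    trend reaches capacity, or None if usage is flat or shrinking."""
    months = list(range(len(monthly_usage)))
    slope, intercept = linear_fit(months, monthly_usage)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


# Hypothetical storage consumption (TB) for the last six months, on 20 TB of capacity.
usage_tb = [10, 11, 12, 13, 14, 15]
month = month_capacity_reached(usage_tb, capacity=20)
print(f"months after the latest sample: {month - (len(usage_tb) - 1):.1f}")
```

With this perfectly steady series the line reaches 20 TB at month 10, five months after the latest sample. Real data will be noisier, which is why the fit uses every point rather than just the first and last.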
Practice #10: User Feedback
That planning activity also requires an understanding of the users’ experience. Your numbers may, for example, show that servers and hosted desktops are performing to expectations; however, your expectations and those of your users might be mismatched. Interviewing them and incorporating their feedback ensures that the services you’re delivering are meeting their needs.
Practice #11: Large-Scale Change Initiatives
The eleventh task goes one step further. Just like your virtual environment, your business is a highly dynamic entity. Planning for the future without regard to upcoming business initiatives means you won’t have the capacity in place when the business requires it. Planning for and implementing that capacity beforehand means avoiding performance conflicts down the road, the kind that will have you updating your resume and searching for new employment.
Practice #12: The Ongoing Practice
Last is the entire reason for this Essentials Series. You as a human can’t monitor all those counters alone. You need the assistive support of software solutions that translate raw data into actionable intelligence. The final practice suggests that you look to solutions that accomplish that task. Only with a little help can you tame VMware’s complicated beast, full of moving parts, deep integrations, and unexpected impacts.