Lori MacVittie, senior technical marketing manager at F5 Networks (http://www.f5.com/), says:

Today, the argument regarding responsibility for auto-scaling in cloud computing as well as highly virtualized environments remains mostly constrained to e-mail conversations and gatherings at espresso machines. It’s an argument that needs more industry and “technology consumer” awareness, because it’s ultimately one of the underpinnings of a dynamic data center architecture; it’s the piece of the puzzle that makes or breaks one of the highest value propositions of cloud computing and virtualization: scalability.

The question appears to be a simple one: what component is responsible not only for recognizing the need for additional capacity, but for acting on that information to actually initiate the provisioning of more capacity? Neither the question nor the answer, it turns out, is as simple as it appears at first glance. There are a variety of factors that need to be considered, and each of the arguments for – and against – a specific component carries considerable weight.
Today we’re going to specifically examine the case for the network as the primary driver of scalability in cloud computing environments.

ANSWER: THE NETWORK
We are using the “network” as shorthand for the load balancing service, whether delivered via hardware, software, or some combination of form factors. It is the load balancing service that enables scalability in both traditional and cloud computing environments, and it is critical to the process. Without a load balancing service, scalability is at best a difficult and expensive proposition, and at worst nearly impossible to achieve.
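To make that role concrete, here is a minimal, purely illustrative sketch in Python of a virtual server fronting a pool of application instances. The class, addresses, and round-robin policy are assumptions chosen for illustration, not any particular product’s implementation.

```python
import itertools

class VirtualServer:
    """Presents many application instances as a single entity by
    distributing incoming requests across a pool of backend members."""

    def __init__(self, members):
        self.members = list(members)               # pool of instance addresses
        self._cycle = itertools.cycle(self.members)

    def add_member(self, address):
        # Scaling out is simply adding another member to the pool;
        # clients continue to address the single virtual endpoint.
        self.members.append(address)
        self._cycle = itertools.cycle(self.members)

    def route(self, request):
        # Simple round-robin selection; real services apply far richer policies.
        return next(self._cycle), request

# One virtual endpoint, three application instances behind it.
vs = VirtualServer(["10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"])
print(vs.route("GET /index.html"))
```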

The load balancing service, which essentially provides application virtualization by presenting many instances of an application as a single entity, certainly has the visibility required to holistically manage capacity and performance requirements across the application. When such a service is also context-aware – that is, when it can determine the value of variables across client, network, and server environments – it can dynamically apply policies such that performance and availability requirements are met. The network, in this case, has the information necessary to make provisioning (and, conversely, decommissioning) decisions that maintain the proper balance between application availability and resource utilization.
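What such a context-aware policy might look like is sketched below; the variable names and thresholds are made up for illustration and are not drawn from any specific product.

```python
from dataclasses import dataclass

@dataclass
class Context:
    # Illustrative variables drawn from client, network, and server environments.
    active_connections: int     # client-side demand
    avg_response_ms: float      # observed application performance
    cpu_utilization: float      # aggregate server-side load, 0.0 - 1.0
    instance_count: int         # current number of application instances

def scaling_decision(ctx, max_response_ms=250.0, high_cpu=0.80, low_cpu=0.30):
    """A simple context-aware policy: scale out when performance or load
    thresholds are breached, scale in when capacity is clearly idle."""
    if ctx.avg_response_ms > max_response_ms or ctx.cpu_utilization > high_cpu:
        return "scale-out"      # availability or performance is at risk
    if ctx.cpu_utilization < low_cpu and ctx.instance_count > 1:
        return "scale-in"       # reclaim underutilized resources
    return "hold"               # availability and utilization are in balance

print(scaling_decision(Context(1200, 310.0, 0.86, 4)))   # -> scale-out
```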

Because “the network” has the information, it would seem to follow logically that it should simply act on that data and initiate a scaling event (either up or down) when necessary. There are two (perhaps three) problems with this conclusion. First, most load balancing services do not have the means by which they can instruct other systems to act. Most such systems are capable of responding to queries for the necessary data, but they are not natively imbued with the ability to perform tasks triggered by that data (other than those directly related to ensuring the load balancing service acts as prescribed).

While some such systems are evolving to meet the requirements of a dynamic data center, this raises the second problem with the conclusion: should it? After all, just because it can doesn’t mean it should. While a full-featured application delivery controller – through which load balancing services are often delivered – certainly has the most strategic view of an application’s health and capacity (the whole as well as its composite instances), this does not necessarily mean it is best suited to initiating scaling events. The load balancing service may not – and in all likelihood does not – have visibility into the availability of resources in general. It does not monitor empty pools of compute, for example, from which it can pull to increase the capacity of an application. That task is generally assigned to a management system of some kind, with responsibility for managing whatever pool of resources is used across the data center to fulfill capacity needs across multiple applications (and, in the case of service providers, customers).
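That division of labor might look something like the following sketch, in which the load balancing service only answers queries while a hypothetical management system – the layer with visibility into the spare compute pool – actually initiates provisioning. All class names, addresses, and values here are assumptions for illustration.

```python
class LoadBalancerAPI:
    """Answers queries about the application it fronts; it does not
    provision anything itself."""
    def __init__(self, members):
        self.members = list(members)
    def pool_utilization(self):
        return 0.92                        # stand-in for a real measurement
    def register_member(self, address):
        self.members.append(address)

class ManagementSystem:
    """Owns the data-center-wide pool of spare compute."""
    def __init__(self, spare_instances):
        self.spare_instances = spare_instances
    def provision_instance(self):
        if self.spare_instances == 0:      # only this layer sees the free pool
            return None
        self.spare_instances -= 1
        return f"10.0.0.{20 + self.spare_instances}:80"

def reconcile(lb, mgmt, scale_out_threshold=0.80):
    # The network supplies the data; the management system acts on it.
    if lb.pool_utilization() > scale_out_threshold:
        address = mgmt.provision_instance()
        if address is None:
            return "capacity exhausted"
        lb.register_member(address)
        return f"scaled out with {address}"
    return "no action"

lb = LoadBalancerAPI(["10.0.0.11:80", "10.0.0.12:80"])
print(reconcile(lb, ManagementSystem(spare_instances=3)))
```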

The third problem with the conclusion returns us to the same technical issues that prevent the application from being “in charge” of scalability: integration. Eventually the network will encounter the same issues as the application with respect to initiating a scaling event – it will be tightly coupled to a management framework that may or may not be portable. Even if this is accomplished through a widely deployed system like VMware’s frameworks, there will still be environments in which VMware is not the hypervisor and/or management framework of choice. While the “network” could ostensibly build in support for multiple management frameworks, this runs the risk of the vendor investing a great deal of time and effort implementing what are essentially plug-ins for those frameworks, which bogs down the solution and introduces issues regarding upgrades, deprecations in the API, and other integration-based pitfalls. It’s not a likely scenario, to be sure.
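The plug-in problem can be sketched as one adapter per framework, as below; the adapter names (including the “Acme” framework) are hypothetical stand-ins for whatever management APIs a vendor would actually have to support and keep current.

```python
from abc import ABC, abstractmethod

class ProvisioningAdapter(ABC):
    """One adapter per management framework the vendor would have to build
    and keep current across API versions and deprecations."""
    @abstractmethod
    def launch_instance(self, app: str) -> str:
        ...

class VMwareAdapter(ProvisioningAdapter):
    def launch_instance(self, app: str) -> str:
        # Placeholder for framework-specific calls and version handling.
        return f"vmware-instance-for-{app}"

class AcmeCloudAdapter(ProvisioningAdapter):
    # A stand-in for any other framework a customer might run.
    def launch_instance(self, app: str) -> str:
        return f"acme-instance-for-{app}"

ADAPTERS = {"vmware": VMwareAdapter(), "acme": AcmeCloudAdapter()}

def scale_out(app: str, framework: str) -> str:
    # Each new framework, API revision, or deprecation means another
    # adapter for the load balancing vendor to maintain.
    return ADAPTERS[framework].launch_instance(app)

print(scale_out("web-tier", "acme"))
```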

To sum up: because of its strategic location in the “network”, the load balancing service today has the visibility – and thus the information – required to manage scaling events, but like the “application” it has no control (even though some implementations may be technically capable) over provisioning processes. And even assuming control over provisioning, and thus the ability to initiate a scaling event, there remain integration challenges that will likely impact operational stability in the long run.