Lori MacVittie, Senior Technical Marketing Manager at F5 Networks (http://www.f5.com/), says:

Examining responsibility for auto-scalability in cloud computing environments.

Today, the argument regarding responsibility for auto-scaling in cloud computing as well as highly virtualized environments remains mostly constrained to e-mail conversations and gatherings at espresso machines. It’s an argument that needs more industry and “technology consumer” awareness, because it’s ultimately one of the underpinnings of a dynamic data center architecture; it’s the piece of the puzzle that makes or breaks one of the highest value propositions of cloud computing and virtualization: scalability.

The question appears to be a simple one: what component is responsible not only for recognizing the need for additional capacity, but acting on that information to actually initiate the provisioning of more capacity? Neither the answer, nor the question, it turns out are as simple as appears at first glance. There are a variety of factors that need to be considered, and each of the arguments for – and against – a specific component have considerable weight.

We’ve examined each of the three possibilities, the three “players” in the scalability game in dynamic environments: the network, the application, and the management framework. All have a piece of the puzzle, but none have both the visibility and the ability to initiate a provisioning event – at least not in a way that maintains cost and operational efficiency.

From our previous discussions it seems obvious that the application does not – and indeed cannot – be enabled with the control required to manage scalability. But the network (load balancing service) and the management framework both could, ostensibly, be enabled with the ability to control the provisioning process and imbued with the visibility into the data necessary to initiate scaling events. But just as true is that doing so, in either case, would incur serious repercussions to operational stability and potentially increase costs when the integration requirements are taken into consideration.
Thus, it seems the most efficient and cost-effective means of managing scalability in cloud computing environments is via a collaborative operational process involving all three components: application, network, and management framework.

The application remains responsible for providing the per-instance capacity data required. This may be as simple as a connection or throughput high-water mark, or as complex as near-time load data. In a truly dynamic, automated data center this information would be provided by the application through some standardized mechanism, such as specific API-accessible service. Such standardization would enable portability across environments, eliminate the possibility of error on the part of the operator when communicating those limits to the load balancing service, as well as provide the means by which the application could leverage knowledge of its application infrastructure constraints to determine dynamically what those limits may be. That’s important even when excluding the possibility of inter-environment portability because it is possible that intra-environment movement across time may change the capabilities of the underlying server infrastructure such that it impacts the capacity of the application itself.

The network, or load balancing service, remains the only point in the architecture at which application capacity is easily obtained and at which application instance capacity is monitored. This data is critical to determining at what point additional (or less) capacity may be necessary. While the load balancing service may be assigned the responsibility of notifying the management framework when more or less capacity is required, it should not be responsible for initiating the provisioning process. The integration required to do so would effectively negate many of the efficiency benefits gained by the overall scaling architecture, and is fraught with potential obstacles in the face of still-evolving management frameworks. The network is adept at managing, from a monitoring perspective, the historical and current capacity of an overall application (defined as the interface with which clients interact and the aggregation point at which multiple application instances combine to act as a single entity) but thresholds and limitations – particularly those related to costs – are not necessarily part of the overall configuration of such services, nor should they be. Such operational and business requirements are best left codified and managed by a management framework as they are unique to not only the customer but the environment in which applications are deployed. This also leaves open the possibility for cross-environment scalability, enabled by management broker-enabled frameworks and components. While delivery of cross cloud-deployed applications is certainly under the purview of the load balancing service, the provisioning of resources across those environments is not feasible.

The management framework, integrated with billing and metering and provisioning systems, is the appropriate place for initiation of provisioning events to occur. Without bogging down the infrastructure architecture – and unnecessarily complicating it, as well – it is impossible for management frameworks to efficiently gather the requisite data and make the determination whether or not to initiate a scaling event. Leaving the initiation decision to an “external” management framework has the added benefit of allowing future innovation to occur. For example, in the future it might be the case that the management framework is leveraged to offer additional services based on capacity as an alternative to more capacity. When application performance is the trigger for a scaling event, customers might one day have the option of enabling other infrastructure services – optimizations and accelerations – that can ameliorate the need for additional capacity. From a provider standpoint, increasing the revenue per instance through value-added services makes more sense – and provides a higher ROI – than simply adding additional instances. But without a management framework capable of factoring in prioritized services to the decision making process, this ability becomes a more difficult proposition.

You may recall that the definition of “Infrastructure 2.0” was so broad as to seem unrealizable. But when combined with the unspoken requirement for collaboration across components, such a definition is certainly not only realizable, but desirable. Infrastructure 2.0 was never about a single component being able to provide everything, it was always about enabling collaboration of the infrastructure in a manner that has been successfully carried out in the software arena to provide a highly-connected, intelligent “network” of applications.

By leveraging collaboration in the infrastructure we can achieve the goal of a dynamic data center. Whether the result is simply highly virtualized or a fully cloud-computing enabled architecture is not nearly as relevant as making the data center more operationally efficient through integration and collaboration.
The answer to the question forming the basis for this series of posts, “What component is responsible not only for recognizing the need for additional capacity, but acting on that information to actually initiate the provisioning of more capacity”, is “no single component can be responsible for both and still maintain the efficiency and performance of the environment and application.” The answer to the question is it requires an architecture; a collaborative and dynamic architecture.