Today, the argument over responsibility for auto-scaling in cloud computing and highly virtualized environments remains mostly confined to e-mail threads and gatherings at espresso machines. It’s an argument that needs more industry and “technology consumer” awareness, because it’s ultimately one of the underpinnings of a dynamic data center architecture; it’s the piece of the puzzle that makes or breaks one of the highest value propositions of cloud computing and virtualization: scalability.
The question appears to be a simple one: what component is responsible not only for recognizing the need for additional capacity, but for acting on that information to actually initiate the provisioning of more capacity? Neither the question nor the answer, it turns out, is as simple as it appears at first glance. There are a variety of factors that need to be considered, and each of the arguments for and against a specific component carries considerable weight.
Today we’re going to specifically examine the case for the application as the primary driver of scalability in cloud computing environments.
ANSWER: THE APPLICATION
The first and most obvious answer is that the application should drive scalability. After all, the application knows when it is overwhelmed, right? Only the application has the visibility into its local environment that indicates when it is reaching capacity with respect to memory, compute, and even application-specific resources. The developer, having thoroughly load tested the application, knows the precise load, in terms of connections and usage patterns, that causes performance degradation or exhausts capacity. It should therefore logically fall to the application to monitor resource consumption and initiate a scaling event when necessary.
In theory, this works. In practice, it doesn’t. First, application instances may be scaled up at deployment; the resulting increase in memory and/or compute resources changes the thresholds at which the application should scale. If the application is developed with the ability to set thresholds based on an external set of values, this may still be a viable solution. But once the application recognizes it is approaching a scaling event, how does it initiate that event? If it is integrated with the environment’s management framework, say the cloud API, it can do so. But doing so requires that the application be developed with the end goal of being deployed in a specific cloud computing environment. Portability is immediately sacrificed for control in a way that locks the organization into a specific vendor for the life of the application, or until the organization is willing to rewrite the application to target a different management framework.
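To make that lock-in concrete, here is a minimal sketch of what application-driven scaling tends to look like in practice, assuming externally supplied thresholds and a hypothetical provider-specific provisioning endpoint (SCALE_API); none of the names, values, or URLs below refer to any real vendor’s API.

```python
# Minimal sketch: an application instance watching its own utilization and
# calling a (hypothetical) provider-specific API to request another instance.
# Thresholds and endpoint are illustrative assumptions, not a real vendor's API.
import os
import psutil    # third-party; reads this instance's local memory/CPU only
import requests  # third-party; used for the provisioning call

# Thresholds supplied externally so they can change if the instance is scaled up
MEM_THRESHOLD = float(os.environ.get("MEM_THRESHOLD", "0.85"))
CPU_THRESHOLD = float(os.environ.get("CPU_THRESHOLD", "0.80"))
SCALE_API = os.environ.get("SCALE_API", "https://cloud.example.com/v1/instances")

def near_capacity() -> bool:
    """Only local visibility: this instance's memory and CPU, nothing else."""
    mem = psutil.virtual_memory().percent / 100.0
    cpu = psutil.cpu_percent(interval=1) / 100.0
    return mem > MEM_THRESHOLD or cpu > CPU_THRESHOLD

def request_scale_out() -> None:
    """Provider-specific call; this is where portability is sacrificed."""
    requests.post(SCALE_API, json={"image": "my-app", "count": 1}, timeout=5)

if near_capacity():
    request_scale_out()
```

The monitoring half is portable; it is the request_scale_out() call, written against one management framework, that ties the application to a specific environment.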
Someday, in the future, it may be the case that the application can simply send out an SOS of sorts: some interoperable message indicating that it is hovering near dangerous utilization levels and needs help. Such a message could then be interpreted by the infrastructure and/or management framework to automatically initiate a scaling event, thus preserving control and interoperability (and thus portability). That’s the vision of a stateless infrastructure, but one that is unlikely to be an option in the near future.
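Purely as a thought experiment (no such interoperable standard exists today), such an SOS might look like the sketch below; every field name is an assumption, and the interesting part is what the application does not do: it declares distress but leaves the decision and the provisioning action to the infrastructure.

```python
# Hypothetical only: no interoperable "SOS" message standard exists today.
# The sketch emits a vendor-neutral utilization notice that the surrounding
# infrastructure (not the application) would interpret and act upon.
import json
import sys
from datetime import datetime, timezone

sos = {
    "type": "capacity-warning",                  # assumed message type
    "instance": "app-instance-1",                # assumed identifier
    "utilization": {"memory": 0.92, "cpu": 0.88},
    "timestamp": datetime.now(timezone.utc).isoformat(),
}
# Where this goes (log, message bus, metadata service) is the infrastructure's
# concern; the application only declares that it needs help.
sys.stdout.write(json.dumps(sos) + "\n")
```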
At this time, back in reality land, the application has the necessary data but no practical way to act upon it.
But let’s assume it could, or that the developers integrated the application appropriately with the management framework such that the application could initiate a scaling event. This works for the first instance, but it quickly becomes problematic and expensive, because visibility is limited to the application instance rather than to the application as a whole: the one presented to the end user through the network server virtualization enabled by load balancing and application delivery. As application instance #1 nears capacity, it initiates a scaling event, which launches application instance #2. Any number of things can go wrong at this point:
- Application instance #1 continues to experience increasing load while application instance #2 launches, causing it to (unnecessarily) initiate another scaling event, resulting in application instance #3. Obviously, timing and a proactive scaling strategy are required to avoid such a (potentially costly) scenario. But a proactive scaling strategy requires historical trend analysis: maintaining X minutes of data and determining growth rates in order to predict when the application will exceed configured thresholds, then initiating a scaling event early enough to account for the time required to launch instance #2. (A sketch of this arithmetic follows the list.)
- Once two application instances are available, consider that load continues to increase until both are nearing capacity. Both instances, being unaware of each other, will initiate a scaling event, resulting in the launch of application instance #3 and application instance #4. One additional instance would likely suffice, but because the instances do not collaborate and have no control over one another, the number of instances is likely to grow exponentially, assuming a round-robin load balancing algorithm and a constant increase in load.
- Because there is no collaboration between application instances, it is nearly impossible to scale back down, a necessary component of maintaining the benefits of a dynamic data center architecture. The decision to scale down, i.e. to decommission an instance, would need to be made by… the instance. It is not impossible to imagine a scenario in which multiple instances simultaneously “shut down”, nor one in which multiple instances remain active even though overall application load requires merely one to sustain acceptable availability and performance.
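As for the proactive-scaling strategy mentioned in the first bullet, the underlying arithmetic is simple to sketch; the window size, threshold, and launch time below are illustrative assumptions rather than recommended values.

```python
# Sketch of the proactive-scaling arithmetic: keep X minutes of utilization
# samples, estimate the growth rate, and trigger a scaling event early enough
# to cover the time it takes a new instance to launch. All values are assumed.
from collections import deque

WINDOW = 10 * 60       # keep ten minutes of per-second samples
THRESHOLD = 0.85       # utilization at which capacity is considered reached
LAUNCH_TIME = 120.0    # assumed seconds needed to launch a new instance

samples = deque(maxlen=WINDOW)  # (timestamp, utilization) pairs

def should_scale_now(now: float, utilization: float) -> bool:
    """Trigger when projected time to hit THRESHOLD is within LAUNCH_TIME."""
    samples.append((now, utilization))
    if len(samples) < 2:
        return False
    (t0, u0), (t1, u1) = samples[0], samples[-1]
    rate = (u1 - u0) / (t1 - t0)        # utilization growth per second
    if rate <= 0:
        return False                    # load flat or falling; no action
    seconds_to_threshold = (THRESHOLD - u1) / rate
    return seconds_to_threshold <= LAUNCH_TIME
```

Even this simple version illustrates the cost of getting it right inside the application: every instance must carry its own history, its own prediction, and its own knowledge of how long provisioning takes.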
These are generally undesirable results; no one architects a dynamic data center with the intention of making scalability less efficient or rendering it unable to effectively meet business and operational requirements with respect to performance and availability. Both are compromised by an application-driven scalability strategy, because application instances lack the visibility necessary to maintain capacity at levels appropriate to demand, and they have no visibility into end-user performance, only into internal application and back-end integration performance. Both of the latter affect overall end-user performance, but they are incomplete measures of the response time experienced by application consumers.
To sum up, the “application” has limited data and, today, no control over provisioning processes. And even assuming full control over provisioning, the availability of only partial data leaves the process fraught with risk to operational stability.