Choosing the right size for an instance in the cloud is – by design – like choosing the right mix of entrée and sides off the menu. Do I want a double burger and a small fry? Or a single burger and a super-size fry? All small? All large? Kiddie menu? It all depends on how hungry you are, on your capacity to consume.
Choosing the right instance in a cloud computing environment likewise depends on how hungry your application is, but it also depends heavily on what the application is hungry for. In order to choose the right “size” server in the cloud, you’ve got to understand the consumption patterns of your application. This is vital to understanding when you want a second, third, or subsequent instance launched to ensure availability while still maintaining acceptable performance.
To that end, it is important to understand the difference between “concurrency” and “connections” in terms of what they measure and in turn how they impact resource consumption.
Connections is the measure of new connections, i.e. requests, that can be handled (generally per second) by a “thing” (device, server, application, etc.). When we’re talking applications, we’re talking HTTP (which implies TCP). In order to establish a new TCP connection, a three-way handshake must be completed. That means three exchanges of data between the client and the “thing” over the network, each requiring a minimal amount of processing. Thus, the constraining factors that may limit connections are network speed and CPU speed: network speed impacts the exchanges, and CPU speed impacts the processing required. Degradation in either one increases the time it takes for a handshake to complete, thus limiting the number of connections per second that can be established by the “thing.”
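To make that concrete, here’s a minimal sketch (in Python, purely illustrative) that estimates new connections per second by timing repeated handshakes against a test server. The host, port, and duration are placeholders; point it at a server you control.

```python
import socket
import time

HOST, PORT = "127.0.0.1", 8080   # hypothetical test endpoint
DURATION = 5                      # seconds to sample

established = 0
deadline = time.monotonic() + DURATION
while time.monotonic() < deadline:
    # connect() returns only after the three-way handshake completes,
    # so each iteration pays the full network + CPU cost of setup.
    s = socket.create_connection((HOST, PORT))
    established += 1
    s.close()

print(f"~{established / DURATION:.0f} new connections/second")
```

The measured rate is bounded by round-trip time and CPU on both ends – exactly the two constraints described above.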
Once a connection is established, it gets counted as part of concurrency.
Concurrency is the measure of how many connections can be simultaneously maintained by a “thing” (device, server, application, etc.). To be maintained, a connection must be stored in a state table on the “thing”, which requires memory. Concurrency, therefore, is highly dependent on, and constrained by, the amount of RAM available on the “thing”.
In a nutshell: Concurrency requires memory. New connections per second requires CPU and network speed.
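The memory side can be sketched just as simply. The snippet below (again illustrative; the host, port, and connection count are placeholders, and you may need to raise the process’s file-descriptor limit) holds thousands of connections open simultaneously, each one pinning per-connection state and socket buffers in RAM on both ends.

```python
import socket

HOST, PORT = "127.0.0.1", 8080   # hypothetical test endpoint
CONCURRENCY = 10_000              # connections to hold open at once

open_conns = []
for _ in range(CONCURRENCY):
    # Each established connection forces both ends to keep state-table
    # entries and socket buffers in memory until the connection closes.
    open_conns.append(socket.create_connection((HOST, PORT)))

input(f"Holding {len(open_conns)} connections; check memory usage, then press Enter.")
for conn in open_conns:
    conn.close()
```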
Capitalizing on Capacity
Now, you may be wondering what good knowing the difference does you. First, you should be aware (if you aren’t already) of the usage patterns of the application you’re deploying in the cloud (or anywhere, really). Choosing the right instance based on the usage pattern of the application (connection-heavy versus concurrency-heavy) can result in spending less money over time, because the least amount of resources is wasted. In other words, you make resource usage more efficient by pairing the instance correctly with the application.
Choosing a high-memory, low-CPU instance for an application that is connection-oriented can lead to underutilization and wasted investment, as it will need to be scaled out sooner to maintain performance. Conversely, applications dependent on concurrency that are deployed on high-CPU, low-memory instances will see performance degrade quickly unless additional instances are added, which wastes resources (and money). Thus, choosing the right instance type for the application is paramount to achieving the economy of scale promised by cloud computing and virtualization. This is a truism whether you’re choosing from a cloud provider’s menu or your own.
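A back-of-the-envelope calculation shows why the pairing matters. The prices and capacities below are entirely made up for illustration; the point is that the binding resource – connections per second or concurrency – dictates how many instances you must run, and therefore what you pay.

```python
import math

# Hypothetical instance menu: (price $/hr, conn/sec capacity, max concurrent)
INSTANCES = {
    "high-cpu": (0.20, 5000, 10_000),
    "high-mem": (0.20, 1000, 80_000),
}

def instances_needed(profile, demand_cps, demand_concurrent):
    _, cps_cap, conc_cap = INSTANCES[profile]
    # Whichever resource runs out first forces the next scale-out event.
    return max(math.ceil(demand_cps / cps_cap),
               math.ceil(demand_concurrent / conc_cap))

# A connection-heavy workload: many short requests, little held state.
demand = dict(demand_cps=20_000, demand_concurrent=15_000)
for profile in INSTANCES:
    n = instances_needed(profile, **demand)
    print(f"{profile}: {n} instances at ${INSTANCES[profile][0] * n:.2f}/hr")
```

With these made-up numbers the high-CPU profile serves the workload with four instances while the high-memory profile needs twenty. Flip the workload to one holding many idle sessions and the comparison reverses – which is the whole argument for matching instance profile to usage pattern.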
It’s inevitable that if you’re going to scale the application (and you probably are, if for no other reason than to provide for reliability) you’re going to use a load balancing service. There are two ways to leverage the capabilities of such a service when it’s delivered by an application delivery controller, and which one applies depends, again, upon the application.
STATEFUL APPLICATION ARCHITECTURE
If the application you are deploying is stateful (and it probably is) then you’ll not really be able to take advantage of page routing and scalability domain design patterns. What you can take advantage of, however, is the combined memory, network, and processing speed capabilities of the application delivery controller.
By their nature, application delivery controllers generally aggregate more network bandwidth, use memory more efficiently, and are imbued with purpose-built protocol handling functions. This makes them ideal for managing connections at very high rates per second. An application delivery controller based on a full-proxy architecture, furthermore, shields the application services themselves from the demands associated with high connection rates, i.e. network speed and CPU. By offloading the connection-oriented demands to the application delivery service, the application instances can be provisioned with the appropriate resources so as to maximize concurrency and/or performance.
STATELESS or SHARED STATE APPLICATION ARCHITECTURE
If the application is stateless, or it shares state (usually via session data stored in a database), you can pare off those functions that are connection-oriented from those that are dependent upon concurrency. RESTful or SOA-based application architectures will also benefit from the implementation of scalability domains, as they allow each “service” to be deployed to an appropriately sized instance based on its usage type – connection or concurrency.
An application delivery service capable of performing layer 7 (page) routing can efficiently subdivide an application, sending all connection-heavy requests to one domain (pool of resources/servers) and all concurrency-heavy requests to another. Each pool of resources can then be composed of instances sized appropriately – more CPU for the connection-oriented, more memory for the concurrency-oriented.
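As a sketch of what such layer 7 routing looks like mechanically, here’s a toy asyncio proxy (not a real ADC; the path prefixes and pool addresses are invented) that peeks at the HTTP request line and relays each connection to a pool sized for its usage type.

```python
import asyncio

# Invented pools: path prefix -> (host, port) of an appropriately sized pool.
POOLS = {
    "/api/":    ("10.0.1.10", 8080),   # connection-heavy -> CPU-rich instances
    "/report/": ("10.0.2.10", 8080),   # concurrency-heavy -> memory-rich instances
}
DEFAULT = ("10.0.1.10", 8080)

def pick_backend(path: str):
    for prefix, backend in POOLS.items():
        if path.startswith(prefix):
            return backend
    return DEFAULT

async def pump(src: asyncio.StreamReader, dst: asyncio.StreamWriter):
    # Copy bytes one way until the peer closes its side.
    while data := await src.read(4096):
        dst.write(data)
        await dst.drain()
    dst.close()

async def handle(reader, writer):
    # Peek at the request line (e.g. b"GET /api/users HTTP/1.1\r\n")
    # to decide which pool should serve this connection.
    request_line = await reader.readline()
    path = request_line.split()[1].decode()
    up_reader, up_writer = await asyncio.open_connection(*pick_backend(path))
    up_writer.write(request_line)
    # Relay the rest of the conversation in both directions.
    await asyncio.gather(pump(reader, up_writer), pump(up_reader, writer))

async def main():
    server = await asyncio.start_server(handle, "0.0.0.0", 8080)
    await server.serve_forever()

asyncio.run(main())
```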
In either scenario, the use of TCP multiplexing on the application delivery controller can further mitigate the impact of concurrency on the consumption of instance resources, making the resources provisioned for the application more efficient and able to serve more users and more requests without increasing memory or CPU.
What is TCP Multiplexing?
TCP multiplexing is a technique used primarily by load balancers and application delivery controllers (but also by some stand-alone web application acceleration solutions) that enables the device to “reuse” existing TCP connections. This is similar to the way in which persistent HTTP 1.1 connections work, in that a single HTTP connection can be used to retrieve multiple objects, thus reducing the impact of TCP overhead on application performance.
TCP multiplexing allows the same thing to happen for TCP-based applications (usually HTTP / web), except that instead of the reuse being limited to a single client, connections can be reused across many clients, resulting in much greater efficiency for web servers and faster-performing applications.
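A stripped-down sketch of the idea (deliberately naive – it assumes the backend speaks HTTP keep-alive, reads each message in a single chunk, and uses an invented backend address) might look like this: many short-lived client connections are served over a small pool of persistent backend connections, so the web server sees only a handful of handshakes no matter how many clients arrive.

```python
import asyncio

BACKEND = ("10.0.1.10", 8080)   # invented address of the real web server
POOL_SIZE = 4                    # persistent backend connections to reuse

async def main():
    pool: asyncio.Queue = asyncio.Queue()
    # Open a small, fixed set of backend connections up front; these are
    # the only handshakes the web server will ever see from us.
    for _ in range(POOL_SIZE):
        await pool.put(await asyncio.open_connection(*BACKEND))

    async def handle_client(reader, writer):
        request = await reader.read(65536)   # naive: whole request in one read
        br, bw = await pool.get()            # borrow a pooled connection
        try:
            bw.write(request)
            await bw.drain()
            response = await br.read(65536)  # naive: whole response in one read
            writer.write(response)
            await writer.drain()
        finally:
            await pool.put((br, bw))         # return it for the next client
            writer.close()

    server = await asyncio.start_server(handle_client, "0.0.0.0", 8080)
    await server.serve_forever()

asyncio.run(main())
```

A real ADC manages its pooled connections with proper request framing, health checks, and limits, but the resource effect is the same: client-side connection churn never reaches the application instances.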
Regardless of your strategy, understanding the difference between concurrency and connections is a good place to start when determining how best to provision resources to meet specific application needs. Reducing the long-term costs associated with scalability is still important to ensuring an efficient IT organization, and utilization is always a good measure of efficiency. The cost differences between CPU-heavy and memory-heavy instances vary from cloud environment to cloud environment, with the most expensive being those that are high in both CPU and memory. Being able to provision the least amount of resources while achieving optimal capacity and performance is a boon – both for IT and the business it serves.