Cloud Outage Prevention: Taking Humans Out of the IT Equation
– Jonathan Crane, Chief Commercial Officer, IPsoft, says:
The adoption of cloud architecture has come to demand near-unmanageable responsiveness and agility, putting tremendous pressure on the teams that manage the infrastructure environment. And while cloud management processes have slowly improved, one glaring problem still plagues cloud performance: human error.
Mistakes are inevitable when people handle important cloud and IT processes, but human error appears to be rising beyond an acceptable level. For example, the recent BlackBerry server outage and a number of other cloud outages were triggered by human error. These incidents show just how devastating mistakes can be, resulting in poor end-user experiences, lost revenue and damage to a company’s reputation.
The problem is that the current mindset in cloud management seems to be one of reactive fire drills rather than proactive prevention. Instead of reacting to the disastrous results of human error, IT departments should be pushing to minimize, if not completely eliminate, the cloud issues it causes.
Many companies are implementing automation to handle basic cloud management and IT functions. These tools come in two flavors: traditional automation, which relies on a tree-based logic system, and autonomic expert systems, which are based on self-learning principles. Traditional automation follows a pre-programmed formula based on set conditions and works well when the same process is repeated often.
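To make the idea of tree-based logic more concrete, here is a minimal Python sketch. It assumes a hypothetical alert format (a dict with `metric` and `value` keys) and hypothetical action names such as `clear_temp_files`; it illustrates the general technique only and is not IPsoft's or any other vendor's product logic. Each internal node asks a fixed yes/no question about the alert, and each leaf names a pre-programmed remediation, so the same input always produces the same action.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Action:
    """Leaf node: a pre-programmed remediation step (names are hypothetical)."""
    name: str


@dataclass
class Decision:
    """Internal node: a fixed yes/no condition evaluated against the alert."""
    question: Callable[[dict], bool]
    if_true: "Node"
    if_false: "Node"


Node = Union[Action, Decision]


def run(tree: Node, alert: dict) -> str:
    """Walk the tree from the root until a leaf is reached; return its action name."""
    node = tree
    while isinstance(node, Decision):
        node = node.if_true if node.question(alert) else node.if_false
    return node.name


# Hypothetical runbook: every path and threshold is set in advance by an engineer.
RUNBOOK = Decision(
    question=lambda a: a["metric"] == "disk_used_pct",
    if_true=Decision(
        question=lambda a: a["value"] >= 90,
        if_true=Action("clear_temp_files"),
        if_false=Action("no_action"),
    ),
    if_false=Decision(
        question=lambda a: a["metric"] == "service_down",
        if_true=Action("restart_service"),
        # Anything the tree's authors did not anticipate falls through to a human.
        if_false=Action("escalate_to_engineer"),
    ),
)

print(run(RUNBOOK, {"metric": "disk_used_pct", "value": 95}))  # clear_temp_files
print(run(RUNBOOK, {"metric": "memory_used_pct", "value": 99}))  # escalate_to_engineer
```

Because the rules are fixed, an unanticipated alert such as high memory use ends at the escalation leaf, which is why this style of automation suits processes that repeat often.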
Autonomic expert systems also help eradicate human error. These tools can track and mimic the work of human engineers, eliminating human involvement in up to 70 percent of level 1 IT tasks and 30 to 40 percent of level 2 IT tasks, which is the equivalent of about four days of work each week. Employing expert systems reduces IT costs and improves scalability, flexibility and compliance in an enterprise’s cloud environment. Because autonomics can perform routine IT processes much more reliably and efficiently than humans, implementing them often brings drastic reductions in latency and in mean time to resolution (MTTR) for outages. Autonomics also produce more consistent business outcomes and free engineers to focus on creative work instead of mundane, repetitive tasks.
Without the shift to expert systems, human error will continue to impede cloud and IT performance. Companies that have implemented autonomic expert systems are already removing humans from 40 to 80 percent of IT operational tasks, including cloud management. These figures leave clear room for growth, and over the next few years I think we will see autonomic expert systems become increasingly prevalent.