Follow these tips to create and execute an effective disaster recovery plan.
By: Alan Arnold, EVP and CTO at Vision Solutions
The vast majority of disaster movies depict the overtly dramatic fallout of catastrophes, both natural and manmade, such as entire city skylines going up in flames or the earth ripping open to swallow every object and human being in its path. What they overlook are quieter, yet equally damaging, events such as the destruction of entire data centers.
Just as disaster movies have been around for ages, disaster-related IT system outages are nothing new. However, the severity of their consequences has soared to epic heights in the age of rapidly mounting data volumes. Losing a “small” percentage of data is no longer a “small” matter, as losing even a few pieces of information can affect multiple business units – and in some cases, relationships with customers.
Given the havoc that data loss can wreak, it is natural to assume that most companies would create and test their business continuity plans – and make disaster recovery (DR) a priority. System failures have certainly affected a significant number of businesses: of the 3,076 respondents from across the globe who participated in Vision Solutions' 2015 State of Resilience Report, 48 percent reported that their organization had experienced a failure requiring DR to resume IT operations. Yet, astoundingly, 87 percent of respondents either had no DR plan or were not entirely confident the plan was complete, tested and ready.
Factor in the heavy cost of downtime, and this lack of an effective business continuity plan seems even more self-destructive. After a failure, IT operations are typically down for one to two hours, and 57 percent of survey respondents reported downtime exceeding one hour. Nearly a third of organizations lost a few hours of data, and roughly a quarter lost more than a day of data. Of the respondents who indicated that their company performed a downtime cost analysis, 31 percent reported costs of more than $10,000.
While the United States government has designated September as National Disaster Preparedness Month, savvy institutions should make creating, perfecting and testing their business continuity plans a year-round practice – or risk standing by powerless as revenue and customer confidence drain away in the wake of a major disruption.
Preparedness doesn’t have to be a tedious or overwhelming process. Create a checklist for an effective business continuity plan using the steps below to secure the future of your data and the health of your company’s finances.
Form a Business Continuity Team
While companies should make all employees aware of their business continuity plan and involve them in testing, executives should also assign the creation, testing and execution of the plan to dedicated staff. Since a DR plan is, in most cases, one piece of an overall business resilience strategy, the IT leader will be part of the broader team. But the IT leader should not stand idly by as another cog in the system; instead, they should step up and act as a quarterback of sorts, actively pursuing opportunities to demonstrate the plan’s value to business executives.
This distinction is important for a few reasons:
First, leadership needs to support business continuity plans from the top down. Senior management should be involved both in creating the plan and in deciding on any proposed improvements, and should demonstrate a willingness to review and test it.
Second, management should promote awareness of the plan and reinforce its importance across the company. Perception can make or break the execution of even the most meticulously designed plan, so employees must grasp the plan’s gravity and the impact of system failure and downtime on productivity. IT executives who communicate regularly with leadership about the business continuity plan and its readiness will already have an advantage when the plan must be implemented swiftly. Likewise, helping employees understand that steps are in place to protect the business-critical data they access every day speaks volumes about a company’s overall commitment to business continuity.
Finally, every IT employee in charge of a server should play an active role in planning and execution, because the company must move all of its servers over to the recovery environment to keep the business running. This process can become complicated, as servers may run different operating systems or host different databases. A platform-agnostic solution comes in handy in this scenario, facilitating the movement of servers across operating systems while avoiding performance issues.
Select a Solution that is Up to the Task
In fact, a platform- and storage-agnostic solution offers several benefits that can bolster a well-devised business continuity plan. Very few organizations run all of their operations exclusively on physical or exclusively on virtual servers; most split operations between the two environments for optimal infrastructure efficiency. In an emergency, a DR solution must let users migrate from one type of server to the other, or the recovery effort will likely be in vain. An agnostic solution also allows users to migrate between machines built on different processors, for example from an AMD-based machine to an Intel-based one, or from one manufacturer’s storage to another’s.
Additionally, agile tools help users avoid vendor lock-in, which can effectively trap valuable data on compromised software or equipment. Vendor lock-in can render even the best disaster recovery plan powerless.
Data Recovery by the Second
When it comes to adequate data recovery, it is not enough to use snapshot technology, which typically records the state of data every 30 to 60 minutes; there is simply too much new data moving through company systems now. The best solution is continuous data protection, which lets users “time travel” to access files as they existed before they were lost, then restore them into the current state so that operations can continue nearly uninterrupted. Because a continuous data protection solution records changes in real time, it offers fine-grained protection against loss.
This is critical in industries such as banking, where institutions process transactions every moment of the day. In healthcare, an oversight in data recovery can literally become a matter of life or death, and at the very least it can cause chaos. Consider a hospital that loses all of its appointment data in a glitch, resulting in a slew of unhappy patients and frazzled staff. When secondary backup systems fail, deletions and errors can result in permanent data loss, as well as long-term reputational damage that can be difficult to repair, depending on the volume and business value of the information that vanished.
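To make the trade-off between periodic snapshots and continuous data protection concrete, the sketch below computes how much recent data would be lost if a failure struck at a given moment: the gap between the failure and the newest recovery point at or before it. It is a minimal illustration under stated assumptions, not any vendor's implementation; the 30-minute snapshot schedule, the one-second journal standing in for continuous protection, and the helper names `latest_recovery_point` and `data_loss_window` are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_recovery_point(points, target):
    """Return the newest recovery point at or before `target`, or None if none exists.

    `points` must be sorted in ascending order.
    """
    i = bisect_right(points, target)
    return points[i - 1] if i else None


def data_loss_window(points, failure_time):
    """Return how much recent data is lost when rolling back to the newest usable point."""
    rp = latest_recovery_point(points, failure_time)
    return None if rp is None else failure_time - rp


start = datetime(2015, 1, 1, 9, 0, 0)

# Hypothetical snapshot schedule: one snapshot every 30 minutes for four hours.
snapshots = [start + timedelta(minutes=30 * k) for k in range(9)]

# Hypothetical stand-in for continuous data protection: a recovery point every second.
journal = [start + timedelta(seconds=s) for s in range(4 * 3600 + 1)]

failure = start + timedelta(hours=2, minutes=29, seconds=30, milliseconds=500)
print(data_loss_window(snapshots, failure))  # 0:29:30.500000
print(data_loss_window(journal, failure))    # 0:00:00.500000
```

With a failure at 11:29:30.5, the snapshot schedule can only roll back to the 11:00 snapshot and loses almost half an hour of changes, while the per-second journal loses half a second. Continuous data protection products are generally described as journaling individual writes rather than whole seconds, so the second list is only an approximation.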
Get Your Data Priorities Straight
The example above illustrates why each organization must determine its own priorities in terms of the data it most needs to protect. Some types of data, such as email, are universally important: users will be furious, frustrated and stressed out if email servers go down for an hour. Conversely, a server that runs monthly financial reports for the board is less urgent, because it does not generate data as frequently as an email server.
While not all data is created equal, all of it deserves attention within the scope of your plan so that appropriate backup parameters can be set. The business continuity team should closely examine the data specific to its enterprise and industry, and determine how to prioritize each type for recovery. There is no single right formula; it depends entirely on the individual organization and its workflow.
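One way a business continuity team might record these decisions is a simple inventory that gives each data set a priority tier and recovery targets, then derives a restore order from it. The sketch below is a hypothetical illustration: the `DataSet` fields, tier numbers and minute values are assumptions chosen to echo the email, board-finance and hospital-appointment examples, not recommended settings, and each organization would substitute its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    tier: int               # 1 = most critical; set by the business continuity team
    max_downtime_min: int   # hypothetical recovery time target, in minutes
    max_data_loss_min: int  # hypothetical tolerable data loss, in minutes


def recovery_order(datasets):
    """Restore the most critical tier first; within a tier, the tightest downtime target first."""
    return sorted(datasets, key=lambda d: (d.tier, d.max_downtime_min, d.name))


def schedule_meets_target(dataset, backup_interval_min):
    """A periodic backup can lose up to one full interval, so the interval must not exceed the tolerable loss."""
    return backup_interval_min <= dataset.max_data_loss_min


inventory = [
    DataSet("board-finance-reports", tier=3, max_downtime_min=24 * 60, max_data_loss_min=24 * 60),
    DataSet("email", tier=1, max_downtime_min=60, max_data_loss_min=0),
    DataSet("appointments", tier=1, max_downtime_min=30, max_data_loss_min=0),
]

for d in recovery_order(inventory):
    print(d.name, schedule_meets_target(d, backup_interval_min=30))
# appointments False
# email False
# board-finance-reports True
```

Under these assumed targets, a 30-minute snapshot schedule satisfies only the board reports, which is the same gap that motivates continuous data protection for data that changes constantly. The sort key is deliberately simple; a team could weight it differently to match its own workflow.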
Too Much Testing is Never Enough
Companies should consistently test their business resilience strategy – not so much to check the hardware, as many would assume, but primarily to put employees through drills to make sure they know how to execute the plan appropriately. The human element of disaster recovery is often the trickiest to master, as it requires a great deal of shepherding and coordinating. And it is sometimes the simplest details that can go awry.
Testing is often when unanticipated issues surface, which makes regular testing important to the success of your plan. In one recent example, a company turned off all power in its data center and then initiated a role swap, only to discover a major problem: the team was locked out of the data center and therefore could not complete a required manual step in the role swap. The battery that powered the physical security card keys was low, so the entire organization was stuck waiting for access to a console. Fortunately, IT was able to approve the manual step once it regained entry, but this example shows why drills are important.
Testing every six months is ideal, but annual testing is acceptable in environments with relatively stable staffing. Regardless of frequency, companies should never underestimate the value of testing.
Putting it All Together
Making a well-reviewed disaster movie is not an easy feat, but a well-planned and executed business continuity plan is within reach if organizations carefully follow the tips above. And effective disaster recovery of valuable data leads to business benefits that will earn rave reviews from employees across the enterprise as well as customers and investors in the market at large.