– Henry Hu, Marketing Manager – Power, at Emerson Network Power, Liebert Services, says:

Organizations have become increasingly dependent on information technology systems to support business-critical applications, and therefore, the reliability of critical power systems supporting the data center is paramount. While there are many threats to data center availability inside the critical power system, specific steps can be taken to minimize your risk.

High Cost of Downtime

Downtime costs result from several factors, including data loss or corruption, productivity losses, equipment damage, root-cause detection and recovery actions, legal and regulatory repercussions, revenue loss, and long-term damage to reputation and trust among key stakeholders. When Emerson Network Power analyzed the financial impact of infrastructure vulnerability, we found that the average cost of data center downtime is approximately $5,600 per minute. With an average reported incident length of 90 minutes, the average cost of a single downtime event can easily exceed $500,000. Depending on the business, the cost can be even higher. For example, when Twitter suffered several downtime events last summer, an informal estimate by Constellation Research reported downtime costs of up to $25 million per minute.
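The arithmetic behind the figures quoted above can be sketched in a few lines of Python. This is only an illustration: the function name `downtime_cost` is made up for this example, and it assumes cost accrues linearly per minute, which real outages may not follow.

```python
# Figures taken from the article text above.
COST_PER_MINUTE_USD = 5_600   # average cost of data center downtime per minute
AVG_INCIDENT_MINUTES = 90     # average reported incident length


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate the cost of one outage, assuming a constant cost per minute."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


average_event_cost = downtime_cost(AVG_INCIDENT_MINUTES)
print(f"Average downtime event: ${average_event_cost:,.0f}")  # $504,000
```

Substituting your own per-minute cost and typical incident length gives a rough first estimate of what a single event could cost your organization.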

Additionally, a variety of incidents can cause disruption to data center operations. Although weather-related causes account for only 12 percent of outages, Hurricane Sandy demonstrated that disasters can happen anytime, anywhere. The devastation wrought by Hurricane Sandy has focused the IT industry’s attention on the need for remote backup sites and redundant systems. As with a primary data center, these backup systems must be maintained routinely to avoid downtime during emergencies as well as during normal operations.

Criticality of Preventive Maintenance

While UPS systems are designed to offer power stability and protection, they are not failure-proof. Factors such as application, installation, design, component manufacturing, real-world operating conditions and maintenance practices can affect the reliability and performance of these systems.

Fortunately, a program of scheduled preventive maintenance (PM) and proactive replacement of key components on the UPS and batteries can greatly reduce the chance of failure during power outages, utility spikes, switching transients, incidents of line noise and other unexpected power-related issues.

A study of the impact of PM on UPS reliability makes the importance of regular service clear. The study revealed that the Mean Time Between Failures (MTBF) for units that received two PM service visits a year is 23 times higher than for units with no PM visits. According to the study, reliability continued to increase steadily with additional visits when these visits were conducted by highly trained technicians.
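To get a feel for what a 23-fold MTBF improvement could mean in practice, the sketch below converts MTBF into a yearly failure probability. It rests on assumptions that are not in the study: failures are modeled as exponentially distributed (a constant failure rate), and the baseline MTBF of 100,000 hours is a hypothetical placeholder rather than a measured value.

```python
import math

HOURS_PER_YEAR = 8_760
PM_MTBF_MULTIPLIER = 23  # from the study: two PM visits per year vs. none


def annual_failure_probability(mtbf_hours: float) -> float:
    """Probability of at least one failure in a year.

    Assumes an exponential (constant failure rate) model.
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


baseline_mtbf_hours = 100_000.0  # HYPOTHETICAL value, for illustration only
with_pm_mtbf_hours = baseline_mtbf_hours * PM_MTBF_MULTIPLIER

p_no_pm = annual_failure_probability(baseline_mtbf_hours)
p_with_pm = annual_failure_probability(with_pm_mtbf_hours)

print(f"No PM visits:  {p_no_pm:.2%} chance of a failure per year")
print(f"Two PM visits: {p_with_pm:.2%} chance of a failure per year")
```

Under these assumptions, the yearly failure probability falls by roughly the same factor as the MTBF rises, because the probability is approximately proportional to 1/MTBF when failures are rare.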

Recommendations to Avoid Downtime

To avoid power-related downtime, Liebert Services recommends taking these measures:

  1. Manage proactively for uptime. Don’t wait until you have a failure. Be proactive by implementing a program of regular service and maintenance now. You don’t wait until your car breaks down on the road to get an oil change, so don’t wait until you incur thousands or hundreds of thousands in downtime costs before having your UPS serviced.
  2. Schedule at least two PM visits each year. Based on the study referenced above, UPS reliability increases as the frequency of PM goes up. So while two visits is a minimum for data centers that require high-nines availability, a high return on investment can be realized in many cases by increasing the PM frequency. Small, single-phase UPS devices are an exception to the recommended two-visit rule and generally need only one PM visit per year. Semi-Annual Service Tasks… Typical tasks performed during a semi-annual service visit include a complete visual inspection of the equipment, including subassemblies, wiring harnesses, contacts, cables, and all breakers, as well as checking air filters for cleanliness. At the end of this service, a system operational test, including unit transfer and battery discharge, should be performed.
  3. Proactively replace UPS capacitors and other life-limited components. Capacitors and batteries age, as do other components such as fans and blowers. Again, think about driving your car. If you want to be safe and avoid accidents, you don’t wait until every trace of tread is gone before replacing your tires, or until the brake pads start to score your brake discs. Although high-quality capacitor manufacturers provide a service life rating, the rating is, at best, a guideline. The number isn’t accurate enough to predict when the first capacitor in a large population will fail. Therefore, replacing capacitors periodically is a recommended practice to ensure a higher MTBF.
  4. Service components that indirectly affect power. Along with the mission-critical power system, other mechanical systems require preventive maintenance to ensure optimum performance. It’s particularly important to carry out preventive maintenance activities for the cooling infrastructure. This includes inspection and replacement of air filters. Clogged air filters reduce the airflow through the systems and increase the load on the blower drive system, which should also be inspected and maintained. Wear or damage to blower belts, bearings, motors and wheels may result in reduced cooling performance. Evaporator coils, reheat elements and condenser coils should also be checked periodically to verify they are free of debris. An overheated power system is a vulnerable power system. 
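The visit-frequency guidance in recommendation 2 can be written down as a simple rule, for example as a starting point for a maintenance-planning script. The names below (`UpsUnit`, `minimum_pm_visits_per_year`) are hypothetical, and the rule encodes only the minimums stated above; sites with high availability targets may well justify more visits.

```python
from dataclasses import dataclass


@dataclass
class UpsUnit:
    """Hypothetical record describing one UPS in an inventory."""
    name: str
    small_single_phase: bool  # True for small, single-phase devices


def minimum_pm_visits_per_year(unit: UpsUnit) -> int:
    """Return the minimum yearly PM visits suggested in the list above.

    Small, single-phase units generally need one visit per year;
    all other units should get at least two.
    """
    return 1 if unit.small_single_phase else 2


inventory = [
    UpsUnit("closet-ups", small_single_phase=True),
    UpsUnit("main-hall-ups", small_single_phase=False),
]

for unit in inventory:
    print(f"{unit.name}: at least {minimum_pm_visits_per_year(unit)} PM visit(s) per year")
```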

In summary, there are many threats lurking in the power system that can take down a data center. To significantly lower your risk and avoid the heavy financial burden of downtime, implement a regular, comprehensive PM program where service is performed by highly experienced technicians.