According to the Ponemon Institute’s latest “Cost of Data Center Outages” study, the average cost of an unplanned data center outage has steadily increased from approximately $500,000 in 2010 to nearly $750,000 in 2016, a 38 percent increase. What’s also intriguing about the report is that data center outages aren’t just costly; they’re extremely common. Among the U.S.-based data centers surveyed, total outages occurred once a year on average, and device-level outages occurred every two months.
As we let these figures sink in, and before answering what’s wrong with the Rack Power Circuit Monitoring Chart presented above, let me provide some background about this graphic:
- This is a one-week chart graphing the power draw for two redundant circuits (RPP_4/6_3A-25,27,29 and RPP_6/4_1A-25,27,29) in a single rack, #15
- All servers in this rack are dual powered and plugged equally into both power circuits
- The circuits are 30-amp, three-phase, 208-volt
- The RPP_6/4 circuit has peaked repeatedly at ~12.7 amps
- The RPP_4/6 circuit has peaked concurrently at ~12.2 amps
First, an Observation from This Picture
Although the servers in this rack are plugged equally into both circuits, they are not drawing power equally from both circuits.
What Is Wrong with This Picture?
Under US code, a 30-amp, 208-volt circuit breaker is de-rated to 80% for continuous loads, so it can safely supply 24 amps. The two circuits peak concurrently at ~12.7 and ~12.2 amps, so their combined load is approaching 25 amps (24.9 amps). Neither circuit is oversubscribed under normal operating conditions. However, if one circuit fails, its load will shift to the other circuit. That circuit will then become oversubscribed and risk tripping its breaker, creating a cascading failure.
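The failover arithmetic for rack #15 can be sketched in a few lines. This is a minimal illustration, not a monitoring tool. It assumes the 80% de-rating described above and assumes the chart's peak values are concurrent per-circuit currents that can be summed directly. The function and variable names are hypothetical.

```python
# Failover check for the rack #15 A/B circuit pair (illustrative sketch).
# Assumptions: 80% continuous-load de-rating, and the plotted peaks are
# concurrent per-circuit currents that can simply be added together.

BREAKER_AMPS = 30.0  # rating of each redundant circuit's breaker
DERATE = 0.80        # continuous-load de-rating factor


def safe_capacity(breaker_amps: float, derate: float = DERATE) -> float:
    """Continuous load one breaker can carry after de-rating."""
    return breaker_amps * derate


def failover_load(amps_a: float, amps_b: float) -> float:
    """If one side fails, the surviving circuit carries A + B."""
    return amps_a + amps_b


peak_rpp_4_6 = 12.2  # RPP_4/6 peak from the chart, in amps
peak_rpp_6_4 = 12.7  # RPP_6/4 peak, reached concurrently, in amps

capacity = safe_capacity(BREAKER_AMPS)
load = failover_load(peak_rpp_4_6, peak_rpp_6_4)
headroom = capacity - load  # negative means the survivor is oversubscribed

print(f"Safe capacity per circuit: {capacity:.1f} A")
print(f"Load on surviving circuit after failover: {load:.1f} A")
print(f"Headroom: {headroom:+.1f} A")
```

The headroom comes out to roughly -0.9 A. Each circuit is comfortably loaded on its own, but neither can absorb its partner's load.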
What Are the Takeaways from This Illustration?
It is increasingly common for server power supplies to draw power disproportionately from A and B power sources. Therefore, you cannot simply set thresholds at 50% or 40% of a power source’s capacity. If you do, false alerts will be issued whenever one power source is delivering a disproportionate share of the load for one or more servers. Instead, the A and B power sources must be aggregated and compared to a breaker’s capacity threshold to determine whether the circuits in a rack are at risk of a cascading failure.
Intelligent power monitoring software is needed at each rack to aggregate A and B power sources and give alerts only when actual oversubscription occurs.
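To show the difference between the two alerting approaches, here is a sketch that compares a naive per-circuit threshold with an aggregated A+B check. The `RackCircuits` structure and both alert functions are hypothetical and are not any vendor's API. The 80% de-rating is assumed as above. The "rack-X" readings are made-up values chosen only to illustrate an unbalanced load.

```python
from dataclasses import dataclass

DERATE = 0.80  # assumed 80% continuous-load de-rating


@dataclass
class RackCircuits:
    """A/B circuit readings for one rack (hypothetical structure)."""
    rack: str
    amps_a: float               # current on the A-side circuit
    amps_b: float               # current on the B-side circuit
    breaker_amps: float = 30.0  # rating of each breaker


def per_circuit_alert(r: RackCircuits, fraction: float = 0.4) -> bool:
    """Naive rule: alert if either side alone exceeds `fraction` of its rating."""
    limit = r.breaker_amps * fraction
    return r.amps_a > limit or r.amps_b > limit


def aggregate_alert(r: RackCircuits) -> bool:
    """Aggregated rule: alert if A + B exceeds one breaker's de-rated capacity,
    i.e. the surviving circuit would be oversubscribed if its partner failed."""
    return r.amps_a + r.amps_b > r.breaker_amps * DERATE


racks = [
    RackCircuits("rack-15", amps_a=12.2, amps_b=12.7),  # peaks from the chart
    RackCircuits("rack-X", amps_a=14.0, amps_b=6.0),    # hypothetical unbalanced rack
]

for r in racks:
    print(f"{r.rack}: per-circuit={per_circuit_alert(r)}, aggregate={aggregate_alert(r)}")
```

With the 40% per-circuit threshold, the unbalanced hypothetical rack triggers an alert even though its 20 A total fits within 24 A, so that alert is false. Only the aggregated rule reflects the actual failover risk. With `fraction=0.5` (15 A), the per-circuit rule would stay silent on rack #15 even though its combined peak exceeds 24 A.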
To return to the alarming statistics presented in the Ponemon survey, the study also found that average data center downtime was 90 minutes, resulting in an average cost per incident of more than $1 million in data centers that comprise the core of an organization’s business operations. With economic stakes this high, to say nothing of the damage to a company’s business reputation, if you’re not monitoring your racks with intelligent power monitoring software, then one of two things about your data center is true, and neither is good.
You are either wasting much of your power infrastructure investment by severely underutilizing your power capacities to protect against cascading power failures, or you are at risk of cascading power failures and don’t know it!