By Michael Schumacher and Mara Ervin

In a scenario no data center ever wants to face, a brief voltage dip in the power supply system causes chillers to shut down. As temperatures rise, compute and storage units must be powered off to prevent hardware damage. This exact scenario unfolded at a Microsoft data center in Australia in 2023, resulting in a three-day disruption, significant financial penalties, and considerable reputational damage.

While Microsoft has not officially categorized this event as a cyber incident, the unusual sequence of failures, such as all operational chillers faulting simultaneously and pumps not receiving restart signals, strongly suggests the possibility of an OT (Operational Technology) cyber-attack.

Nor are the threats to this kind of infrastructure purely digital. Over the past year, there have been significant physical security breaches at both data centers and substations across the United States. These incidents underline the increasing need for robust physical security protocols in parallel with cybersecurity measures. Notable examples include breaches at substations in the Pacific Northwest, where intruders damaged infrastructure and disrupted services, as well as attempted and successful break-ins at data centers in the Midwest, resulting in equipment tampering and data loss risks. Another incident in 2023 involved two individuals charged with conspiring to attack multiple energy substations to cripple the power grid of Baltimore, Maryland. Driven by extremist ideology, the two allegedly planned to cause widespread disruption by targeting key substations, according to U.S. Attorney Erek Barron and the FBI. One of the two, Russell, who is believed to have provided instructions and details on the substations, reportedly called such attacks on power transformers “the greatest thing somebody can do,” underscoring the intent to inflict maximum damage on the city.

Such breaches highlight vulnerabilities in physical security and reinforce the need for enhanced surveillance, access control, and incident response strategies to prevent unauthorized access that could lead to both physical and operational disruptions. Industry reports indicate that approximately 10% of data center outages are now caused by cybersecurity breaches, making them the fastest-growing threat to data center operations. The financial impact is staggering: one hour of downtime in a data center represents an estimated revenue loss of $1,000 per MW of capacity. Additionally, breaching service level agreements (SLAs) can incur fines amounting to hundreds of thousands of dollars per incident.
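To put the per-megawatt figure in perspective, the short Python sketch below multiplies it out for an outage. Only the $1,000 per MW per hour rate comes from the estimate quoted above. The 20 MW facility size and the 72-hour duration are hypothetical values chosen purely for illustration, and SLA fines are not included.

```python
# Rough downtime cost estimate. The rate is the figure quoted in this article;
# the facility size and outage length used below are hypothetical.
REVENUE_LOSS_PER_MW_HOUR = 1_000  # USD per MW of capacity per hour of downtime


def downtime_revenue_loss(capacity_mw: float, hours: float,
                          rate: float = REVENUE_LOSS_PER_MW_HOUR) -> float:
    """Return the estimated revenue loss in USD for an outage."""
    if capacity_mw < 0 or hours < 0:
        raise ValueError("capacity and duration must be non-negative")
    return capacity_mw * hours * rate


if __name__ == "__main__":
    # Hypothetical example: a 20 MW facility offline for three days (72 hours).
    loss = downtime_revenue_loss(capacity_mw=20, hours=72)
    print(f"Estimated revenue loss: ${loss:,.0f}")
```

Under these assumptions the estimate comes to $1,440,000 in lost revenue alone, before any SLA penalties are counted.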

Given these high stakes, it’s crucial to explore how such events can be quickly detected before causing damage, revenue loss, or penalties — and, more importantly, how similar attacks can be prevented in the future.

New Projects, Old Problems

The rapid adoption of AI has fueled a surge in data center investments, often described as a modern-day “Gold Rush.” Today, data centers consume 2-3% of the EU’s power supply, with projections estimating this could rise to 7-8% by 2030. However, this growth is paralleled by increasingly audacious cyberattacks, orchestrated by well-organized criminal groups and nation-state actors.

Generative AI, the dark web, and the deep pockets of state-funded entities have accelerated the evolution of malicious attack vectors. Staying informed on the latest cyber threats is a Herculean task, let alone adjusting defenses to counter them effectively.

AI has also contributed to the rapid growth of the colocation sector, driven by increasing demands from businesses looking to reduce operational costs, improve data management, and scale efficiently. To meet these demands, colocation providers are strengthening their Service Level Agreements (SLAs) to guarantee high uptime, robust security, and data protection against both cyber and physical threats. SLAs increasingly encompass stringent clauses addressing data integrity and protection against outages, cyber breaches, and physical disruptions. Providers are now investing in advanced infrastructure, redundancy mechanisms, and proactive monitoring systems to ensure compliance with these SLAs, thereby safeguarding client data and mitigating risks associated with data center downtime. Failure to meet these SLA commitments can lead to substantial financial penalties, as clients often have compensation clauses that enforce significant payouts for service lapses. Non-compliance can not only affect revenue through fines but also damage a provider’s reputation, impacting future business opportunities in an increasingly competitive colocation market.

Data centers depend on a network of interconnected OT systems, including power management, cooling, and physical security controls. These systems are integrated with IT networks for monitoring and control, which unfortunately broadens the attack surface. Adding to the challenge, operational staff often lack cybersecurity training, focusing instead on reliability and efficiency. This knowledge gap can lead to poor security practices, such as using weak passwords and neglecting system patches, which make OT systems easy targets for adversaries.

Even attempts to mitigate risks, like “air-gapping” operational control systems from the business network, have limitations. Vulnerabilities persist through the use of removable media, such as USB devices for software installations or data transfers, which can introduce malware into otherwise isolated systems. Additionally, network-isolated systems often integrate with business applications, services, and vendor management systems, creating potential backdoors for cyberattacks.

Prevention Is Not Enough

OT cybersecurity professionals are certainly aware of these risks, but the disproportionate focus on prevention leaves fewer resources for incident response.

Take the example of equipment monitoring. Human-Machine Interfaces (HMIs) are crucial for managing various OT systems within a data center, including power management, cooling, and security systems. However, these control systems are not immune to sophisticated attacks. A skilled hacker can manipulate OT control systems through false data injections — misleading information that tricks systems into taking incorrect actions, as seen in the notorious Stuxnet attack. Despite network isolation measures, attackers can gain access through vulnerabilities in integrated systems, supply chain weaknesses, or insider threats. Once inside, they can inject false data that simulates normal conditions or triggers false alarms, manipulating system responses at the process level.

Although Microsoft has not confirmed that the shutdown in its Australian data center was due to a false data injection or another type of OT cyber-attack, the absence of any detected suspicious behavior in the control systems raises a red flag. The incident fits the profile of a false data injection attack, underscoring the need for more robust defenses.

From Prevention to Prompt Response: The Role of Level Zero Visibility

Given the growing complexity of cyber threats, data center cybersecurity plans must evolve beyond mere prevention. They should assume constant attack attempts and anticipate that some breaches will inevitably succeed. This requires a shift from solely preventive measures to a more balanced approach that includes prompt response and continuous monitoring.

However, this presents a catch-22: how can cybersecurity teams rely on HMI monitoring of physical assets if these tools have been compromised? If cyber-attackers manipulate HMI reporting, decisions about shutting down equipment or continuing operations become educated guesses at best.

This is where Level Zero visibility comes into play. Level Zero visibility provides an unfiltered view of physical assets — data that cannot be compromised by attackers. It goes beyond simply identifying whether predefined control limits have been breached, such as significant temperature rises. Instead, it detects anomalous behavior within control limits, such as unexplained fluctuations in temperature.
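The difference between a limit alarm and in-limit anomaly detection can be illustrated with a minimal Python sketch. It is not a description of any specific product: the function name, the rolling z-score method, the window size, the thresholds, and the temperature readings are all assumptions made for this example.

```python
from statistics import mean, stdev


def find_in_limit_anomalies(readings, low, high, window=10,
                            z_threshold=3.0, min_sigma=0.05):
    """Return indices of readings that stay inside [low, high] but deviate
    sharply from the recent baseline (rolling z-score; illustrative only)."""
    anomalies = []
    for i in range(window, len(readings)):
        value = readings[i]
        if not (low <= value <= high):
            continue  # a conventional limit alarm already covers this case
        baseline = readings[i - window:i]
        mu = mean(baseline)
        # Floor the spread so a perfectly flat baseline does not divide by zero.
        sigma = max(stdev(baseline), min_sigma)
        if abs(value - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical supply-air temperatures in degrees C; control limits 18 to 27.
    temps = [22.0, 22.1, 21.9, 22.0, 22.1, 22.0, 21.9,
             22.1, 22.0, 22.0, 22.1, 24.5, 22.0]
    print(find_in_limit_anomalies(temps, low=18.0, high=27.0))
```

In this made-up series the 24.5 °C reading never crosses a control limit, so a threshold alarm would stay silent, yet it stands out clearly against the preceding ten readings. A real deployment would likely use more robust statistics or learned models rather than a simple z-score.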

Data center cybersecurity teams should operate under the assumption that they are constantly under attack. Therefore, Level Zero visibility is essential during all stages of a cyber-attack:

Before an Event: Detecting anomalous behavior between control systems and physical assets can reveal cyber threats before they escalate.

During an Event: Unfiltered Level Zero visibility is critical for making informed decisions on whether to shut down equipment, disconnect from networks, or continue operations with caution.

After an Event: In the aftermath, Level Zero visibility is invaluable for forensic analysis, helping to understand the attack’s progression, identify attack vectors, and develop strategies to prevent future incidents.
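The "before an event" check, comparing what the control system reports with what the physical asset is actually doing, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `Sample` structure, the field names, the 0.5 °C tolerance, and the readings are hypothetical, and the raw signal is assumed to be measured independently of the control network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    minute: int
    hmi_temp_c: float  # value shown on the HMI (could be spoofed)
    raw_temp_c: float  # value assumed to come from an independent sensor tap


def find_discrepancies(samples, tolerance_c=0.5):
    """Return samples where the HMI and the raw signal disagree by more
    than the tolerance, a possible sign of false data injection."""
    return [s for s in samples
            if abs(s.hmi_temp_c - s.raw_temp_c) > tolerance_c]


if __name__ == "__main__":
    # Hypothetical data: the HMI reports a steady 22.0 C while the raw
    # signal shows the temperature climbing.
    samples = [
        Sample(0, 22.0, 22.1),
        Sample(1, 22.0, 22.3),
        Sample(2, 22.0, 23.4),
        Sample(3, 22.0, 24.9),
    ]
    for s in find_discrepancies(samples):
        print(f"minute {s.minute}: HMI {s.hmi_temp_c} C vs raw {s.raw_temp_c} C")
```

The same record of paired readings also serves the "after an event" stage: it shows when the reported and measured values began to diverge, which helps reconstruct the timeline of an attack.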

Conclusion

As the construction of data centers accelerates, so does the risk of cyber-attacks. The explosive growth within the AI infrastructure and colocation market amplifies these challenges, as more interconnected systems introduce additional vulnerabilities. Multi-level OT cyber defenses that incorporate Level Zero visibility and advanced technologies like Machine Learning are essential for protecting these critical infrastructures. By integrating real-time, unfiltered monitoring of physical assets, data centers can make more informed decisions, safeguarding not just their operations but also the safety and well-being of their employees. Moreover, as AI applications require robust data handling capabilities, ensuring the security and resilience of these systems becomes paramount to support the increasing demand for reliable and secure services in the colocation landscape.

Meet Michael “Schu” Schumacher 

Michael “Schu” Schumacher is the VP of Sales North America for SIGA and is the father of four and (to date) grandfather of five. He has spent the past 30+ years in IT Security sales leadership, having worked for network carriers, solution sales and IT/OT security companies. “Schu,” as everyone calls him, has worked the last 12 years as a regional and national sales leader for both established and startup companies specializing in IT and OT Security solutions. He possesses a vast technological background that ranges from securing networks to protecting critical infrastructure and manufacturing processes. With a strong desire to listen and learn and an excellent reputation for “delivering on promises made”, Schu utilizes his broad background and knowledge base to work as a Subject Matter Expert (SME) for both securing and providing rapid response data for layers 0 through 5 in the Purdue model. This expertise helps IT and OT converge on their shared objectives: protecting and delivering on their production and security goals.