Mega Datacenters: Pioneering the Future of IT Infrastructure
– Rob Ober, LSI Fellow, Processor and System Architect, LSI Corporate Strategy Office, says:
The unrelenting growth in the volume and velocity of data worldwide is spurring innovation in datacenter infrastructures, and mega datacenters (MDCs) are on the leading edge of these advances. Although MDCs are relatively new, their exponential growth – driven by this data deluge – has thrust them into rarefied regions of the global server market: they now account for about 25 percent of servers shipped.
Rapid innovation is the watchword at MDCs. It is imperative to their core business and, on a much larger scale, is forcing a rethinking of IT infrastructures of all sizes. The pioneering efforts of MDCs in private clouds, compute clusters, data analytics and other IT applications now provide valuable insights into the future of IT. Any organization stands to benefit by emulating MDC techniques to improve scalability, reliability, efficiency and manageability, and to reduce the cost of work done, as it confronts changing business dynamics and rising financial pressures.
The Effects of Scale at MDCs
MDCs and traditional datacenters are miles apart in scale, though the architects at each face many of the same challenges. Most notably, both are trying to do more with less by implementing increasingly sophisticated applications and optimizing the investments needed to confront the data deluge. The sheer scale of MDCs, however, magnifies even the smallest inefficiency or problem. Economics force MDCs to view the entire datacenter as a resource pool to be optimized as it delivers more services and supports more users.
MDCs like those at Facebook, Amazon, Google and China’s Tencent use a small set of distinct platforms, each optimized for a specific task, such as storage, database, analytics, search or web services. The scale of these MDCs is staggering: Each typically houses 200,000 to 1,000,000 servers, and from 1.5 million to 10 million disk drives. Storage is their largest cost. The world’s largest MDCs deploy LSI flash cards, flash cache acceleration, host bus adapters, serial-attached SCSI (SAS) infrastructure and RAID storage solutions, giving LSI unique insight into the challenges these organizations face and how they are pioneering architectural solutions to common problems.
MDCs prefer open source software for operating systems and other infrastructure, and the applications are usually self-built. Most MDC improvements have been given back to the open source community. In many MDCs, even the hardware infrastructure might be self-built or, at a minimum, self-specified for optimal configurations – options that might not be available to smaller organizations.
Server virtualization is only rarely used in MDCs. Instead of using virtual machines to run multiple applications on a single server, MDCs prefer to run each application across a cluster of hundreds to thousands of server nodes dedicated to a specific task. Depending on the task or application, for example, a cluster's servers may contain only boot storage, RAID-protected storage for database or transactional data, or unprotected direct-mapped drives with data replicated across facilities. The virtualization applications MDCs do use are all open source, and they are used for containerization to simplify the deployment and replication of images. Because re-imaging and updating these virtualization applications occur frequently, boot image management is another challenge.
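To make the containerization point concrete, the following minimal sketch pushes one image to every node in a hypothetical cluster over SSH. The node names, image tag and use of a Docker-compatible runtime are illustrative assumptions, not a description of any particular MDC's tooling.

    #!/usr/bin/env python3
    """Minimal sketch: replicate one container image across a cluster of nodes.

    Assumes password-less SSH access and a Docker-compatible runtime on each
    node; the node list and image name are purely illustrative.
    """
    import subprocess

    NODES = [f"node{i:04d}.example.internal" for i in range(1, 5)]  # hypothetical hosts
    IMAGE = "registry.example.internal/webtier:latest"              # hypothetical image tag

    def deploy(node: str, image: str) -> bool:
        """Pull the image and (re)start the service container on one node."""
        cmds = [
            f"docker pull {image}",
            "docker rm -f webtier || true",          # drop any stale container
            f"docker run -d --name webtier {image}", # identical image everywhere
        ]
        for cmd in cmds:
            result = subprocess.run(["ssh", node, cmd], capture_output=True)
            if result.returncode != 0:
                print(f"{node}: FAILED ({cmd})")
                return False
        print(f"{node}: ok")
        return True

    if __name__ == "__main__":
        failures = [n for n in NODES if not deploy(n, IMAGE)]
        print(f"{len(NODES) - len(failures)}/{len(NODES)} nodes updated")

In practice, updating hundreds of thousands of nodes this way is why boot and image management become first-class problems at MDC scale.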
The large clusters at MDCs make the latency of inter-node communications critical to application performance, so MDCs make extensive use of 10Gbit Ethernet in servers today and, in some cases, deploy 40Gbit infrastructure as needed. MDCs also optimize performance by deploying networks with static configurations that minimize transactional latency. And MDC architects are now deploying at least some software-defined networking (SDN) infrastructure to optimize performance, simplify management at scale and reduce costs.
MDCs are sometimes seen as cheap, refusing to pay for any value-added functionality from vendors, but that is a subtle misunderstanding of their motivations. With as many as 1 million servers, MDCs require a lights-out infrastructure maintained primarily by automated scripts, with only a few technicians assigned simple maintenance tasks. MDCs also maintain a ruthless focus on eliminating unnecessary spending, using the savings to grow and to optimize work performed per dollar spent.
MDCs are very careful to eliminate features not central to their core applications, even those provided for free, since they increase operating expenditures. Chips, switches, buttons, lights, cables, screws, latches, software layers and anything else that does nothing to improve performance only adds to power and cooling demands and service overhead. A single unnecessary LED on each of 200,000 servers, for example, consumes 26,000 watts of power and can increase operating costs by $10,000 per year.
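The arithmetic behind that estimate is straightforward. The sketch below reproduces it, assuming roughly 0.13 watts per LED and an illustrative blended electricity rate of about $0.044 per kWh; both values are assumptions chosen to be consistent with the figures above.

    """Back-of-the-envelope cost of one extra LED per server, at MDC scale.

    The per-LED draw and electricity rate are illustrative assumptions chosen
    to be consistent with the figures quoted in the text.
    """
    SERVERS = 200_000
    WATTS_PER_LED = 0.13          # assumed draw of one indicator LED
    RATE_PER_KWH = 0.044          # assumed blended electricity rate, $/kWh
    HOURS_PER_YEAR = 24 * 365

    total_watts = SERVERS * WATTS_PER_LED                 # 26,000 W across the fleet
    kwh_per_year = total_watts / 1000 * HOURS_PER_YEAR    # ~228,000 kWh
    cost_per_year = kwh_per_year * RATE_PER_KWH           # ~$10,000

    print(f"Fleet draw: {total_watts:,.0f} W")
    print(f"Annual energy: {kwh_per_year:,.0f} kWh")
    print(f"Annual cost: ${cost_per_year:,.0f}")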
Even minor problems can become major issues at scale. One of the biggest operational challenges for MDCs is hard disk drive (HDD) failure rates. Despite the low price of HDDs, failures can cause costly disruptions in large clusters, where these breakdowns are routine. Another challenge is managing rarely used archival data that already exceeds petabytes and is now approaching exabytes of online storage, consuming more space and power while delivering diminishing value. Every organization faces similar challenges, albeit on a smaller scale.
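A rough calculation shows why drive failures become a routine operational event at this scale; the fleet size and annualized failure rate (AFR) below are illustrative assumptions, not measured data.

    """Why HDD failures are a routine event at MDC scale.

    Drive count and annualized failure rate (AFR) are illustrative assumptions;
    published AFRs for large fleets typically fall in the low single digits.
    """
    DRIVES = 5_000_000     # assumed fleet size, within the range cited above
    AFR = 0.03             # assumed 3% annualized failure rate

    failures_per_year = DRIVES * AFR
    failures_per_day = failures_per_year / 365

    print(f"Expected failures: {failures_per_year:,.0f}/year, "
          f"about {failures_per_day:,.0f}/day")
    # ~150,000 failures per year, or roughly 400 drive replacements every day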
Lessons Learned from Mega Datacenters
Changing business dynamics and financial pressures are forcing all organizations to rethink the types of IT infrastructure and software applications they deploy. The low cost of MDC cloud services is motivating CFOs to demand more capabilities at lower cost from their CIOs, who in turn are looking to MDCs for inspiration and for ways to address these challenges.
The first lesson any organization can learn from MDCs is to simplify maintenance and management by deploying a more homogeneous infrastructure. Minimizing infrastructure spending where it matters little and focusing it where it matters most frees capital to be invested in architectural enhancements that maximize work-per-dollar. Investing in optimization and efficiency helps reduce infrastructure and associated management costs, including those for maintenance, power and cooling. Incorporating more lights-out self-management also pays off, supporting more capabilities with existing staff.
The second lesson is that maintaining five-nines (99.999%) reliability drives up costs and becomes increasingly difficult architecturally as the infrastructure scales. A far more cost-effective architecture is one that allows subsystems to fail, letting the rest of the system operate unimpeded while the overall system self-heals. Because all applications are clustered, a single misbehaving node can degrade the performance of the entire cluster. MDCs simply take the offending server offline, enabling all the others to operate at peak performance. The hardware and software needed for such an architecture are readily available today, enabling any organization to emulate this approach. And though the expertise needed to deploy a cluster effectively is still rare, new orchestration layers are emerging to automate cluster management.
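As a minimal sketch of this fail-in-place approach, the loop below periodically health-checks each node and evicts any straggler from the active pool so the rest of the cluster keeps running at full speed. The TCP-connect probe, latency threshold and polling interval are illustrative placeholders, not a real MDC's monitoring stack.

    """Minimal fail-in-place sketch: evict misbehaving nodes rather than chase
    five-nines on every box. The probe, threshold and interval are assumptions."""
    import socket
    import time

    LATENCY_LIMIT_MS = 50.0   # assumed threshold for a "healthy" response

    def probe_latency_ms(node: str, port: int = 22, timeout_s: float = 1.0) -> float:
        """Crude health probe: time a TCP connect (port and timeout are assumptions)."""
        start = time.monotonic()
        with socket.create_connection((node, port), timeout=timeout_s):
            pass
        return (time.monotonic() - start) * 1000.0

    def cull_stragglers(active_nodes: set) -> set:
        """Return the node set with any slow or unreachable nodes removed."""
        healthy = set()
        for node in active_nodes:
            try:
                if probe_latency_ms(node) <= LATENCY_LIMIT_MS:
                    healthy.add(node)
                else:
                    print(f"evicting {node}: slow response")
            except OSError as err:
                print(f"evicting {node}: unreachable ({err})")
        return healthy

    def monitor(active_nodes: set, interval_s: float = 30.0) -> None:
        """Eviction loop; evicted nodes are repaired and re-added out of band."""
        while True:
            active_nodes = cull_stragglers(active_nodes)
            time.sleep(interval_s)

The design point is that the cluster tolerates losing a node; repairing it later is an offline, low-urgency task rather than an emergency.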
Storage, one of the most critical infrastructure subsystems, directly impacts application performance and server utilization. MDCs are leaders in optimizing datacenter storage efficiency, providing high-availability operation to satisfy requirements for data retention and disaster recovery. All MDCs rely exclusively on direct-attached storage (DAS), which carries a much lower purchase cost, is simpler to maintain and delivers higher performance than a storage area network (SAN) or network-attached storage (NAS). Although many MDCs minimize costs by using consumer-grade Serial ATA (SATA) HDDs and solid state drives (SSDs), they almost always deploy these drives on a SAS infrastructure to maximize performance and simplify management. More MDCs are now migrating to large-capacity, enterprise-grade SAS drives for higher reliability and performance, especially as SAS migrates from 6Gbit/s to 12Gbit/s bandwidth.
When evaluating storage performance, most organizations focus on I/O operations per second (IOPs) and MBytes/s throughput metrics. MDCs have discovered, though, that applications driving IOPs to SSDs quickly reach other limits, often peaking well below 200,000 IOPs, and that MBytes/s performance has only a modest impact on work done. A more meaningful metric is I/O latency, because it correlates more directly with application performance and server utilization – the very reason MDCs are deploying more SSDs or solid state caching (or both) to minimize I/O latency and increase work-per-dollar.
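One way to see why latency dominates is Little's law: the IOPs an application actually achieves are bounded by the number of I/Os it keeps in flight divided by the latency of each. The queue depth and latencies in the sketch below are illustrative values, not benchmarks.

    """Little's law view of storage performance: IOPs = outstanding I/Os / latency.

    Queue depth and latencies are illustrative; the point is that for a fixed
    amount of application concurrency, lower latency is what buys more work.
    """
    def achievable_iops(queue_depth: int, latency_s: float) -> float:
        """I/Os completed per second for a given concurrency and per-I/O latency."""
        return queue_depth / latency_s

    QUEUE_DEPTH = 16  # assumed number of I/Os the application keeps in flight

    for label, latency in [("HDD ~10 ms", 0.010),
                           ("SSD ~200 us", 0.000200),
                           ("PCIe flash ~50 us", 0.000050)]:
        print(f"{label:<20} {achievable_iops(QUEUE_DEPTH, latency):>10,.0f} IOPs")
    # With only 16 I/Os in flight, the HDD tops out near 1,600 IOPs, the SSD at
    # 80,000 and the flash card at 320,000 -- latency, not the device's headline
    # IOPs rating, sets the ceiling on useful work.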
Typical HDD read/write latency is on the order of 10 milliseconds. By contrast, typical SSD read and write latencies are around 200 microseconds and 100 microseconds, respectively – roughly two orders of magnitude lower, since 10 milliseconds is 10,000 microseconds. Specialized PCIe® flash cache acceleration cards can reduce latency by another order of magnitude, to tens of microseconds. Using solid state storage to supplement or replace HDDs enables servers and applications to do four to 10 times more work. Server-based flash caching provides even greater gains in SAN and NAS environments – up to 30 times.
Flash cache acceleration cards deliver the lowest latency when plugged directly into a server’s PCIe bus. Intelligent caching software continuously and transparently places hot data (the most frequently accessed or temporally important) in low-latency flash storage to improve performance. Some flash cache acceleration cards support multiple terabytes of solid state storage, holding entire databases or working datasets as hot data. And because there is no intervening network and no risk of associated congestion, the cached data is accessible quickly and deterministically under any workload.
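The caching principle can be illustrated with a toy least-recently-used policy: keep recently touched blocks in a small fast tier and fall back to the slow tier on a miss. This is only a sketch of the idea, not the heat-tracking algorithm used by any actual caching product.

    """Toy hot-data cache: a small LRU tier in front of slow backing storage.

    An illustration of the caching principle only; real flash-cache software
    uses far more sophisticated heat tracking and write handling.
    """
    from collections import OrderedDict

    class HotDataCache:
        def __init__(self, capacity_blocks: int, read_backing):
            self.capacity = capacity_blocks
            self.read_backing = read_backing        # slow-path read function
            self.blocks = OrderedDict()             # block_id -> data, in LRU order

        def read(self, block_id: int) -> bytes:
            if block_id in self.blocks:             # hit: serve from the fast tier
                self.blocks.move_to_end(block_id)
                return self.blocks[block_id]
            data = self.read_backing(block_id)      # miss: go to the slow tier
            self.blocks[block_id] = data            # promote the block to "hot"
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)     # evict the coldest block
            return data

    # Usage: cache = HotDataCache(1_000_000, read_backing=read_block_from_hdd),
    # where read_block_from_hdd is whatever slow-path read the application uses.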
Deploying an all-solid-state Tier 0 for some applications is also now feasible, and at least one MDC uses SSDs exclusively. In the enterprise, decisions about using SSDs usually focus on the storage layer and on cost per GByte or per IOP, pitting HDDs against SSDs with an emphasis on capital expenditure. MDCs have discovered that SSDs deliver better price/performance than HDDs by maximizing the work-per-dollar of investments in other infrastructure (especially servers and software licenses) and by reducing overall maintenance costs. Solid state storage is also more reliable, easier to manage, faster to replicate and rebuild, and more energy-efficient than HDDs – all advantages to any datacenter.
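The work-per-dollar framing can be made concrete with a simple comparison: if faster storage lets each server complete several times more work, fewer servers and software licenses are needed for the same workload. Every number in the sketch below is an illustrative placeholder, not a price quote or benchmark.

    """Work-per-dollar framing: compare servers needed, not just $/GByte.

    All numbers are illustrative placeholders; the point is the framing, not
    the specific values.
    """
    import math

    WORKLOAD_UNITS = 1_000  # arbitrary measure of the total work required

    def servers_and_cost(units_per_server: float, cost_per_server: float):
        """Servers needed for the workload, and their total capital cost."""
        servers = math.ceil(WORKLOAD_UNITS / units_per_server)
        return servers, servers * cost_per_server

    hdd_servers, hdd_cost = servers_and_cost(units_per_server=10, cost_per_server=5_000)
    ssd_servers, ssd_cost = servers_and_cost(units_per_server=50, cost_per_server=7_000)

    print(f"HDD-backed: {hdd_servers} servers, ${hdd_cost:,.0f}")
    print(f"SSD-backed: {ssd_servers} servers, ${ssd_cost:,.0f}")
    # 100 servers at $5,000 vs 20 servers at $7,000: the "more expensive" storage
    # wins on total cost once work per server is taken into account.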
Pioneering the Datacenter of the Future
MDCs have been driving open source solutions with proven performance, reliability and scalability. In some cases, these pioneering efforts have enabled applications to scale far beyond any commercial product. Examples include Hadoop® for analytics and its derivative applications, and clustered query and database applications like Cassandra™ and Google’s Dremel. The state of the art for these and other applications is evolving quickly, literally month by month. These open source solutions are seeing increasing adoption and inspiring new commercial solutions.
Two other, relatively new initiatives are expected to bring MDC advances to the enterprise market, just as Linux® software did. One is Open Compute, which offers a minimalist, cost-effective, easy-to-scale hardware infrastructure for compute clusters. Open Compute could also foster its own innovation, including an open hardware support services business model similar to the one now used for open source software. The second initiative is OpenStack® software, which promises a higher level of automation for managing pools of compute, storage and networking resources, ultimately leading to the ability to operate a software-defined datacenter.
A related MDC initiative involves disaggregating servers at the rack level. Disaggregation separates the processor from memory, storage, networking and power, and pools these resources at the rack level, enabling the lifecycle of each resource to be managed on its own optimal schedule to help minimize costs while increasing work-per-dollar. Some architects believe that these initiatives could reduce total cost of ownership by a staggering 70 percent.
Maximizing work-per-dollar at the rack and datacenter levels is one of the best ways today for IT architects in any organization to do more with less. MDCs are masters at this type of high efficiency as they continue to redefine how datacenters will scale to meet the formidable challenges of the data deluge.
About the Author
Robert Ober is an LSI Fellow in Corporate Strategy, driving LSI into new technologies, businesses and products. He has 30 years of experience in processor and system architecture. Prior to joining LSI, Rob was a Fellow in the Office of the CTO at AMD, with responsibility for mobile platforms, embedded platforms and wireless strategy. He was one of the founding Board members of OLPC ($100 laptop.org) and was influential in its technical evolution, and was also a Board Member of OpenSPARC.
Previously, Rob was Chief Architect at Infineon Technologies, responsible for the TriCore family of processors used in automotive, communication and security products. In addition, he drove improvements in semiconductor methodology, libraries, process and the mobile phone platforms. Rob was manager of Newton Technologies at Apple Computer and was involved in the creation of the PowerPC Macintosh computers and of the PowerPC, StrongARM and ARC processors. He also has experience in the development of CDC, CRAY and SPARC supercomputers, mainframes and high-speed networks, and he holds dozens of patents in mobility, computing and processor architecture. Rob has an honors Bachelor of Applied Science (BASc.) in Systems Design Engineering from the University of Waterloo in Ontario, Canada.