InfiniBand

Dr. David Southwell, CVO, Obsidian Strategics, says:

It has become commonplace, though many people remain unaware, for products and devices originally developed or tested by NASA to become part of our everyday lives. Cell phone cameras, artificial limbs, invisible braces, baby formula and cordless tools are just a few of the many examples. In a similar vein, some technology currently residing within supercomputers may turn out to be strangely relevant to the architecture of new data centers.

Current Supercomputer Configuration

Commodity silicon-based designs long ago replaced classics like the Cray vector supercomputer; the vast majority of supercomputers today are huge clusters of servers lashed together with high-performance networks. Built for massively parallel, large-scale simulations, these machines distribute the application workload across server nodes that coordinate via messages passed over a shared communications fabric. The server nodes usually feature floating-point-heavy CPUs, GPU-based math accelerators and large main memories, but they are essentially just Linux servers.
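
For concreteness, here is a minimal sketch in C of that message-passing style of coordination. The article does not name a library; MPI is assumed here because it is the de facto standard on such clusters, and the ring pattern, tag and token value are arbitrary illustration choices.

    /*
     * Minimal sketch of how cluster nodes coordinate by message passing.
     * MPI is not named in the article, but it is the de facto standard
     * message-passing library on InfiniBand clusters.
     * Build: mpicc ring.c -o ring     Run: mpirun -np 4 ./ring
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, token;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Rank 0 injects a token and waits for it to come back around. */
            token = 42;
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("token travelled once around %d nodes\n", size);
        } else {
            /* All other ranks receive from the left neighbour and forward to
             * the right neighbour; every hop is a message on the fabric. */
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }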

Most supercomputers attach their storage to the same communications fabric used for inter-processor communication. Storage must also be fast and parallel, both to load large data sets and to periodically checkpoint simulation state in case of a failure. The interconnect is thus a unified fabric carrying management, compute and storage traffic over a single fiber connection to each node.

Reducing the cost per node is typically a key concern in maximizing system-level bang for the buck, which ultimately determines the supercomputer’s performance. For this reason, commodity, standards-based hardware components are preferred. An open standard called InfiniBand (IB) has been the dominant cluster interconnect since its introduction, with specifications first published in 1999 by an industry consortium that included Intel, IBM, HP and Microsoft.

Low latency (sub-microsecond end-to-end), extreme scalability and high bandwidth (100 Gbit/s per port) are all important attributes of IB, as is hardware offload, which includes a very powerful feature called RDMA (Remote Direct Memory Access). RDMA allows data to flow “zero copy” from one application’s memory space to that of an application on another server at wire speed, without the intervention of the OS or even the CPU, allowing data movement to scale with memory speeds rather than just CPU core speeds (which have stalled).
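
As a rough illustration of what that looks like to software, the sketch below uses the standard libibverbs API to register an ordinary buffer and post a single RDMA write. It is a condensed sketch, not a complete program: error handling, the queue pair connection setup and the out-of-band exchange of the peer’s buffer address and memory key are omitted, and remote_addr and remote_rkey are placeholders a real program would obtain from the peer.

    /*
     * Condensed sketch of the RDMA "zero copy" path using libibverbs.
     * QP connection setup (ibv_modify_qp through INIT/RTR/RTS) and the
     * exchange of the peer's address and rkey are elided; the placeholder
     * values below would come from the remote side in a real program.
     * Build (with the verbs stack installed): cc rdma_write.c -libverbs
     */
    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        int num;
        struct ibv_device **devs = ibv_get_device_list(&num);
        struct ibv_context *ctx  = ibv_open_device(devs[0]);
        struct ibv_pd      *pd   = ibv_alloc_pd(ctx);
        struct ibv_cq      *cq   = ibv_create_cq(ctx, 16, NULL, NULL, 0);

        /* Register (pin) an ordinary application buffer so the adapter can
         * DMA directly to and from it -- this is what enables zero copy. */
        char *buf = malloc(4096);
        strcpy(buf, "payload written directly into the peer's memory");
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_WRITE);

        /* Create a reliable-connection queue pair (connection setup elided). */
        struct ibv_qp_init_attr qpa = {
            .send_cq = cq, .recv_cq = cq, .qp_type = IBV_QPT_RC,
            .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                     .max_send_sge = 1, .max_recv_sge = 1 }
        };
        struct ibv_qp *qp = ibv_create_qp(pd, &qpa);

        /* Placeholders that would really be exchanged out of band. */
        uint64_t remote_addr = 0;   /* peer buffer address */
        uint32_t remote_rkey = 0;   /* peer memory key     */

        /* Post an RDMA WRITE: the adapter moves the data into the peer's
         * memory without involving either OS kernel or the remote CPU. */
        struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = 4096,
                               .lkey = mr->lkey };
        struct ibv_send_wr wr = {
            .opcode = IBV_WR_RDMA_WRITE, .sg_list = &sge, .num_sge = 1,
            .send_flags = IBV_SEND_SIGNALED,
            .wr.rdma.remote_addr = remote_addr,
            .wr.rdma.rkey        = remote_rkey
        }, *bad;
        ibv_post_send(qp, &wr, &bad);

        /* Completion is reported asynchronously on the completion queue. */
        struct ibv_wc wc;
        while (ibv_poll_cq(cq, 1, &wc) == 0)
            ;
        printf("RDMA write completed with status %d\n", wc.status);
        return 0;
    }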

An informative primer on InfiniBand is available from the InfiniBand Trade Association here.

How the Data Center Has Changed

A well-designed server farm must balance compute, storage and network performance.  Many factors are now conspiring to expose legacy, 37-year-old TCP/IP Ethernet as the shortest leg on the stool:

  • Unified fabrics are highly desirable because they minimize network adapters, cables and switches. They improve a host of system-level metrics such as capital costs, airflow, heat generation, management complexity and the number of channel interfaces per host; micro- and blade-form-factor servers can ill afford three separate interfaces per node.  Due to its lossy flow control and high latency, TCP/IP Ethernet is not a good match for high-performance storage networks.
  • Current data center workloads strongly emphasize East-West traffic and need new fabric topologies. Ethernet’s spanning-tree limitations preclude efficient topologies such as the “fat tree,” with its aggregated trunks between switches.
  • Many-core processors use billions of transistors to tile tens to hundreds of CPU cores per chip, and server chips are trending strongly in this direction. Networking capability must be scaled up proportionately, and radically, to maintain architectural balance, or else the cores will be forever waiting on network I/O.
  • Consolidating multiple virtual machines onto a single physical machine further multiplies the network performance requirements per socket, pushing towards supercomputer-class loading levels. For instance, a TCP/IP stack running over 1Gb Ethernet can require up to 1GHz worth of CPU; overlay 20 such machines on a single node and even many-core CPUs are saturated by the OS before the application sees a single cycle (see the back-of-envelope sketch after this list).
  • Rotating storage is being steadily displaced by solid state disks (SSDs), and not just in early critical applications such as database indexing and metadata storage. Legacy NAS interconnects that could hide behind tens of milliseconds of rotating-disk latency are suddenly found to be hampering SSDs and their microsecond-range response times.  SSDs also deliver order-of-magnitude throughput increases, again stressing older interconnects.
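
To make the virtual machine point concrete, here is the back-of-envelope arithmetic as a tiny C program. The 1GHz-per-1Gb rule of thumb and the 20-VM consolidation level come from the text above; the 24-core, 2.5GHz socket is an assumed example configuration, not a measurement.

    /* Back-of-envelope arithmetic for the virtualization bullet above.
     * Rule of thumb (from the article): ~1 GHz of CPU per 1 Gbit/s of
     * TCP/IP Ethernet traffic. Socket size and clock are assumed values. */
    #include <stdio.h>

    int main(void)
    {
        const double ghz_per_gbit = 1.0;   /* article's rule of thumb           */
        const int    vms_per_node = 20;    /* consolidation level from the text */
        const double gbit_per_vm  = 1.0;   /* each VM driving 1 Gb Ethernet     */

        const int    cores        = 24;    /* assumed many-core server socket   */
        const double core_ghz     = 2.5;   /* assumed clock speed               */

        double stack_ghz  = vms_per_node * gbit_per_vm * ghz_per_gbit;
        double socket_ghz = cores * core_ghz;

        printf("TCP/IP stack demand : %.0f GHz\n", stack_ghz);
        printf("Socket capacity     : %.0f GHz\n", socket_ghz);
        printf("Share of socket spent on the network stack: %.0f%%\n",
               100.0 * stack_ghz / socket_ghz);
        return 0;
    }

Under these assumptions, roughly a third of the socket is consumed by the network stack before the applications run at all.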

InfiniBand is well positioned in every respect to address these and many other issues, while also offering smooth migration paths. For example, via IP over InfiniBand (IPoIB), InfiniBand can carry legacy IP traffic at great speed and, while this does not immediately expose all of the protocol’s benefits, it provides a bridge to more efficient implementations that can be rolled out over time.  Furthermore, and contrary to popular misconception, InfiniBand is actually the most cost-effective protocol in terms of $/Gbit/s of any comparable standards-based interconnect technology, and dramatically so if deployed as a unified fabric.
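
The point of the IPoIB migration path is that existing IP applications need not change at all. The sketch below is ordinary BSD-sockets code; if the host name happens to resolve to an address configured on an IPoIB interface (commonly ib0), the same bytes simply travel over the InfiniBand fabric. The host name storage-node.example and port 7000 are placeholders, not values from the article.

    /* Ordinary, unmodified TCP client code. Nothing here knows or cares
     * whether the wire underneath is Ethernet or an IPoIB interface. */
    #include <netdb.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        struct addrinfo hints = { .ai_family = AF_UNSPEC,
                                  .ai_socktype = SOCK_STREAM }, *res;

        /* Placeholder host/port; in an IPoIB deployment the name would
         * resolve to an address assigned to an InfiniBand interface. */
        int rc = getaddrinfo("storage-node.example", "7000", &hints, &res);
        if (rc != 0) {
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
            return 1;
        }

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) == 0) {
            const char msg[] = "legacy IP traffic over an IPoIB interface\n";
            write(fd, msg, sizeof msg - 1);
        }

        close(fd);
        freeaddrinfo(res);
        return 0;
    }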

InfiniBand for an Efficient Future

An open-standard supercomputer interconnect may well be the key to efficient future data center implementations. But does InfiniBand have the depth, behind all that raw performance and scale, for production deployments?

Early InfiniBand implementations did have problems. The standard’s lossless, credit-based flow control scheme confined them to very short links between racks, they lacked security mechanisms such as link encryption, and they were limited to single-subnet topologies.  Driven by the requirements of advanced military networks and other early-adopter communities, and by the technology innovators that have moved to serve these markets, InfiniBand solutions today can span global distances over standard optical infrastructure, with strong link encryption and multi-subnet segmentation.  Incidentally, InfiniBand was designed back in 1999 to employ separate data and control planes, making it arguably the original Software Defined Network (SDN). The quick pace of innovation in the supercomputing and interconnect space will likely provide significant benefit to its adopters.

About the author:

Dr. Southwell co-founded Obsidian Research Corporation. He was also a founding member of YottaYotta, Inc. in 2000 and served as its director of Hardware Development until 2004. He has worked at British Telecom’s Research Laboratory at Martlesham Heath in the UK, participated in several other high-technology start-ups, operated a design consultancy business, and taught Computer Science and Engineering at the University of Alberta. Dr. Southwell graduated with honors from the University of York, United Kingdom, with an M.Eng. in Electronic Systems Engineering in 1990 and a Ph.D. in Electronics in 1993, and holds a Professional Engineer (P.Eng.) designation.