Supercomputing 2018

Global Petascale to Exascale Science Workflows Accelerated by Next Generation Software Defined Network Architectures and Applications

Caltech’s High Energy Physics (HEP) and Network Teams Collaborate with Partners to Break New Ground

Building the Next Generation Software Defined Network (SDN)
Cyber-Architectures and Applications for High Energy Physics,
Astrophysics and Exascale Science

Pasadena, California, November 2018 — During the 2018 Network Research Exhibition (NRE) at the Supercomputing 2018 Conference (SC18) in Dallas earlier this month, Caltech, together with university, laboratory, network and industry partners, demonstrated the latest developments toward an SDN-driven Next Generation Integrated Architecture (NGenIA) for high energy physics and other global data intensive science domains. While the initial focus is on the largest global science program now underway, at CERN’s Large Hadron Collider (LHC) and the 170 computing and storage facilities around the world that support it, and on the future Large Synoptic Survey Telescope (LSST) program, now under construction in La Serena, Chile, the methods and developments are widely applicable to many data intensive disciplines.

Some of the key partners who supported the Caltech demonstrations at SC18 include 2CRSI, AmLight/FIU, Arista Networks, CENIC, CERN, CenturyLink, Ciena, Colorado State, Dell, ESnet, Echostreams, Fermilab, IBM, Intel, Internet2, KISTI, LBNL, Maryland, Mellanox, Michigan, MREN, NCSA, Northeastern, Pacific Northwest Gigapop, Pacific Wave, the Pacific Research Platform, SCinet, SURFnet, Starlight/iCAIR, TIFR Mumbai, Tongji University, UCLA, USC, UCSD, and Yale.

“The intelligent software-driven end-to-end network systems and high throughput applications we developed and demonstrated at SC18 with our partners will enable the LHC and other leading programs to meet the unprecedented challenges they face in terms of global-scale processing, distribution, and collaborative analysis of massive datasets, and operate with a new level of efficiency and control. Our demonstrations signify the emergence of a new ‘consistent network operations’ paradigm among widely distributed computing and storage facilities hosting exabytes of data,” said Harvey Newman, Caltech’s Goldberger Professor of Physics who leads the team.

Newman explained that in the new concept “stable high throughput flows, at set rates, are load-balanced among network paths in real time up to flexible high water marks, and are adjusted in real time to accommodate other network traffic. The large smooth flows are launched and managed by SDN services that act in concert with the experiments’ site-resident data distribution and management systems, to make the best use of the available computing and networking resources while meeting the expanding needs of the programs, in order to accelerate the path to scientific discovery.”

An added focus in the SC18 demonstrations was on compact Data Transfer Nodes (DTNs) that support data transfer rates in the 100 gigabit/second (Gbps) to 1 terabit/second (Tbps) range. A key goal of these developments is to clear the way to multi-terabyte and petabyte transactions with the major high-performance computing (HPC) facilities now in operation, as well as those planned by the US Department of Energy (DOE), the National Science Foundation (NSF), and other agencies, which are projected to reach the exaflop level by 2021. Developing the means to bring these facilities into the worldwide LHC “ecosystem” of sites hosting exabytes of data over the next few years is a pivotal development, both for the LHC and for other leading edge science programs.

This development trajectory is enabled by end-to-end SDN methods extending all the way to auto-configured DTNs, including intent-based networking APIs combined with transfer applications such as Caltech’s open source, TCP-based Fast Data Transfer (FDT), which has been shown to fill 100G long distance paths at wire speed on production transoceanic wide area networks. These methods work in concert with the orchestration software, controllers and automated virtualized software stacks developed in the SENSE, AmLight-ExP, SFP, OSIRIS and other collaborating projects.
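
FDT itself is an open source Java application, but the core technique it relies on, keeping a long, high capacity path full by streaming data over several parallel TCP connections, can be illustrated with a short sketch. The code below shows only the sending side of that idea and is not FDT's actual implementation; the receiver address, port, stream count and chunk size are hypothetical.

# Illustrative sketch (not FDT itself): fill a long, high-bandwidth path by
# sending byte ranges of one file over several parallel TCP streams.
# The receiver endpoint, stream count and chunk size are hypothetical.
import os
import socket
import threading

RECEIVER = ("dtn-receiver.example.org", 54321)   # hypothetical remote DTN
NUM_STREAMS = 8                                  # parallel TCP connections
CHUNK_SIZE = 8 * 1024 * 1024                     # 8 MB application-level reads

def send_range(path, offset, length):
    """Send one contiguous byte range of the file over its own TCP connection."""
    with socket.create_connection(RECEIVER) as sock, open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                break
            sock.sendall(chunk)
            remaining -= len(chunk)

def parallel_send(path):
    """Split the file into NUM_STREAMS byte ranges and send them concurrently."""
    size = os.path.getsize(path)
    stride = (size + NUM_STREAMS - 1) // NUM_STREAMS
    threads = []
    for i in range(NUM_STREAMS):
        offset = i * stride
        length = min(stride, size - offset)
        if length <= 0:
            break
        t = threading.Thread(target=send_range, args=(path, offset, length))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

if __name__ == "__main__":
    parallel_send("/data/example-dataset.bin")   # hypothetical source file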

SC18 Caltech On-Floor and Global Network 

A one-quarter rack of servers from 2CRSI and Echostreams, equipped with 29 Mellanox and 4 QLogic 100G network interfaces and with SSD storage from Intel and other manufacturers, provided 3.3 terabits/second of capability in the Caltech booth. A compact DWDM system from Ciena provided primary connectivity to the booth, supporting multiple optical connections using 200 and 400 Gbps wavelengths. This was complemented by the first full-throughput 400 gigabit/sec Ethernet (400GE) network, built on Arista switches, linking SCinet, the SC Conference’s dedicated high-capacity network infrastructure, with the Caltech and USC booths.

Globally, the Caltech booth was connected by 100 gigabit/sec Ethernet (100GE) links over SCinet to the Starlight/iCAIR, Michigan, KISTI and SURFnet booths on the exhibit floor, and by 100G wide area links via Internet2 and CenturyLink to the Caltech campus (Pasadena), CENIC in both Los Angeles and Sunnyvale, FIU/AmLight (Miami) and Starlight (Chicago), with onward links to NCSA, SURFnet (Amsterdam), CERN (Geneva), KISTI and KASI (Korea), TIFR (Mumbai), the LSST site (Chile) via ANSP and RNP (Brazil) and REUNA (Chile), and the Pacific Research Platform campuses, including UCSD, Stanford and Caltech, via CENIC.

Data flows across the local and wide area networks exceeded 1.2 terabits/second during the “Blow-the-Doors Off the Network” phase of the demonstrations, which took place during the final hours of SC18. Additional information about the SCinet architecture featured at SC18 can be found here: https://doi.org/10.5281/zenodo.1470503

SC18 Demonstrations

The SC18 demonstrations included a wide range of state of the art network system developments and high throughput smart network applications:

  • LSST: Real time, low latency transfers of multi-gigabyte images for scientific processing from the LSST/AURA site in La Serena, Chile, flowing over the REUNA (Chile), ANSP and RNP (Brazil) national circuits, the AmLight Atlantic and Pacific Ring and Starlight to NCSA; operational and data quality traffic to SLAC, Tucson and other sites; and an emulation of the LSST annual multi-petabyte Data Release from NCSA to La Serena at rates consistent with those required for LSST operations.

  • AmLight Express and Protect (AmLight-ExP): Supporting the LSST and LHC-related use cases with high-throughput, low latency experiments and demonstrations of auto-recovery from network events, AmLight-ExP used optical spectrum on the new Monet submarine cable and its 100G ring network that interconnects the research and education communities in the U.S. and South America.

    “The AmLight-ExP international network topology consists of two rings. The AmLight Express spectrum ring is the primary route to bring LSST datasets from Chile to the South Florida AMPATH International Exchange Point. The AmLight protected SDN 100G ring is already in operation and carries multiple types of research and education data; it will be the backup for LSST,” said AmLight PIs Julio Ibarra (FIU) and Heidi Morgan (USC-ISI).

  • LSST Data Flows to KISTI and KASI

    During the course of the LSST and AmLight SC18 demonstrations, data from the telescope site in Chile arrived via AmLight at both the KISTI and Caltech booths in Dallas, where it was mirrored and carried across SCinet, Starlight, KRLight and KREONet2 to DTNs at KISTI and KASI in Korea, as shown in the figure below. Using Caltech’s Fast Data Transfer (FDT) application, throughputs of 58 Gbps were achieved across the 60 Gbps path from the telescope site to the KISTI booth, and a remarkable 99.7 Gbps on the 100 Gbps path between Dallas and Daejeon.


    Figure 1: Data flow paths among the LSST telescope site, the KISTI and Caltech booths, and KISTI and KASI in Korea during the SC18 demonstrations

  • SENSE: The Software-Defined Network for End-to-end Networked Science at Exascale (SENSE) project demonstrated smart network services to accelerate scientific discovery in the era of ‘big data’ driven by Exascale computing, cloud computing, machine learning and AI. The SENSE SC18 demonstration showcased a comprehensive approach to requesting and provisioning end-to-end network services across domains, combining infrastructure deployed across multiple labs, campuses, SC booths and the WAN with a focus on usability, performance and resilience through the capabilities listed below (a hypothetical sketch of an intent request follows the list):

    • Intent-based, interactive, real time application interfaces providing intuitive access to intelligent SDN services for Virtual Organization (VO) services and managers;

    • Policy-guided end-to-end orchestration of network resources, coordinated with the science programs' systems, to enable real time orchestration of computing and storage resources;

    • Auto-provisioning of network devices and Data Transfer Nodes (DTNs);

    • Real time network measurement, analytics and feedback to provide the foundation for resilience and coordination between the SENSE intelligent network services, and the science programs' system services;

    • Priority QoS for SENSE-enabled flows;

    • Multi-point and point-to-point services;

    • Ability to handle future reservations with defined lifetime for any intent;

    • Automatically provisioned point-to-point and multi-point data transfer tests.
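
    As a rough illustration of the intent-based interfaces listed above, the sketch below submits a point-to-point service request, expressed as what is needed (endpoints, guaranteed rate, start time and lifetime) rather than how to provision it, to an orchestrator over HTTP. The orchestrator URL, JSON fields and endpoint URNs are hypothetical and do not represent the actual SENSE API.

# Hypothetical sketch of an intent-based service request; the orchestrator URL,
# JSON schema and endpoint URNs below are illustrative, not the actual SENSE API.
import json
import urllib.request

intent = {
    "service": "point-to-point",
    "endpoints": ["urn:ogf:network:caltech.edu:dtn-01",      # hypothetical DTN URNs
                  "urn:ogf:network:cern.ch:dtn-07"],
    "bandwidth_gbps": 80,           # guaranteed rate below a flexible high-water mark
    "start": "2018-11-15T18:00:00Z",
    "lifetime_minutes": 120,        # a future reservation with a defined lifetime
    "qos": "priority",
}

request = urllib.request.Request(
    "https://orchestrator.example.org/api/intents",          # hypothetical endpoint
    data=json.dumps(intent).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print("Orchestrator reply:", json.loads(response.read()))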

    More information about the SENSE demonstrations may be found at:

    Figure 2: Wide area network traffic with QoS for SENSE priority flows

  • SDN Federated Network Protocol (SFP): Yale, Tongji, IBM, ARL and Caltech demonstrated SFP, the first generic framework supporting fine-grained interdomain routing, which addresses the fundamental mismatch between fine-grained SDN control and the coarse-grained BGP routing used between domains. Smart filtering and on-demand routing information are used to address the scalability of fine-grained routing in a collaborative network composed of both exhibitor booths and U.S. campus science networks.

  • Multi-domain Network State Abstraction (Mercator): The groups involved in SFP development demonstrated Mercator, a simple, novel and highly efficient multi-domain network resource discovery and orchestration system that provides fine-grained, global network resource information through both SFP and the Application-Layer Traffic Optimization (ALTO) protocol, to support the high throughput needs of the aforementioned collaborative data intensive science programs. This demonstration (see the ALTO query sketch after the list below) included:

    • efficient discovery of available network resources with extremely low latency;

    • fairer allocations of networking resources in this collaborative network;

    • preservation of the private information among the member networks;

    • scaling to collaborative networks with hundreds of members.
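
    Because Mercator exposes its resource information through the standard ALTO protocol (RFC 7285) as well as SFP, a client can, for example, consult an ALTO cost map to rank candidate source sites for a transfer. The sketch below is only an illustration: the server URL, resource path and PID names are hypothetical, while the media type and the cost-map layout follow the ALTO specification.

# Illustration of an ALTO (RFC 7285) cost-map query used to rank candidate
# source sites. Server URL, resource path and PID names are hypothetical.
import json
import urllib.request

ALTO_SERVER = "https://alto.example.org"                  # hypothetical ALTO server
DESTINATION_PID = "PID-caltech-booth"                     # hypothetical PIDs
CANDIDATE_SOURCES = ["PID-ncsa", "PID-cern", "PID-kisti"]

request = urllib.request.Request(
    ALTO_SERVER + "/costmap/routingcost",                 # hypothetical resource path
    headers={"Accept": "application/alto-costmap+json"},  # media type from RFC 7285
)
with urllib.request.urlopen(request) as response:
    cost_map = json.loads(response.read())["cost-map"]    # src PID -> dst PID -> cost

# Choose the candidate source with the lowest routing cost toward the destination.
best_source = min(CANDIDATE_SOURCES,
                  key=lambda pid: cost_map[pid][DESTINATION_PID])
print("Preferred source site:", best_source)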

  • High-Level, Unified SDN Programming (Magellan and Trident): The groups involved in SFP and Mercator further demonstrated a high-level, unified SDN and NFV programming framework consisting of two key components (a minimal policy sketch follows the list):

    • Magellan, which allows systematic, automatic compilation of high-level SDN programs into low-level SDN datapaths;

    • Trident, which introduces key techniques including live variables, route algebra and three-valued logic programming to allow high-level, systematic integration of SDN and NFV and to achieve automatic updates.
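
    To give a flavor of the high-level programs Magellan compiles, the sketch below writes a forwarding policy as an ordinary function over packet fields; a Magellan-style compiler (not shown) would translate such a function into low-level match/action tables. The packet fields, addresses and actions are hypothetical and are not taken from the demonstrated code.

# Hypothetical sketch of a high-level "algorithmic policy": an ordinary function
# over packet fields that a Magellan-style compiler would turn into flow tables.
from dataclasses import dataclass

@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int

SCIENCE_DTNS = {"198.51.100.10", "198.51.100.11"}    # hypothetical DTN addresses

def policy(pkt: Packet) -> str:
    """Return the forwarding decision for one packet."""
    if pkt.dst_port == 22:
        return "drop"                    # example: keep interactive SSH off the data plane
    if pkt.src_ip in SCIENCE_DTNS and pkt.dst_ip in SCIENCE_DTNS:
        return "forward:qos-priority"    # DTN-to-DTN flows take the priority path
    return "forward:best-effort"

# Exercise the policy on a sample packet; the compiler stage is deliberately omitted.
print(policy(Packet("198.51.100.10", "198.51.100.11", 5001)))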

    "The state of the art SDN systems and solutions we designed and demonstrated at SC18, including SFP, Mercator, Magellan and Trident, provide a unified, end-to-end, programmable framework the enables the LHC and other premier science experiments to manage and orchestrate huge ensembles of globally distributed resources. At SC18, we used this framework to orchestrate resources among the Caltech booth at SC18 in Dallas and the Caltech SDN testbed in Pasadena, with multiple 100Gbps wide area links connecting the sites, to facilitate large scale complex scientific workflows through our network functions and the latest generation of high throughput data transfer nodes. The SC18 setting precisely reflects the challenges of routing and resource heterogeneity and dynamicity at the LHC and other premier science experiments. The success of our unified framework in such environments is a keystone toward its future deployment at LHC", said Qiao Xiang, an associate research scientist at Yale, and Richard Yang, Professor of Computer Science at Yale. 

    More information about the SFP, Mercator, Magellan and Trident demonstrations may be found at:

  • NDN Assisted by SDN: Northeastern, Colorado State and Caltech demonstrated Named Data Networking (NDN) based workflows aimed at accelerated caching and analysis in support of the LHC and climate science programs, in association with the SANDIE (SDN Assisted NDN for Data Intensive Experiments) project. The team demonstrated a parallel application integrating NDN into the mainstream federated data access methods used by the Compact Muon Solenoid (CMS) experiment at the LHC, and laid out a multifaceted development path toward an implementation that would meet the high throughput needs of the experiment (an illustrative naming sketch follows the quote below).

    Edmund Yeh, Professor of Electrical and Computer Engineering at Northeastern University, said:

    “The SANDIE demonstration at SC18 was notable for being the first to successfully use the Named Data Networking (NDN) architecture for data distribution over the LHC high energy physics network. The SANDIE team proved the efficacy of leveraging the new data-centric networking paradigm for efficiently distributing large volumes of scientific data in concert with existing protocols, while maximally exploiting the bandwidth as well as the storage resources of the network. The team has established an exciting road map toward further development of NDN-based caching, routing, and workflow optimization for the LHC as well as other large-scale data-intensive science applications.”
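
    NDN retrieves data by hierarchical names rather than host addresses, so any cache along the path that holds a named segment can answer for it. The sketch below shows one plausible way to map a CMS-style logical file name onto segmented NDN names; the name prefix and segmentation scheme are assumptions made for this illustration, not necessarily the SANDIE project's actual naming convention.

# Illustrative sketch: map a CMS-style logical file name onto hierarchical,
# per-segment NDN names. The "/ndn/sandie" prefix and the segmentation scheme
# are assumptions for this example, not necessarily SANDIE's actual convention.

SEGMENT_SIZE = 8 * 1024          # payload bytes carried by each named Data packet

def ndn_names_for_file(logical_file_name: str, file_size: int):
    """Yield one NDN name per data segment of the given file."""
    prefix = "/ndn/sandie/cms" + logical_file_name       # hypothetical routable prefix
    num_segments = (file_size + SEGMENT_SIZE - 1) // SEGMENT_SIZE
    for segment in range(num_segments):
        # A consumer expresses an Interest for each such name; any router or
        # cache holding the matching Data packet can satisfy the request.
        yield f"{prefix}/seg={segment}"

# Example: a hypothetical 40 kB file yields five per-segment names.
for name in ndn_names_for_file("/store/mc/RunII/sample/AOD/file.root", 40 * 1024):
    print(name)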

  • High Throughput and Multi-GPU Clusters with Kubernetes: UCSD and the NSF-funded Pacific Research Platform (PRP) led a group of collaborators demonstrating Kubernetes-based virtualized cluster management and the design and applications of high throughput PRP FIONA servers. The latest multi-GPU FIONA designs, being deployed at UCSD and other sites participating in the NSF-funded Cognitive Hardware And Software Ecosystem Community Infrastructure (CHASE-CI) project, are building a cloud of hundreds of affordable gaming Graphics Processing Units (GPUs). These can then be networked together with a variety of neural network machines to facilitate development of next generation cognitive computing.
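
    As a rough sketch of how such a Kubernetes-managed GPU cluster is used, the snippet below employs the official Kubernetes Python client to request a pod with a single NVIDIA GPU, the kind of request a CHASE-CI user might submit to run a training job on a FIONA node. The namespace, pod name, container image and entry point are hypothetical.

# Rough sketch using the official Kubernetes Python client: request a pod that
# is scheduled onto a GPU node. Namespace, pod name, image and command are hypothetical.
from kubernetes import client, config

config.load_kube_config()                       # use the local kubeconfig credentials

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-training-job"),      # hypothetical name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="trainer",
            image="tensorflow/tensorflow:1.12.0-gpu",            # hypothetical image
            command=["python", "train.py"],                       # hypothetical entry point
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"}                    # one gaming-class GPU
            ),
        )],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)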

  • First 400GE Data Network: USC, together with Caltech, Starlight/NRL, Arista Networks, SCinet/XNET, Mellanox and 2CRSI, demonstrated the first fully functional 400GE network, as illustrated in the Big Data SCinet diagram of the Caltech, USC and Starlight booths and their wide area connections below. Flows between Supermicro servers equipped with Mellanox ConnectX-5 VPI network interface cards in the USC and Caltech booths sustained close to 800 Gbps with very low CPU overhead in the first round of tests, which began just prior to SC18, as shown in the figure below, and continued to the end of the Network Research Exhibition.

     

Quotes:

Jason Zurawski, SCinet chair and science engagement engineer at the Energy Sciences Network (ESnet), said:
“Support for experimentation is one of the primary design factors of SCinet each year. SC18 was particularly successful in balancing the requirements of exhibitors that participate in our Network Research Exhibition (NRE) program as well as our own Experimental Networks (XNet) group. The demonstrations carried out by members of the science, network research and high-performance computing communities at SC18, including the SDN-driven Next Generation Integrated Architecture effort by Caltech and its partners, are significant because of their contributions to improving scientific outcomes worldwide.”

Caltech Network Engineer Shaswitha Puttawaswamy said:
“These groundbreaking technical achievements, and the new networking paradigm that we have been working towards with our partners for the last few years, will enhance the capabilities of the LHC and other major science programs.”

Team member Brij Jashal of the Tata Institute of Fundamental Research (TIFR) in Mumbai
said: "This year's SC18 worldwide demonstrations by Caltech and partners, which included significant participation of TIFR on the Indian subcontinent, were an important step forward in the development and deployment of advanced networks in the SENSE, SANDIE and other projects, and for global collaboration in the coming era of exascale high latency science workflows."

“The thirst for more bandwidth and capacity to drive scientific discovery is unquenchable,” said Rodney Wilson, Chief Technologist, Research Networks at Ciena. “It’s a battle of tiny increments and synergies that build new concepts and systems from many elements. When you add them all up, it creates a world class solution. Working with Caltech’s High Energy Physics and Network Teams, Professor Newman, his students and an extensive array of collaborators gives us a competitive edge and opens our minds to new approaches and ways to solve next generation networking problems.”

Phil Reese, Research Computing Strategist at Stanford, said:

“Having been a part of the 100G testing at SC in earlier years, I watched with interest the development and planning for the even more cutting edge 400G SC activity. The hard learned lessons at 100G paid off, as the 400G deployments were smoothly and expertly executed.  Hardware, software and the human element were all in sync; if only FedEx worked that well.”

Group Leads and Participants, by Team

Laboratory, Network & Technology Industry Partners

Caltech – www.caltech.edu
ESnet – www.es.net
CENIC – www.cenic.org
Pacific Wave – www.pacificwave.net
AmLight – www.amlight.net
SCinet – sc18.supercomputing.org/experience/scinet/
Ciena – www.ciena.com
USC – www.usc.edu
Starlight – www.startap.net/starlight/
iCAIR/Northwestern – www.icair.org
MREN – www.mren.org
Internet2 – www.internet2.edu
Northeastern – www.northeastern.edu
Yale – www.yale.edu
Colorado State – www.colostate.edu
Fermilab – www.fnal.gov
CERN – www.cern.ch
TIFR – www.tifr.res.in
UCLA – www.ucla.edu
KISTI – http://www.kisti.re.kr/eng/
Lawrence Berkeley Nat’l Lab – http://www.lbl.gov
Michigan – https://umich.edu/
RNP – www.rnp.br
ANSP – www.ansp.br
UNESP – www.unesp.br/international/
REUNA – www.reuna.cl/
SURFnet – https://www.surf.nl/en/homepage
CenturyLink – www.centurylink.com
2CRSI – www.2crsi.com
Arista Networks – www.arista.com
Cisco – www.cisco.com
Echostreams – www.echostreams.com
Intel – www.intel.com
Mellanox – www.mellanox.com
Dell – www.dell.com
Pica8 – www.pica8.com

For more information, press only:

Caltech Media Relations
Kathy Svitil, Director of Research Communications

(626) 395-8022
ksvitil@caltech.edu

For more information on

Global Petascale to Exascale Science Workflows Accelerated by Next Generation Software Defined Network Architectures and Applications

Team Lead and Contact: Harvey Newman, (626) 395-6656, newman@hep.caltech.edu

                  Science with a Mission