SuperComputing 2018

Global Petascale to Exascale Science Workflows Accelerated by Next Generation Software Defined Network Architectures and Applications

SC18 Demonstrations


LSST
AmLight Express and Protect (AmLight-ExP)
Data Flows to KISTI and KASI
SENSE
SDN Federated Network Protocol (SFP)
Multi-domain Network State Abstraction (Mercator)
High-Level, Unified SDN Programming (Magellan and Trident)
NDN Assisted by SDN
High Throughput and multi-GPU Clusters with Kubernetes
400GE First Data Network

The SC18 demonstrations included a wide range of state-of-the-art network system developments and high-throughput smart network applications:

  • LSST: Real-time, low-latency transfers for scientific processing of multi-GByte images from the LSST/AURA site in La Serena, Chile, flowing over the REUNA Chilean and the ANSP and RNP Brazilian national circuits, the AmLight Atlantic and Pacific Ring, and Starlight to NCSA; operational and data quality traffic to SLAC, Tucson and other sites; and an LSST annual multi-petabyte Data Release emulation from NCSA to La Serena at rates consistent with those required for LSST operations.
  • AmLight Express and Protect (AmLight-ExP): Supported the LSST and LHC-related use cases with high-throughput, low-latency experiments and demonstrations of auto-recovery from network events, using optical spectrum on the new Monet submarine cable and AmLight's 100G ring network that interconnects the research and education communities in the U.S. and South America.

“The AmLight-ExP international network topology consists of two rings. The AmLight Express spectrum ring is the primary route to bring LSST datasets from Chile to the South Florida AMPATH International Exchange Point. The AmLight protected SDN 100G ring is already in operation and carries multiple types of research and education data; it will be the backup for LSST,” said AmLight PIs Julio Ibarra (FIU) and Heidi Morgan (USC-ISI).

 

  • LSST Data Flows to KISTI and KASI

During the course of the LSST and AmLight SC18 demonstrations, data from the telescope site in Chile arrived via AmLight at both the KISTI and Caltech booths in Dallas, where it was mirrored and carried across SCinet, Starlight, KRLight and KREONet2 to DTNs at KISTI and KASI in Korea, as shown in the figure below. Using Caltech’s Fast Data Transfer (FDT) application, throughputs of 58 Gbps were achieved across the 60 Gbps path from the telescope site to the KISTI booth, and a remarkable 99.7 Gbps on the 100 Gbps path between Dallas and Daejeon.
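The transfers above were driven by Caltech's FDT. As a rough illustration of how such a transfer is launched, the sketch below starts an FDT client from Python; the host names, paths and stream count are placeholders, and the command-line flags follow FDT's documented client usage as we recall it, so they should be checked against the installed FDT version.

```python
# Minimal sketch: driving a Caltech FDT (Fast Data Transfer) client from Python.
# Host names, paths, and stream count are illustrative placeholders; the flags
# (-c, -P, -r, -d) follow FDT's documented client usage, but verify them against
# your FDT version's help output before relying on them.
import subprocess

FDT_JAR = "/opt/fdt/fdt.jar"             # assumed install location (placeholder)
REMOTE_DTN = "dtn.example.kisti.re.kr"   # hypothetical destination DTN
SOURCE_DIR = "/data/lsst/visit_images"   # hypothetical local image directory
DEST_DIR = "/storage/lsst/incoming"      # hypothetical remote directory

cmd = [
    "java", "-jar", FDT_JAR,
    "-c", REMOTE_DTN,     # connect to the FDT server running on the remote DTN
    "-P", "16",           # number of parallel TCP streams
    "-r", SOURCE_DIR,     # recursively send the source directory
    "-d", DEST_DIR,       # destination directory on the remote side
]

result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    raise RuntimeError(f"FDT transfer failed:\n{result.stderr}")
```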

 

Figure 1 Data flow paths among the LSST telescope site, the KISTI and Caltech booths, and KISTI and KASI in Korea during the SC18 demonstrations



 

  • SENSE: The Software-defined network for End-to-end Networked Science at Exascale (SENSE) project demonstrated smart network services to accelerate scientific discovery in the era of ‘big data’ driven by Exascale, cloud computing, machine learning and AI. The SENSE SC18 demonstration showcased a comprehensive approach to requesting and provisioning end-to-end network services across domains, combining infrastructure deployed across multiple labs, campuses, SC booths and the WAN, with a focus on usability, performance and resilience through the capabilities listed below (a sketch of an intent request in this style follows the list):
    • Intent-based, interactive, real time application interfaces providing intuitive access to intelligent SDN services for Virtual Organization (VO) services and managers;
    • Policy-guided end-to-end orchestration of network resources, coordinated with the science programs' systems, to enable real time orchestration of computing and storage resources;
    • Auto-provisioning of network devices and Data Transfer Nodes (DTNs);
    • Real time network measurement, analytics and feedback to provide the foundation for resilience and coordination between the SENSE intelligent network services, and the science programs' system services;
    • Priority QoS for SENSE enabled flows;
    • Multi-point and point-to-point services;
    • Ability to handle future reservations with defined lifetime for any intent;
    • Automatically provisioned point-to-point and multi-point data transfer tests.
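To make the intent-based interface concrete, the sketch below shows what a request to a SENSE-style orchestrator could look like: a point-to-point service between two DTNs with guaranteed bandwidth and a defined reservation lifetime. The endpoint URL, resource path and JSON field names are hypothetical placeholders for illustration, not the actual SENSE orchestrator API.

```python
# Illustrative sketch only: an intent-based request to a SENSE-style
# orchestrator. The URL, path and JSON field names are hypothetical
# placeholders, not the real SENSE orchestrator interface.
import requests

ORCHESTRATOR = "https://sense-orchestrator.example.org/api"  # placeholder URL

intent = {
    "service": "point-to-point",            # or "multi-point"
    "endpoints": [
        {"site": "Caltech_Pasadena", "dtn": "dtn1.example.edu"},
        {"site": "SC18_Caltech_Booth", "dtn": "booth-dtn.example.org"},
    ],
    "bandwidth_gbps": 100,                  # priority-QoS bandwidth request
    "start": "2018-11-13T09:00:00Z",        # future reservation with a
    "end": "2018-11-15T18:00:00Z",          # defined lifetime
}

resp = requests.post(f"{ORCHESTRATOR}/intents", json=intent, timeout=30)
resp.raise_for_status()
print("Provisioned intent:", resp.json())
```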

More information about the SENSE demonstrations may be found at:

Figure 2 Wide Area Network traffic with QoS for SENSE priority flows     


  • SDN Federated Network Protocol (SFP): Yale, Tongji, IBM, ARL and Caltech demonstrated SFP, the first generic framework supporting fine-grained interdomain routing, which addresses the fundamental mismatch between fine-grained SDN control and coarse-grained BGP routing in inter-domain networks. Smart filtering and on-demand routing information were used to address the scalability of fine-grained routing in a collaborative network composed of both exhibitor booths and U.S. campus science networks (a sketch of a fine-grained announcement in this style appears after this list).
  • Multi-domain Network State Abstraction (Mercator): The groups involved in SFP development demonstrated Mercator, a simple, novel, highly efficient multi-domain network resource discovery and orchestration system that provides fine-grained, global network resource information through both SFP and the Application-Layer Traffic Optimization (ALTO) protocol, in support of the high throughput needs of the aforementioned collaborative data intensive science programs (an illustrative ALTO cost-map query appears after this list).
  • High-Level, Unified SDN Programming (Magellan and Trident): The groups involved in SFP and Mercator further demonstrated a high-level, unified SDN and NFV programming framework consisting of two key components:
    • Magellan, which allows systematic, automatic compilation of high-level SDN programs into low-level SDN datapaths;
    • Trident, which introduces key techniques including live variables, route algebra, and 3-value logic programming to allow high-level, systematic integration of SDN and NFV and to achieve automatic updates.
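As a rough illustration of what "fine-grained" means here relative to BGP, the sketch below models a flow-level route announcement that matches on more than a destination prefix. All class names, fields and addresses are hypothetical, chosen only to contrast with destination-prefix-only BGP updates; they are not SFP's wire format.

```python
# Hypothetical sketch of a fine-grained, flow-level interdomain route
# announcement: unlike a BGP update, the match is not limited to a destination
# prefix. Names, fields and addresses are illustrative only, not SFP's format.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FlowMatch:
    dst_prefix: str                  # e.g. "192.0.2.0/24" (documentation prefix)
    src_prefix: str = "0.0.0.0/0"
    dst_port: Optional[int] = None   # transport-level field BGP cannot express
    dscp: Optional[int] = None

@dataclass
class FineGrainedRoute:
    match: FlowMatch
    next_hop: str                    # border router or exchange point to use
    as_path: List[int] = field(default_factory=list)

# Steer only xrootd transfer traffic (port 1094) toward this prefix onto the
# high-throughput science path, leaving other traffic on the default route.
announcement = FineGrainedRoute(
    match=FlowMatch(dst_prefix="192.0.2.0/24", dst_port=1094),
    next_hop="198.51.100.1",
    as_path=[64512, 64513],
)
print(announcement)
```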

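Mercator exposes its abstracted, multi-domain view through ALTO (RFC 7285). The minimal sketch below fetches a cost map from an ALTO server and picks the lowest-cost destination PID; the server URL, resource path and PID names are placeholders, while the "meta"/"cost-map" response layout follows the ALTO cost-map format.

```python
# Minimal sketch: retrieving an ALTO cost map (RFC 7285), the kind of abstract,
# privacy-preserving view of multi-domain resources that Mercator aggregates.
# Server URL, resource path and PID names are placeholders.
import requests

ALTO_SERVER = "https://alto.example.net"           # placeholder ALTO server
resp = requests.get(
    f"{ALTO_SERVER}/costmap/routingcost",          # resource path is illustrative
    headers={"Accept": "application/alto-costmap+json"},
    timeout=10,
)
resp.raise_for_status()
cost_map = resp.json()["cost-map"]                 # {src PID: {dst PID: cost}}

# Pick the cheapest destination PID (an abstract group of endpoints) as seen
# from the hypothetical source PID "PID-caltech".
src = "PID-caltech"
best = min(cost_map[src], key=cost_map[src].get)
print(f"Lowest-cost destination PID from {src}: {best}")
```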
 

"The state of the art SDN systems and solutions we designed and demonstrated at SC18, including SFP, Mercator, Magellan and Trident, provide a unified, end-to-end, programmable framework the enables the LHC and other premier science experiments to manage and orchestrate huge ensembles of globally distributed resources. At SC18, we used this framework to orchestrate resources among the Caltech booth at SC18 in Dallas and the Caltech SDN testbed in Pasadena, with multiple 100Gbps wide area links connecting the sites, to facilitate large scale complex scientific workflows through our network functions and the latest generation of high throughput data transfer nodes. The SC18 setting precisely reflects the challenges of routing and resource heterogeneity and dynamicity at the LHC and other premier science experiments. The success of our unified framework in such environments is a keystone toward its future deployment at LHC", said Qiao Xiang, an associate research scientist at Yale, and Richard Yang, Professor of Computer Science at Yale. 

 

More information about the SFP, Mercator, Magellan and Trident demonstrations may be found at:

 

  • NDN Assisted by SDN: Northeastern, Colorado State and Caltech demonstrated Named Data Networking (NDN) based workflows, aimed at accelerated caching and analysis in support of the LHC and climate science programs, in association with the SANDIE (SDN Assisted NDN for Data Intensive Experiments) project. The team demonstrated a parallel application integrating NDN into the mainstream federated data access methods used by the Compact Muon Solenoid (CMS) experiment at the LHC, and laid out a multifaceted development path toward an implementation that would meet the high throughput needs of the experiment.
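The core idea is that requests name the data rather than a host, so any cache along the path can answer them. The pure-Python sketch below illustrates that name-based lookup with a hypothetical hierarchical name loosely modeled on CMS /store paths; it does not use a real NDN stack or SANDIE's actual naming scheme.

```python
# Illustrative sketch (no real NDN stack): how hierarchical, content-centric
# names let any in-network cache satisfy a request for a CMS data block.
# The name layout below is hypothetical, loosely modeled on CMS /store paths.

cache = {
    "/ndn/cms/store/data/Run2018D/ZeroBias/AOD/block0/segment3": b"...payload...",
}

def fetch_from_origin(name: str) -> bytes:
    # Placeholder for retrieval via the experiment's existing data federation.
    return b"...payload from origin..."

def express_interest(name: str, content_store: dict) -> bytes:
    """Return cached Data for an exact-name Interest, else fetch from origin."""
    if name in content_store:
        return content_store[name]      # satisfied from the nearest cache
    data = fetch_from_origin(name)      # fall back to the origin/federation
    content_store[name] = data          # populate the cache for later consumers
    return data

payload = express_interest(
    "/ndn/cms/store/data/Run2018D/ZeroBias/AOD/block0/segment3", cache)
print(f"Retrieved {len(payload)} bytes by name")
```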

Edmund Yeh, Professor of Computer Science at Northeastern University, said:

“The SANDIE demonstration at SC18 was notable for being the first to successfully use the Named Data Networking (NDN) architecture for data distribution over the LHC high energy physics network. The SANDIE team proved the efficacy of leveraging the new data-centric networking paradigm for efficiently distributing large volumes of scientific data in concert with existing protocols, while maximally exploiting the bandwidth as well as the storage resources of the network. The team has established an exciting road map toward further development of NDN-based caching, routing, and workflow optimization for LHC as well as other large-scale data-intensive science applications.”

  • High Throughput and multi-GPU Clusters with Kubernetes: UCSD and the NSF-funded Pacific Research Platform (PRP) led a group of collaborators demonstrating Kubernetes-based virtualized cluster management and the design and applications of high throughput PRP FIONA servers. The latest multi-GPU FIONA designs, being deployed at UCSD and other sites participating in the NSF-funded Cognitive Hardware And Software Ecosystem Community Infrastructure (CHASE-CI) project, are building a cloud of hundreds of affordable gaming Graphics Processing Units (GPUs). These can then be networked together with a variety of neural network machines to facilitate development of next generation cognitive computing (a minimal GPU-scheduling sketch appears after this list).
  • 400GE First Data Network: USC together with Caltech, Starlight/NRL, Arista Networks, SCinet/XNET, Mellanox and 2CRSI demonstrated the first fully functional 400GE network, as illustrated in the Big Data SCinet diagram of the Caltech, USC and Starlight booths and their wide area connections below. Transfers between Supermicro servers equipped with Mellanox ConnectX-5 VPI network interface cards in the USC and Caltech booths sustained close to 800 Gbps with very low CPU overhead in the first round of tests, which began just prior to SC18 as shown in the figure below and continued to the end of the Network Research Exhibition.
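To illustrate the Kubernetes-based scheduling of GPU resources described in the CHASE-CI/PRP item above, the sketch below submits a pod that requests two GPUs via the standard nvidia.com/gpu extended resource exposed by the NVIDIA device plugin. The pod name, namespace, container image and workload are placeholders, and the cluster is assumed to be reachable through an existing kubeconfig.

```python
# Minimal sketch: requesting GPUs from a Kubernetes cluster with the official
# Python client. Pod name, namespace, image and command are placeholders;
# "nvidia.com/gpu" is the extended resource advertised by the NVIDIA device
# plugin on GPU nodes such as the multi-GPU FIONA servers.
from kubernetes import client, config

config.load_kube_config()            # assumes a valid kubeconfig for the cluster

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-train-demo"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "trainer",
            "image": "example.org/gpu-trainer:latest",       # placeholder image
            "command": ["python", "train.py"],               # placeholder workload
            "resources": {"limits": {"nvidia.com/gpu": "2"}} # request two GPUs
        }],
    },
}

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod_manifest)
print("Submitted pod gpu-train-demo requesting 2 GPUs")
```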
