ESSPER: Experimental FPGA Cluster connected with Supercomputer Fugaku

Kentaro Sano (Team Leader, Processor Research Team, Center for Computational Science, RIKEN)

Abstract

At the RIKEN Center for Computational Science (R-CCS), we have been developing "ESSPER (Elastic and Scalable System for high-PErformance Reconfigurable computing)," a prototype FPGA cluster system targeting reconfigurable HPC. The system is composed of sixteen Intel Stratix 10 SX FPGAs connected by a dedicated 100 Gbps inter-FPGA network. We have developed our own Shell (SoC) for the FPGAs, together with software APIs, supporting inter-FPGA communication. The FPGA host servers are connected to a 100 Gbps InfiniBand switch, which allows distant servers to access the FPGAs remotely through R-OPAE, a software-bridged version of Intel's OPAE FPGA driver. Through the 100 Gbps InfiniBand network and R-OPAE, ESSPER is connected to the world's fastest supercomputer, Fugaku, deployed at RIKEN. Tasks running on Fugaku nodes can program bitstreams onto the FPGAs remotely using R-OPAE and offload work to the FPGAs, where it runs on application cores embedded in the FPGA shell. In this talk, I introduce our achievements, challenges, and future prospects for reconfigurable HPC with FPGAs, especially from a system point of view.
