Cyberscience Center was founded as a national supercomputer
center hosted by Tohoku University in 1969. Since then, the Center has been installing high-end
computing systems and providing them to researchers and students nationwide in Japan. In addition to
its role as a national inter-university joint usage/research center for high-performance computing,
the Center is responsible for the cyber-infrastructure of the University.
This year we have installed a new system, AOBA.
AOBA System Information
The world's first system of the second-generation SX-Aurora TSUBASA
We focus on the design and development of high-performance supercomputing systems and their
applications. Since the parallelism of the underlying hardware must be well understood and
exploited for high performance, we are developing programming models and supporting tools to
facilitate parallel programming and code optimization. In addition, as supercomputer systems
become larger and more complicated, their power consumption and dependability become critical
design constraints. Therefore, we are designing and architecting next-generation supercomputers
with high power efficiency and dependability.
Application Development
Working together with application
developers to achieve high performance.
Machine Learning for HPC
Performance-aware programming assisted
by machine learning.
HPC Programming Framework
Application development is teamwork among
different kinds of programmers.
QA for Combinatorial Problems
Making good use of emerging
devices for HPC.
Heterogeneous Computing
Assigning the right tasks
to the right processors.
Task Mapping and Scheduling
Intelligent resource management for
highly efficient computing.
Tsunami Inundation Simulation
Predicting damage from a tsunami immediately after the earthquake.
Memory Architecture Design
Integrating large and fast memories into a single system.
Code Optimization and Tuning
Making application codes suitable for a specific architecture.
Presentations at SC20
Professor Hiroyuki Takizawa
Operation Start of the Second-Generation SX-Aurora TSUBASA (NEC Aurora Forum)
Professor Ryusuke Egawa
Exploiting the Potentials of the Second Generation SX-Aurora TSUBASA (PMBS20)
Doctor Mulya Agung
A Parallel Job Scheduling Method to Effectively Use Shared Heterogeneous Systems for Urgent Computations (UrgentHPC)
Professor Akihiro Musa
Evaluation of Tsunami Inundation Simulation using Vector-Scalar Hybrid MPI on SX-Aurora TSUBASA (SC20 Research Poster)
Student Presentations at Booth
Yuta Sasaki
Quantum Compiler: Automatic Vectorization Assisted by Quantum Annealer
Naoki Ebata
Automatically Avoiding Memory Access Conflicts on SX-Aurora TSUBASA
Suhang Jiang
Task Priority Control for the HPX Runtime System
Michael Ryan Zielewski
Challenges in Solving Scheduling Problems with the D-Wave Quantum Annealer
Akito Onodera
Optimizations for the Himeno Benchmark on Vector Computing System SX-Aurora TSUBASA
Masahito Kumagai
An Evaluation of a Hierarchical Clustering Method Using Quantum Annealing
Shunsuke Tsukada
Metadata Management for Large-Scale Hybrid Memory Architectures
Contact Us
We ♥ New Friends
System Configuration
Supercomputer AOBA consists of two subsystems, AOBA-A and AOBA-B.
AOBA-A is a 72-node system of NEC SX-Aurora TSUBASA, while AOBA-B is a 68-node
system of NEC LX 406Rz-2. Both subsystems use AMD EPYC 7702 processors.
In addition, each node of AOBA-A is equipped with 8 Vector Engines Type 20B.
Total performance: 1.78 Pflop/s
Total memory bandwidth: 924 TB/s
Application Development
We are working together with application developers to
achieve high performance.
We have many joint research projects with application developers in various
research areas. As a major supercomputing center, the Cyberscience Center
has made outstanding contributions to various research fields,
such as airplane and turbine simulations for engineering,
tsunami inundation and flood simulations for disaster prevention and reduction, and
heat-stroke risk simulation for public health.
Code Optimization by Machine Learning
Can ML replace a "Superprogrammer"?
Conventionally, high performance computing (HPC) applications
are optimized for their target computing systems based
on the experience of programmers. Program optimization
often requires selecting an appropriate implementation from
multiple candidates, and this selection has usually been made
in a trial-and-error fashion. Thus, there is a demand for
automating the implementation selection.
Recently, it has been widely reported that machine learning
models can successfully take over various tasks that have
empirically been done by experts. Therefore, machine learning
has become a promising approach to automating program
optimization.
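As a concrete illustration of this idea, the sketch below selects a code variant with a simple nearest-neighbor model standing in for a learned selector. The variant names, feature choices, and training data are all hypothetical, assumed here only for illustration.

```python
# Hypothetical sketch of ML-based implementation selection: predict the
# fastest code variant for a new input via 1-nearest-neighbor lookup.
# Features here are (problem size, sparsity); labels are the variants
# that ran fastest on similar inputs during training (assumed data).
training = [
    ((1_000, 0.9), "sparse_kernel"),
    ((1_000, 0.1), "dense_kernel"),
    ((100_000, 0.95), "sparse_kernel"),
    ((100_000, 0.05), "vectorized_kernel"),
]

def select_variant(features):
    """Predict the best implementation by nearest neighbor in feature space."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda t: dist(t[0], features))[1]

print(select_variant((90_000, 0.04)))  # → vectorized_kernel
```

A production selector would use richer features (loop counts, cache statistics) and a trained model, but the structure, mapping program characteristics to the empirically best implementation, is the same.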
Code Transformation for Performance Tuning
A key to success is effective collaboration between
application developers and performance engineers.
System-aware code optimizations often make it difficult for programmers to maintain
HPC application codes. On the other hand, such optimizations are mandatory to exploit
the performance of target HPC systems. To achieve both high maintainability and high
performance, we are developing the Xevolver framework, which provides an easy way to
express system-aware code optimizations as user-defined code transformation rules.
These rules can be defined separately from HPC application codes. As a result, an HPC
application code is converted into its optimized version for a particular target
system just before compilation, and standard HPC programmers usually do not need to
maintain the optimized version, which could be complicated and difficult to maintain.
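The sketch below conveys the flavor of rule-based code transformation. Note that this is not the real Xevolver mechanism, which operates on abstract syntax trees with user-defined rules; here a rule is reduced to a plain text substitution, and the directive names are illustrative.

```python
import re

# Minimal sketch of rule-based source transformation in the spirit of
# Xevolver (the actual framework transforms ASTs, not raw text).
# Each rule maps a generic annotation in the maintained code to a
# system-specific directive in the generated, optimized version.
rules = [
    # Illustrative rule: replace a generic vectorization hint with a
    # vendor-specific directive just before compilation.
    (re.compile(r"!\$xev loop_vectorize"), "!NEC$ ivdep"),
]

def transform(source, rules):
    """Apply each (pattern, replacement) rule to the source text."""
    for pattern, replacement in rules:
        source = pattern.sub(replacement, source)
    return source

code = """!$xev loop_vectorize
do i = 1, n
   a(i) = b(i) + c(i)
end do"""
print(transform(code, rules))
```

The point is the separation of concerns: application developers maintain the annotated, readable code, while performance engineers maintain the transformation rules for each target system.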
HPC and Quantum Computing
How can the emerging devices help HPC? How can HPC help
the emerging devices?
The effects that problem formulation and embedding have on the performance of
a quantum annealer (QA) are not well studied. Thus, we investigate these effects,
determine methods through which performance can be maximized, and investigate the
interplay between them. We select scheduling problems for our evaluation.
Scheduling is one of the most ubiquitous types of problems in optimization and has
applications in many fields. The standard form of these problems is finding an
assignment of tasks to resources that satisfies problem constraints; however,
real-world applications require domain-specific variants. In this work, we select
the standard n×m job-shop scheduling problem (JSP) to evaluate the performance of QA.
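To give a sense of how a scheduling problem is formulated for a quantum annealer, the toy sketch below encodes a two-task, two-slot problem as a QUBO and solves it by brute force; an annealer would instead sample low-energy states of the same objective. The penalty weight and problem instance are assumed values for illustration.

```python
import itertools

# Toy QUBO formulation of a tiny scheduling problem, solved by brute
# force. Binary variable x[t][s] = 1 means task t starts in slot s.
# Constraints are expressed as penalty terms:
#   - each task is scheduled in exactly one slot (one-hot)
#   - the two tasks share a machine, so they must not share a slot

def energy(x):
    P = 10  # penalty weight (an assumed value)
    e = 0
    for t in range(2):                 # one-hot penalty per task
        e += P * (sum(x[t]) - 1) ** 2
    for s in range(2):                 # machine-conflict penalty
        e += P * x[0][s] * x[1][s]
    return e

# Enumerate all 2^4 assignments and keep the lowest-energy one.
best = min(itertools.product([0, 1], repeat=4),
           key=lambda bits: energy((bits[:2], bits[2:])))
print(best)  # a conflict-free, one-hot assignment with energy 0
```

Real job-shop instances add precedence and makespan terms, and the resulting QUBO must then be embedded onto the annealer's hardware graph; both steps strongly affect solution quality, which is exactly what this research investigates.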
Heterogeneous Computing
Effective collaboration of different kinds of
processors
Many recent supercomputers have accelerators specialized for
scientific computation in addition to conventional general-purpose processors (CPUs).
Representative accelerators are Graphics Processing Units (GPUs) and Vector Engines (VEs).
By using accelerators, many applications can achieve significant performance
improvements compared with using CPUs alone. Moreover, since accelerators can achieve
high energy efficiency, accelerator computing will become even more widespread in the future.
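The core idea, giving each task to the processor that suits it best, can be sketched as follows. The task names and per-device runtime estimates are assumed numbers for illustration only.

```python
# Illustrative sketch of heterogeneous task assignment: map each task to
# the processor with the lowest estimated runtime. In practice these
# estimates would come from profiling or performance models.
estimated_runtime = {          # seconds per task on each device (assumed)
    "fft":      {"cpu": 4.0, "gpu": 0.5, "ve": 0.7},
    "io_parse": {"cpu": 1.0, "gpu": 3.0, "ve": 2.5},
    "stencil":  {"cpu": 6.0, "gpu": 1.2, "ve": 0.8},
}

def assign(tasks):
    """Choose, for each task, the device with the lowest estimated runtime."""
    return {t: min(devs, key=devs.get) for t, devs in tasks.items()}

print(assign(estimated_runtime))
# → {'fft': 'gpu', 'io_parse': 'cpu', 'stencil': 've'}
```

Compute-bound regular kernels tend to favor accelerators, while branchy or I/O-heavy tasks often remain fastest on the CPU, which is why assigning the right task to the right processor matters.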
Task Mapping and Scheduling
Intelligent resource management is a key to achieving
high performance.
The mapping of tasks to processor cores, called task mapping, is crucial to achieving
scalable performance on multicore
processors. On modern NUMA (non-uniform memory access) systems, the memory
congestion problem could degrade the
performance more severely than the data locality problem because heavy congestion on
shared caches and memory
controllers could cause long latencies. Conventional work on task mapping mostly
focuses on improving the locality of memory accesses. However, we have shown that
on modern NUMA systems, maximizing locality can degrade performance due to memory
congestion. Therefore, we propose a task mapping method that addresses both the
data locality and memory congestion problems to improve the performance of
parallel applications.
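A congestion-aware mapping can be sketched as a simple greedy heuristic: rather than packing all tasks onto one NUMA node for locality, spread memory-intensive tasks so that no node's memory controller is overloaded. The bandwidth demands below are assumed numbers for illustration.

```python
# Hedged sketch of congestion-aware task mapping on a NUMA system:
# greedily place each task (heaviest first) on the NUMA node with the
# least accumulated memory-bandwidth demand.

def map_tasks(demands, num_nodes):
    """Greedy mapping: assign each task to the least-loaded NUMA node."""
    load = [0.0] * num_nodes
    mapping = {}
    for task, bw in sorted(demands.items(), key=lambda kv: -kv[1]):
        node = load.index(min(load))   # least-congested node so far
        mapping[task] = node
        load[node] += bw
    return mapping, load

demands = {"t0": 8.0, "t1": 7.5, "t2": 1.0, "t3": 0.5}  # GB/s, assumed
mapping, load = map_tasks(demands, num_nodes=2)
print(mapping, load)
```

A locality-first mapper would co-locate tasks that share data even if that saturates one memory controller; the research described above balances these two objectives instead of optimizing locality alone.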
Tsunami Inundation Simulation
Predicting damage from a tsunami immediately after the earthquake
We have established a new method of real-time tsunami inundation forecasting, damage estimation, and mapping using advanced sensor networks and modern computing power. The method fuses real-time crustal deformation monitoring and fault model estimation, high-performance real-time tsunami propagation/inundation simulation on a vector supercomputer, and tsunami fragility curves for damage/loss estimation. The method has recently accomplished the “10-10-10 challenge”: completing tsunami source determination in 10 minutes and tsunami inundation modeling in 10 minutes at 10 m grid resolution.
Memory Architecture Design
Integrating large and fast memories into a single system
A modern computer needs a memory system with both high performance and large capacity. The goal of this research is to realize a near/far memory system that consists of memory modules with different performances and capacities. The system appropriately distributes application data among these modules according to their access characteristics.
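One simple placement policy for such a system can be sketched as follows: data objects with the highest access density go into the small, fast near memory until it is full, and the rest spill to the large, slow far memory. The object names, sizes, and access counts are assumed values for illustration.

```python
# Illustrative sketch of near/far memory data placement: rank objects by
# accesses per byte and fill the near memory greedily; everything else
# is placed in far memory.

def place(objects, near_capacity):
    """Greedy placement by access frequency per byte."""
    near, far, used = [], [], 0
    for name, size, accesses in sorted(objects, key=lambda o: -o[2] / o[1]):
        if used + size <= near_capacity:
            near.append(name)
            used += size
        else:
            far.append(name)
    return near, far

objects = [  # (name, size in MB, access count) -- assumed values
    ("grid", 512, 1_000_000),
    ("halo", 64, 5_000_000),
    ("log",  256, 1_000),
]
near, far = place(objects, near_capacity=600)
print(near, far)  # hot objects land in near memory, cold ones in far memory
```

Real systems refine this with dynamic migration and hardware counters, but the principle, matching each object's access characteristics to the right memory module, is the one described above.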
Code Optimization and Tuning
Making application codes suitable for a specific architecture.
Microprocessors are becoming increasingly diverse due to trends in manufacturing technologies and applications. This research evaluates various state-of-the-art microprocessors and, based on the evaluation results, adapts applications to them by optimizing and tuning the application codes to achieve high performance.