23rd WSSP
Technical Program
Workshop Day 1 (Wed., March 16th, 2016)
Time Presentation
Registration
Opening Session
10:00-10:05 Opening Remarks
Hiroaki Kobayashi (Tohoku Univ.)
10:05-10:15 Greetings
Takafumi Aoki (Tohoku Univ.)
10:15-10:30 HPC Policy in Japan
Katsuyuki Kudo (MEXT)
Keynote Talk I
10:30-11:10 Parallel Algorithms: Theory, Practice and Education
Vladimir Voevodin (Moscow State Univ.)
Abstract:

The computing world is changing, and all devices, from mobile phones and personal computers to high-performance supercomputers, are becoming parallel. At the same time, the efficient use of all the opportunities offered by modern computing systems represents a global challenge. Using the full potential of parallel computing systems and distributed computing resources requires new knowledge, skills and abilities, among which an understanding of the key properties of parallel algorithms plays one of the main roles. What are these properties? What should be discovered and expressed explicitly in existing algorithms when a new parallel architecture appears? How can we ensure an efficient implementation of an algorithm on a particular parallel computing platform? All these, as well as many other issues, will be addressed in the talk.

The idea that we use in educational practice at the university is to split the description of an algorithm into two parts. This helps us to explain what a good parallel algorithm is and what is important for its efficient implementation. The first part describes algorithms and their properties. The second part is dedicated to describing particular aspects of their implementation on various computing platforms. The first part draws attention to the key theoretical properties, and the second part puts emphasis on the aspects that are fundamentally important in practice. This division is made intentionally to highlight the machine-independent properties of algorithms, which determine their potential and the quality of their implementations on parallel computing systems, and to describe them separately from a number of issues related to the subsequent stages of programming and executing the resulting programs. In addition to classical algorithm properties such as serial complexity, we have to explain concepts such as parallel complexity, parallel structure, determinacy, data locality, performance and scalability estimates, communication profiles for specific implementations, and many other aspects.

This approach was successfully implemented as the open encyclopedia AlgoWiki, which is available to the computational community at www.AlgoWiki-Project.org.

Session 1 "Organizers' Talks I"
11:10-11:40 HPC strategies in 2025
Michael Resch (HLRS, Univ. Stuttgart)
Abstract:

As we reach the end of Moore's law, we have to work out strategies to handle a situation in which the speed of systems does not keep growing as it did over the last years. The technical options are still there, but it is getting increasingly difficult to turn them into solutions. The role of software and of algorithms is increasing, and it remains to be seen how far we can get in this situation.

11:40-12:10 Highly-Productive Computing on Modern and Future Vector Platforms
Hiroaki Kobayashi (Tohoku Univ.)
12:10-13:10 Lunch
Session 2 "System Software"
13:10-13:40 HPC and HPDA - What is missing
Bastian Koller (HLRS)
Abstract:

This talk will give an overview of HPC and HPDA, their complementarity, and the obstacles to be overcome in order to use them as proper tools for academia and industry, also taking into account the different levels of expertise of the users of such systems.

13:40-14:10 Performance Engineering of HPC Applications Based on Pattern Matching
Hiroyuki Takizawa (Tohoku Univ.)
Abstract:

The Xevolver framework provides user-defined code transformations for the special demands of individual systems and applications. A code transformation rule is defined so that one code pattern is transformed into another pattern. Thus, the rule is applied based on pattern matching. In Xevolver, such a pattern can be expressed at various abstraction levels. At the lowest level, the pattern to be transformed is expressed as a pattern of AST nodes. At another level, the pattern can be specified by some domain knowledge if it is one of the common code patterns that frequently appear in an application domain, such as stencil computations. Moreover, using Xevtgen, the pattern can be expressed as a pair of before-and-after versions of a code in the original programming language, such as C or Fortran. This talk shows the importance of such hierarchical abstraction for user-defined code transformations.
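
The lowest abstraction level mentioned above, a rule expressed as a pattern of AST nodes, can be illustrated with a minimal sketch. It is not Xevolver code: Xevolver targets C and Fortran, whereas this sketch assumes Python's standard `ast` module as a stand-in, and the rule itself (rewriting `pow(name, 2)` into `name * name`) as well as the names `SquareRewrite` and `transform` are hypothetical and chosen only for illustration.

```python
import ast


class SquareRewrite(ast.NodeTransformer):
    """Hypothetical rule: rewrite the pattern pow(NAME, 2) into NAME * NAME."""

    def visit_Call(self, node):
        # Visit children first so that nested calls are handled as well.
        self.generic_visit(node)
        matches = (
            isinstance(node.func, ast.Name)
            and node.func.id == "pow"
            and len(node.args) == 2
            and not node.keywords
            # Restrict the base to a plain variable name so that duplicating
            # it cannot duplicate side effects.
            and isinstance(node.args[0], ast.Name)
            and isinstance(node.args[1], ast.Constant)
            and node.args[1].value == 2
        )
        if not matches:
            return node  # pattern does not match: leave the code unchanged
        base = node.args[0]
        replacement = ast.BinOp(left=base, op=ast.Mult(), right=base)
        return ast.copy_location(replacement, node)


def transform(source):
    """Apply the rule to Python source text and return the rewritten text."""
    tree = SquareRewrite().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9 or newer


rewritten = transform("y = pow(x, 2) + pow(z, 3)")
print(rewritten)
```

Only the first call matches the pattern, so the second one is left untouched. Writing rules directly against AST nodes like this is precise but verbose, which is the motivation the abstract gives for the higher abstraction levels.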

Session 3 "Applications I"
14:10-14:40 Spectral structures for nonlinear operators on arbitrary compact spaces
Uwe Kuester (HLRS)
14:40-15:10 Computation of Temperature Elevation in the Human Body for Ambient Heat Using Vector Supercomputer SX-ACE
Akimasa Hirata (Nagoya Inst. of Tech.)
Abstract:

To evaluate the adverse health effects of heat load on the human body, it is essential to analyze the time evolution of sweating and core temperature elevation. The distinguishing feature of the computational code developed at the Nagoya Institute of Technology is that it combines thermodynamic analysis with the thermoregulatory response. In this talk, we port our computational code to the vector supercomputer SX-ACE and present its preliminary performance. A perspective on the application to heat stroke protection is also presented.

15:10-15:30 Coffee Break
Session 4 "CFD"
15:30-16:00 Simulation of Turbulent Flows including Multiphysics
Matthias Meinke (AIA RWTH Aachen)
16:00-16:30 HPC Applications for Manufacturing Innovation in Aerospace Fields
Ryoji Takaki (JAXA)
Abstract:

JAXA promotes research and development of High Performance Computing technology in order to support aerospace development. From this April, JAXA will start full operation of the JAXA Supercomputer System 2 (JSS2). The main engine of JSS2 is called SORA-MA, which is a FUJITSU Supercomputer PRIMEHPC FX100. It is a many-core based scalable parallel cluster system. This paper gives a brief overview and a preliminary performance evaluation of SORA-MA. A numerical simulation example applied to a JAXA project is also presented.

16:30-17:00 Towards Aerodynamic Characteristics Investigation Based On Cartesian Methods for Low-Reynolds Number Flow Simulation
Daisuke Sasaki (Kanazawa Inst. of Tech.)
Abstract:

Micro Aerial Vehicles (MAVs) have recently attracted attention for various uses such as monitoring and recording. One of the issues with MAVs is their limited operation time. MAVs therefore require an efficient configuration; however, complex low-Reynolds number flows make such a configuration difficult to design. In this research, a Cartesian-based CFD approach is applied to various airfoils in low-Reynolds number flows to investigate their aerodynamic characteristics.

Workshop Day 2 (Thu., March 17th, 2016)
Time Presentation
Registration
Keynote Talk II
10:00-10:40 Simulation of Time Sequential Deformation Behaviors of Diseased Blood vessel Wall under Pulsatile Pressure Conditions based on CFD
Toshimitsu Yokobori (Tohoku Univ.)
Abstract:

Numerical analyses were conducted to detect the time sequential deformation behaviors of a diseased blood vessel wall with an aneurysm under pulsatile pressure conditions, using originally designed software based on computational fluid dynamics. To detect these behaviors in detail, it is necessary to conduct a highly accurate analysis, which requires a long CPU time. We designed an original algorithm to detect the detailed behaviors of a diseased blood vessel wall with an aneurysm; however, it takes much CPU time. Using a supercomputer, the CPU time of the numerical analysis was successfully shortened, which will enable us to apply this analysis to actual blood vessel structures.

10:40-10:50 Break
Session 5 "Applications II"
10:50-11:20 Eulerian-Lagrangian Approach For Reactive Flows Consisting of Polydisperse Particles
Akiko Matsuo (Keio Univ.)
Abstract:

In this paper, an Eulerian-Lagrangian approach is utilized for reactive flows consisting of polydisperse particles. We solve the fluid phase conservation equations on an Eulerian grid, and the particle motion is governed by the Lagrangian approach using a parcel model. The Particle-in-Cell method is applied to model the reactive flow of energetic materials, such as coal dust and granular propellant. The behavior of particles and the interaction between the solid and gas phases are simulated, and the phenomena of coal-dust explosions and interior ballistics are well reproduced.

11:20-11:50 Performance Study on Two-Path Aliasing-Free Calculation of a Spectral DNS Code
Mitsuo Yokokawa (Kobe Univ.)
Abstract:

A parallel direct numerical simulation (DNS) code for analyzing isotropic, homogeneous turbulent fluid flow was developed on the K computer. In the code, a phase-shift algorithm is used to remove aliasing errors in the convolution of nonlinear terms. Since the phase-shift algorithm has two convolution paths, a reduction in calculation time is expected if the two paths are carried out in two separate MPI communicators. In this talk, the performance of such a modified code will be presented.

11:50-12:20 A direct solver of sparse matrix for fluid problems on a multicore supercomputer
Atsushi Suzuki (Osaka Univ., Cybermedia Center)
Abstract:

Thanks to the recent development of graph partitioning software, it has become possible to implement an efficient direct solver for large sparse matrices on multicore parallel architectures. The advantages of a direct solver compared to an iterative solver are its stability for large condition numbers and its capability to deal with singular matrices. The Dissection sparse solver that we have developed (DOI:10.1002/nme.4729) can detect the kernel of the matrix, and it is now implemented on a vector architecture using a sequential BLAS library and asynchronous parallel execution. We will show the performance of the unsymmetric solver for a matrix obtained from an incompressible flow problem discretized by a finite element method, where the matrix is singular due to pressure ambiguity.

12:20-13:30 Lunch
Session 6 "Organizers' Talks II"
13:30-14:00 The Prototype of JAMSTEC "Cyber System" for Geo- and Oceanographic Information
Kenichi Itakura (JAMSTEC)
Abstract:

The JAMSTEC "Cyber System" for Geo- and Oceanographic Information connects the supercomputer system with the observation equipment. The total system creates and analyzes new big data, and real-time access to such big data from the field changes the observation technique. We first used the JAMSTEC "Cyber System" for an observation off the Sumatra coast in the eastern Indian Ocean in November 2015. This use case is introduced in my talk.

14:00-14:30 NEC Vector Supercomputer and Its Application
Shintaro Momose (NEC/Tohoku Univ.), Takuya Araki
Abstract:

The presentation will focus on an overview of the latest vector supercomputer model, SX-ACE, with its architectural highlights and applications. SX-ACE features world top-level single-core performance and memory bandwidth, enabling high sustained performance, as indicated by the highest efficiency on the HPCG benchmark program, which aims at a more relevant metric for ranking HPC systems. The talk will also outline the concept of NEC's future vector computing system for various scientific and social issues. Furthermore, we will introduce a middleware system tailored to emerging big data analytics. It makes it easy to harness the high-performance capabilities of vector supercomputers without the user being aware of MPI, particularly for machine learning algorithms in which matrix operations are heavily used.

Session 7 "SX-ACE"
14:30-15:00 APES on SX-ACE
Harald Klimach (Univ. of Siegen)
15:00-15:30 A Case Study of Memory Optimization on SX-ACE
Raghunandan Mathur (NEC Technologies India)
Abstract:

The trend of recent hardware designs shows reduced per-node memory capacity with higher computational performance. This case study discusses memory optimizations which can reduce the memory requirement of a program and enable its execution with higher performance on recent HPC systems, such as NEC's SX-ACE.

15:30-15:50 Coffee Break
Session 8 "Young Researchers"
15:50-16:20 A Correctness Verification Framework for Empirically Tuning Large-scale HPC Applications
Shoichi Hirasawa (Tohoku Univ.)
Abstract:

Many practical numerical HPC applications have been developed and maintained for a long period to accomplish their science and engineering purposes. The code size of such a practical application tends to grow larger because many algorithms are implemented and optimized for its target computing platforms during its long life cycle. As practical applications and computing platforms have become complex, static compiler optimizations alone do not necessarily extract the potential of the computing platforms for the applications. Assuming that a lot of variants of the application are provided, empirical tuning is able to achieve high performance on a complex computing platform by selecting an appropriate variant among the provided variants based on dynamic information such as performance profiling data. One problem is that some of the variants may change the behavior of the application and lead to a computation result that is considered to be wrong by the application user. Therefore, the application user needs to verify the computation result of every variant, while a lot of variants still produce exactly the same computation result and essentially do not need to be verified. In many cases, the correctness of a computation result can only be verified after the application has been completed. Accordingly, executing the application many times to the end for verification work wastes time and thus reduces opportunities to find more appropriate variants in a given tuning time. In this talk, a correctness verification framework is discussed that virtually verifies the computation result of every variant while skipping redundant verification work with memoization to reduce overall tuning time for large-scale HPC applications.
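
The general idea of skipping redundant verification with memoization can be sketched as follows. This is not the framework presented in the talk, whose design is not described in this abstract. It is a minimal illustration that assumes a computation result can be represented as a list of floats and fingerprinted bit for bit; the names `MemoizedVerifier`, `within_tolerance`, `REFERENCE` and the variant labels are hypothetical.

```python
import hashlib
import struct


class MemoizedVerifier:
    """Run an expensive check at most once per distinct computation result."""

    def __init__(self, verify_fn):
        self._verify_fn = verify_fn   # user-supplied, assumed to be expensive
        self._cache = {}              # result fingerprint -> verdict
        self.expensive_calls = 0      # how often verify_fn was actually run

    @staticmethod
    def _fingerprint(result):
        # Bit-exact fingerprint: only results that are exactly the same
        # share one cache entry.
        packed = struct.pack(f"<{len(result)}d", *result)
        return hashlib.sha256(packed).hexdigest()

    def verify(self, result):
        key = self._fingerprint(result)
        if key not in self._cache:
            self.expensive_calls += 1
            self._cache[key] = self._verify_fn(result)
        return self._cache[key]


# Hypothetical reference data standing in for the user's notion of correctness.
REFERENCE = [1.0, 2.0, 3.0]


def within_tolerance(result, tol=1e-9):
    return len(result) == len(REFERENCE) and all(
        abs(a - b) <= tol for a, b in zip(result, REFERENCE)
    )


# Made-up results of three hypothetical code variants.
variant_results = {
    "baseline": [1.0, 2.0, 3.0],
    "loop_interchange": [1.0, 2.0, 3.0],   # exactly the same result as baseline
    "reassociated_sum": [1.0, 2.0, 3.5],   # result changed by the transformation
}

verifier = MemoizedVerifier(within_tolerance)
verdicts = {name: verifier.verify(res) for name, res in variant_results.items()}
print(verdicts, verifier.expensive_calls)
```

In this sketch the second variant reuses the verdict cached for the first, so the expensive check runs twice for three variants.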

16:20-16:50 Semiconductor simulation based on a parallel domain decomposition method
Shohiro Sho (Osaka Univ., Cybermedia Center)
Abstract:

We present semiconductor device simulations based on a parallel domain decomposition method. A hybrid MPI/OpenMP parallelization of the stationary drift-diffusion model is realized on the SX-ACE parallel computer, which has a multi-core architecture. The performance results for a three-dimensional bulk n-MOSFET are evaluated.

16:50-17:20 Efficient coupling of acoustic and flow on massively parallel systems
Verena Krupp (Univ. of Siegen)
17:20-17:30 Concluding Remarks