High-Performance Scientific Applications Using Mixed Precisions and Low-Rank Approximations Powered by Task-based Runtime Systems


Overview

Abstract

To leverage the extreme parallelism of emerging architectures, so that scientific applications can fulfill their high-fidelity and multi-physics potential while sustaining high efficiency relative to the limiting resource, numerical algorithms must be substantially redesigned. Such redesign can shift the limiting resource, for example from memory or communication to arithmetic capacity. Its benefit grows considerably when it introduces a tunable trade-off between accuracy and resources. Scientific applications from diverse sources rely on dense matrix operations. These operations arise in Schur complements, integral equations, covariances in spatial statistics, ridge regression, radial basis functions from unstructured meshes, and kernel matrices from machine learning, among others. This thesis demonstrates how to extend the problem sizes that can be treated while reducing their execution time. Sometimes even forming the dense matrix is a bottleneck, in computation or in storage.
Two “universes” of algorithmic innovation have emerged that improve computations by orders of magnitude in capacity and runtime. Each introduces a hierarchy, of rank or of precision. Tile Low-Rank (TLR) approximation replaces blocks of the dense operator with low-rank blocks. Mixed-precision approximation, increasingly well supported by contemporary hardware, replaces high-precision blocks with low-precision ones. Herein, we design new high-performance direct solvers based on the synergy of TLR and mixed precision. Since adapting to data sparsity leads to heterogeneous workloads, we rely on task-based runtime systems to orchestrate the scheduling of fine-grained kernels onto computational resources, and we highlight the importance of abstracting the hardware complexity and balancing the load. We first demonstrate how TLR accelerates acoustic scattering and mesh deformation simulations; our solvers outperform state-of-the-art libraries by up to an order of magnitude. We then demonstrate the impact of enabling mixed precision in a bioinformatics context, where it yields up to a threefold speedup. Finally, we explore combining TLR and mixed-precision computations and showcase the sustained performance of the aforementioned applications.
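To make the two tile-level ideas concrete, here is a minimal, stdlib-only Python sketch; it is not the thesis's solver. It assumes a 1-D exponential covariance kernel (a typical spatial-statistics covariance), 16×16 tiles, and dense diagonal tiles. Fully pivoted cross approximation stands in for whatever compression kernel the actual TLR solvers use, and mixed precision is only emulated by rounding a tile to IEEE single precision with `array('f')`. All function names and parameter values (`exp_kernel`, `cross_approx`, `plan_tiles`, `ell=0.3`, `tol=1e-8`) are hypothetical.

```python
import math
from array import array


def exp_kernel(xs, ys, ell=0.3):
    """Exponential (Matern nu = 1/2) covariance between two 1-D point sets."""
    return [[math.exp(-abs(x - y) / ell) for y in ys] for x in xs]


def frob(M):
    """Frobenius norm of a list-of-lists matrix."""
    return math.sqrt(sum(v * v for row in M for v in row))


def cross_approx(T, tol):
    """Fully pivoted cross approximation of tile T.

    Returns factors (U, V) with T ~= sum_k outer(U[k], V[k]), stopping once
    ||residual||_F <= tol * ||T||_F or full rank is reached.
    """
    m, n = len(T), len(T[0])
    R = [row[:] for row in T]              # residual, updated in place
    target = tol * frob(T)
    U, V = [], []
    while len(U) < min(m, n) and frob(R) > target:
        # Pivot on the largest-magnitude residual entry.
        i, j = max(((a, b) for a in range(m) for b in range(n)),
                   key=lambda ab: abs(R[ab[0]][ab[1]]))
        p = R[i][j]
        u = [R[a][j] / p for a in range(m)]   # pivot column scaled by 1/p
        v = R[i][:]                            # pivot row
        for a in range(m):
            for b in range(n):
                R[a][b] -= u[a] * v[b]
        U.append(u)
        V.append(v)
    return U, V


def to_fp32(T):
    """Round every entry to IEEE single precision (emulates low-precision storage)."""
    return [list(array("f", row)) for row in T]


def plan_tiles(n=64, nb=16, tol=1e-8):
    """Choose a representation for each lower-triangular tile of the kernel matrix."""
    xs = [i / n for i in range(n)]
    plan = {}
    for ti in range(0, n, nb):
        for tj in range(0, ti + 1, nb):        # K is symmetric: lower triangle only
            T = exp_kernel(xs[ti:ti + nb], xs[tj:tj + nb])
            key = (ti // nb, tj // nb)
            if ti == tj:
                plan[key] = ("dense", nb * nb)  # diagonal tiles stay dense
            else:
                U, _ = cross_approx(T, tol)
                plan[key] = ("low-rank", 2 * nb * len(U))  # numbers stored in U and V
    return plan


if __name__ == "__main__":
    for key, (kind, words) in sorted(plan_tiles().items()):
        print(key, kind, words)
    T = exp_kernel([i / 64 for i in range(16)], [i / 64 for i in range(16)])
    err = frob([[a - b for a, b in zip(r, s)] for r, s in zip(T, to_fp32(T))])
    print("fp32 relative storage error:", err / frob(T))
```

For this particular kernel every off-diagonal tile comes out at rank 1, because for points x > y the entry factors as exp(-(x - y)/ℓ) = exp(-x/ℓ)·exp(y/ℓ); such a tile then stores 2·16 = 32 numbers instead of 256. The single-precision rounding error of a tile is bounded by the unit roundoff 2⁻²⁴ (about 6e-8) relative to its norm, which is the kind of accuracy budget that determines where a lower precision is acceptable.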
To facilitate the adoption of task-based runtime systems, we introduce the AL4SAN library, which provides a common API for expressing and queueing tasks across multiple dynamic runtime systems. The library handles a variety of workloads at low overhead while increasing user productivity. AL4SAN enables interoperability by switching runtimes at run time, which yields a twofold speedup on a task-based generalized symmetric eigenvalue solver.
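AL4SAN's actual interface is not shown here. As a rough illustration of the general idea behind a runtime-agnostic task API, the hypothetical Python sketch below lets an application describe each task once, as a function plus data handles with access modes (in the spirit of the R / W / RW modes of insert-task interfaces such as StarPU's). Read-after-write, write-after-read, and write-after-write dependencies are inferred from those modes, and an execution backend is picked by name at run time. `TaskGraph`, `insert_task`, `Access`, `run_serial`, `run_threaded`, and `execute` are invented names; the thread pool merely stands in for a real dynamic runtime system.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from enum import Enum


class Access(Enum):
    """Data-access modes (hypothetical), modeled on R / W / RW."""
    READ = "R"
    WRITE = "W"
    READWRITE = "RW"


class TaskGraph:
    """Records tasks and infers their dependencies from declared data accesses."""

    def __init__(self):
        self.tasks = []          # list of (fn, handles, dependency indices)
        self.last_writer = {}    # handle -> index of the last task that wrote it
        self.readers = {}        # handle -> tasks that read it since that write

    def insert_task(self, fn, *accesses):
        """Queue fn(data, *handles); accesses are (handle, Access) pairs."""
        idx = len(self.tasks)
        deps = set()
        for h, mode in accesses:
            if h in self.last_writer:
                deps.add(self.last_writer[h])         # RAW and WAW
            if mode is not Access.READ:
                deps.update(self.readers.get(h, []))  # WAR
        for h, mode in accesses:
            if mode is Access.READ:
                self.readers.setdefault(h, []).append(idx)
            else:
                self.last_writer[h] = idx
                self.readers[h] = []
        self.tasks.append((fn, [h for h, _ in accesses], deps))
        return idx


def run_serial(graph, data):
    """Backend 1: execute tasks in submission order, which always respects deps."""
    for fn, handles, _ in graph.tasks:
        fn(data, *handles)


def run_threaded(graph, data, workers=4):
    """Backend 2: a thread pool; each task first waits for its dependencies.

    Deps always point to earlier submissions and the pool dequeues FIFO, so the
    oldest unfinished task can always make progress (no deadlock).
    """
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for fn, handles, deps in graph.tasks:
            dep_futs = [futures[d] for d in deps]

            def job(fn=fn, handles=handles, dep_futs=dep_futs):
                wait(dep_futs)
                fn(data, *handles)

            futures.append(pool.submit(job))
    for f in futures:
        f.result()  # re-raise any exception raised inside a task


BACKENDS = {"serial": run_serial, "threads": run_threaded}


def execute(graph, data, backend):
    """Pick the backend by name at run time; the task description is unchanged."""
    BACKENDS[backend](graph, data)
    return data


# Example kernels: each receives the shared data store plus its handles.
def scale(data, h):
    data[h] *= 2


def add_into(data, a, b, out):
    data[out] = data[a] + data[b]


def build_graph():
    g = TaskGraph()
    g.insert_task(scale, ("A", Access.READWRITE))
    g.insert_task(scale, ("B", Access.READWRITE))   # independent of task 0
    g.insert_task(add_into, ("A", Access.READ), ("B", Access.READ), ("C", Access.WRITE))
    g.insert_task(scale, ("A", Access.READWRITE))   # must wait for task 2's read (WAR)
    return g
```

Starting from `{"A": 2, "B": 3, "C": 0}`, both backends end with A = 8, B = 6, C = 10, and the dependency sets recorded for tasks 2 and 3 are {0, 1} and {0, 2}. Keeping the task description separate from the executor is what lets the backend be swapped without touching application code.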
 

Brief Biography

Rabab Alomairy is a computer science Ph.D. candidate in the Extreme Computing Research Center at King Abdullah University of Science and Technology (KAUST). She earned her BSc in computer science from King Abdulaziz University in 2010 and her MSc from KAUST in 2013. Her research centers on task-based numerical libraries and applications, performance optimization for multicore/manycore architectures and hardware accelerators, dynamic runtime systems, GPU programming, and machine learning and artificial intelligence. She was recently selected as one of the Rising Stars in Computational and Data Sciences by Sandia National Laboratories. During her Ph.D. studies, she collaborated with international laboratories such as the Innovative Computing Lab at the University of Tennessee and Material Forming at MINES ParisTech University. In Summer 2021, she joined the Innovative Computing Lab at the University of Tennessee as a visiting scholar, where she worked toward a milestone of the Software for Linear Algebra Targeting Exascale (SLATE) project, a joint project of the U.S. Department of Energy’s Office of Science and the National Nuclear Security Administration (NNSA). She received the prestigious Gauss Award for Supercomputing of Excellence from the ISC High-Performance Computing Conference and the KAUST Research Excellence Award for the academic year 21/20. At the KAUST-NVIDIA workshop, she won first place in the artificial intelligence competition based on a geospatial dataset. She also received two poster awards: one at the Artificial Intelligence in Medicine conference at KAUST and another at the Saudi HPC conference at King Abdulaziz University.

Presenters