Unstructured Computations on Emerging Architectures

PhD Dissertation Defense

Event Start

2019-04-23 - 13:00

Event End

2019-04-23 - 14:00

Location

B3, L5, Room 5209

Mohammed AlFarhan

Abstract

This dissertation describes the detailed performance engineering and optimization of an unstructured computational aerodynamics software system with irregular memory accesses on a wide variety of emerging multi- and many-core scalable high-performance computing architectures. These architectures are expected to be the building blocks of energy-austere exascale systems, and on them algorithmic- and architecture-oriented optimizations are essential for achieving worthy performance. We investigate several state-of-the-practice shared-memory optimization techniques applied to key computational kernels for the important problem class of unstructured meshes, one of the seven Colella “dwarves,” which are essential for science and engineering. We illustrate these techniques on a broad spectrum of emerging microprocessor architectures, chosen as representatives of the compute units in contemporary leading supercomputers, identifying and addressing performance challenges without compromising the floating-point numerics of the original code. While the linear algebraic kernels are bottlenecked by memory bandwidth for even modest numbers of hardware cores sharing a common address space, the edge-based loop kernels, which arise in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing the conservation law residuals, are compute-intensive and effectively exploit contemporary multi- and many-core processing hardware. We therefore employ low- and high-level algorithmic- and architecture-specific code optimizations and tuning in light of thread- and data-level parallelism, with a focus on strong thread scaling at the node level. Our approaches are based upon novel multi-level hierarchical mechanisms for distributing workload and data across different compute units (from the address space down to the registers) within every hardware core.
We analyze the application and its key computational routines on specific computing architectures, and from this analysis we develop performance metrics and models that bound the achievable performance from above and below on various back-end hardware platforms. We present significant full-application speedups relative to the baseline code on a range of many-core processor architectures, namely Intel Xeon Phi Knights Corner (5.0x) and Knights Landing (2.9x). In addition, Knights Landing outperforms Intel Xeon Skylake with a nearly twofold speedup, at significantly lower power consumption. These optimizations are expected to be of value for many other unstructured mesh partial differential equation-based scientific applications as multi- and many-core architectures evolve.

Brief Biography

Mohammed A. Al Farhan received his BS and MS degrees in computer science from KFU and KAUST, respectively, and has previously worked as a software engineer with the Saudi Electricity Company and Saudi Aramco. He is working toward the PhD degree in computer science at the KAUST Extreme Computing Research Center under the supervision of Professor David E. Keyes. His research interests include HPC architectures and applications in computational science and engineering. He is a member of SIAM, ACM, and IEEE.

Contact Person

David Keyes

Related Persons

David Keyes

Professor, Applied Mathematics and Computational Sciences
