Abstract
Hierarchical matrices generalize globally low-rank matrix approximations, widely used in large-scale machine learning and scientific applications, to the hierarchically low-rank case in which only blocks of the matrix admit low-rank approximations. Hierarchical matrix representations hold the promise of producing linear algebra algorithms that are asymptotically optimal in both operation and memory complexity, making them key to a variety of scalable numerical simulation and optimization problems in practice. In this talk, we show that, beyond their optimal O(N) algorithmic complexity, hierarchical matrix operations also benefit from parallel scalability on distributed machines with extremely large core counts. In particular, we describe high-performance, distributed-memory, GPU-accelerated algorithms for matrix-vector multiplication and other operations on hierarchical matrices in the H^2 format. This format exploits a hierarchy of matrix blocks at different scales as well as a hierarchy of nested bases in which the low-rank matrix blocks are represented. Results show near-ideal scalability up to 1024 NVIDIA V100 GPUs, with performance exceeding 2.3 Pflop/s, on matrices with more than half a billion unknowns.
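
To make the block-and-basis structure concrete, the following is a minimal, single-level sketch in Python/NumPy of the three-phase matrix-vector product that underlies H^2 arithmetic: project the input into the block-column bases, apply the small coupling matrices of the admissible (low-rank) blocks in compressed space, and expand the result back through the block-row bases, while inadmissible near-field blocks are applied densely. All names here are illustrative; the sketch omits the nested multilevel bases, transfer matrices, and distributed GPU machinery of the actual algorithms described in the talk.

import numpy as np

def h2_matvec_one_level(dense_blocks, U, S, V, x, m):
    # dense_blocks: {(i, j): m-by-m array} inadmissible near-field blocks
    # U, V: lists of m-by-k row/column basis matrices, one per block index
    # S: {(i, j): k-by-k array} coupling matrices of admissible blocks,
    #    so block (i, j) is approximated as U[i] @ S[(i, j)] @ V[j].T
    # x: input vector of length b*m, where m is the block size
    b = len(U)
    xb = np.asarray(x, dtype=float).reshape(b, m)
    yb = np.zeros_like(xb)

    # Phase 1 (forward transform): project x into the column bases.
    xhat = [V[j].T @ xb[j] for j in range(b)]

    # Phase 2 (coupling): accumulate far-field products in compressed
    # space, and near-field products at full block resolution.
    yhat = [np.zeros(U[i].shape[1]) for i in range(b)]
    for (i, j), Sij in S.items():
        yhat[i] += Sij @ xhat[j]
    for (i, j), Aij in dense_blocks.items():
        yb[i] += Aij @ xb[j]

    # Phase 3 (backward transform): expand back through the row bases.
    for i in range(b):
        yb[i] += U[i] @ yhat[i]
    return yb.reshape(-1)

In the full multilevel H^2 format, the forward and backward transforms become upward and downward traversals of a tree of nested bases connected by small transfer matrices, which is what yields the O(N) operation count cited above.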
Brief Biography
George Turkiyyah is currently with the Extreme Computing Research Center and CEMSE, on leave from the American University of Beirut (AUB), where he serves on the faculty of the Computer Science department. He obtained his BE from AUB and his MS and Ph.D. from Carnegie Mellon University, and served on the faculty at the University of Washington before joining AUB. His research interests are in computational science broadly, with a recent special interest in scalable solvers for large-scale simulation and optimization problems. His work has received several awards and patents, has resulted in widely used simulation codes, and has led to a successful startup in surgical simulation.