Monday, June 20, 2022, 11:00 - 13:00
Building 9, Level 4, Room 4223
Scientific applications from diverse sources rely on dense matrix operations. These operations arise in Schur complements, integral equations, covariances in spatial statistics, ridge regression, radial basis functions from unstructured meshes, and kernel matrices from machine learning, among others. This thesis demonstrates how to extend the problem sizes that can be treated and reduce their execution time. Sometimes even forming the dense matrix can be a bottleneck, in computation or in storage.
Bilel Hadri, Computational Scientist, Supercomputing Lab, KAUST
Friday, July 02, 2021, 14:00 - 18:00
ISC21 (virtual), Frankfurt, Germany (Time CET)

Abstract

With hardware technology scaling and the trend toward heterogeneous chip design, the exis

Piotr Luszczek, Research Assistant Professor, University of Tennessee
Monday, March 01, 2021, 09:00 - 18:00
vFairs online platform (SIAM CSE21 registration required)

Abstract

This minisymposium brings together experts in numerical simulation who have developed HP

Thursday, October 08, 2020, 12:00 - 13:00
KAUST
We present the Exascale GeoStatistics (ExaGeoStat) software, a high-performance library implemented on a wide variety of contemporary hybrid distributed-shared supercomputers whose primary target is climate and environmental prediction applications.
Thursday, July 09, 2020, 16:00 - 17:00
KAUST
Out-of-core simulation systems often produce massive amounts of data that cannot fit in the aggregate fast memory of the compute nodes, and they must also read these data back for computation. As a result, I/O data movement can be a bottleneck in large-scale simulations. Advances in memory architecture have made it feasible to integrate hierarchical storage media on large-scale systems, from traditional parallel file systems through intermediate fast disk technologies (e.g., node-local and remote-shared NVMe and SSD-based Burst Buffers) up to the CPU’s main memory and the GPU’s High Bandwidth Memory. However, while adding more and faster storage media increases I/O bandwidth, it also puts pressure on the CPU, which becomes responsible for managing and moving data between these layers of storage. Simulation systems are thus vulnerable to being blocked by I/O operations. The Multilayer Buffer System (MLBS) proposed in this research demonstrates a general method for overlapping I/O with computation that helps to ameliorate the strain on the processors through asynchronous access. The main idea consists of decoupling I/O operations from computational phases, using dedicated hardware resources to perform expensive context switches. By continually prefetching up and down across all hardware layers of the memory/storage subsystems, MLBS transforms the original I/O-bound behavior of the evaluated applications and shifts it closer to a memory-bound or compute-bound regime.
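The general idea of overlapping I/O with computation can be illustrated with a small sketch. This is not the MLBS implementation, which the abstract does not describe in detail. It is a single-layer illustration in Python in which a background thread prefetches blocks into a bounded buffer while the main thread computes. The names `prefetching_reader`, `load_block`, `process`, and the `depth` parameter are hypothetical, and the sleep call merely stands in for a slow storage read.

```python
import queue
import threading
import time


def prefetching_reader(load_block, block_ids, depth=2):
    """Yield (block_id, data) in order while a background thread reads ahead.

    load_block is a hypothetical callable that reads one block from slow
    storage; depth bounds how many blocks may be buffered ahead of the consumer.
    """
    buffer = queue.Queue(maxsize=depth)
    done = object()  # marker that signals the end of the stream

    def producer():
        try:
            for block_id in block_ids:
                # put() blocks when the buffer is full, which limits memory use.
                buffer.put((block_id, load_block(block_id)))
            buffer.put(done)
        except Exception as exc:
            # Forward read errors to the consumer instead of losing them.
            buffer.put(exc)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is done:
            return
        if isinstance(item, Exception):
            raise item
        yield item


def load_block(block_id):
    """Stand-in for a slow read from a file system or burst buffer."""
    time.sleep(0.01)
    return [block_id] * 4


def process(data):
    """Stand-in for the computational phase."""
    return sum(data)


# While process() works on one block, the producer is already loading the next.
totals = [process(data) for _, data in prefetching_reader(load_block, range(5))]
print(totals)
```

The bounded queue plays the role of one buffer layer: the reader can run at most `depth` blocks ahead, so the computation rarely waits on I/O once the buffer is warm. A multilayer scheme would presumably chain several such stages, one per storage tier, but that structure is an assumption here rather than something the abstract specifies.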
Wednesday, December 11, 2019, 16:00 - 17:00
Building 2, Level 5, Room 5220
The SLATE (Software for Linear Algebra Targeting Exascale) library is being developed to provide fundamental dense linear algebra capabilities for current and upcoming distributed high-performance systems, both accelerated CPU–GPU-based and CPU-based.