- Develop and demonstrate algorithms and software for data analytics and simulation in the emerging exascale era
- Raise the impact and increase the reach of big data analytics and large-scale simulation at KAUST
- Foster the convergence of analytics and simulation for the benefit of each – making each more predictive and more performant
The components of this three-fold mission are strongly inter-related. Today’s “commodity” petascale computations are typically bottlenecked by the same issue that must be addressed when migrating today’s software infrastructure to the exascale: the imbalance between arithmetic processing and data motion. More than five decades of Moore’s Law scaling have made the energy and time costs of processing small compared with those of delivering data to the processor and returning it to memory. Memory bandwidth and memory capacity per computational core are both decreasing, which makes typical computations memory-limited and threatens to leave most of the hardware’s potential unrealized when today’s software is ported to exascale systems. Although the infrastructure components that the ECRC delivers are motivated by exascale, they are first placed into service in today’s petascale applications, so that the latter deliver improved performance on computers like Shaheen-2 and its anticipated upgrade. By the time Shaheen-3 arrives in Saudi Arabia, some KAUST applications will be positioned to take maximum advantage of it. In fact, KAUST applications (and those developed with collaborators such as Saudi Aramco) will be among the leading motivators for investing in next-generation computing in the first place.
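The imbalance between arithmetic processing and data motion can be made concrete with a roofline-style estimate. The sketch below is illustrative only: the peak rate and bandwidth are hypothetical round numbers, not measurements of Shaheen-2 or any other system, and the function name `attainable_gflops` is introduced here for the example.

```python
# Roofline-style bound: a kernel cannot run faster than the processor's peak
# arithmetic rate, nor faster than memory can deliver its operands.
# All machine numbers below are hypothetical, chosen only for illustration.

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Upper bound on sustained GFLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

PEAK_GFLOPS = 1000.0    # assumed peak arithmetic rate of one node
BANDWIDTH_GBS = 100.0   # assumed sustained memory bandwidth of one node

# Vector update y = a*x + y in double precision: 2 flops per element,
# 24 bytes moved per element (read x, read y, write y).
axpy_intensity = 2.0 / 24.0

# Hypothetical well-blocked dense kernel that reuses each byte many times.
blocked_intensity = 50.0

axpy_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, axpy_intensity)
blocked_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, blocked_intensity)

print(f"axpy:    {axpy_bound:.1f} GFLOP/s ({100 * axpy_bound / PEAK_GFLOPS:.1f}% of peak)")
print(f"blocked: {blocked_bound:.1f} GFLOP/s ({100 * blocked_bound / PEAK_GFLOPS:.1f}% of peak)")
```

Under these assumed numbers, the streaming vector update is limited by memory bandwidth to under one percent of peak, while the kernel with high data reuse is limited only by the arithmetic rate. This gap is why algorithms that reduce data motion matter at both petascale and exascale.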
The convergence of the “third paradigm” of simulation and the “fourth paradigm” of big data is upon us. The ECRC aims to incorporate the fruits of machine learning into simulation models, replacing or augmenting empirical models for greater predictivity in the application, and into algorithmic tuning, for greater performance. It also intends to incorporate the fruits of high performance computing into big data applications, enabling them to expand into distributed memory and to supplement experimentally derived data. Simulation and data are already combined today in data assimilation and inverse problems. Although these are not “learning” applications, they are early manifestations of the in situ convergence we intend to enable.
The rationale for a center dedicated to algorithms and software infrastructure derives from the nature of the scientific software stack. Most software for “open science” (that is, nonproprietary code employed by academic and many national laboratory and industrial users) is modular and layered, with the structure of an hourglass. At the top of the hourglass are many applications in simulation and analytics of a scale requiring high performance computing, such as combustion modeling or finding repeated motifs in large graphs. At the bottom of the hourglass are many particular instances of hardware, such as GPUs and manycore systems from a variety of vendors over a mix of generations. In the middle, at the narrow waist of the hourglass, are software components common to many applications, such as applying a computational “stencil” to the cells of a mesh or the pixels of an image, or solving a large sparse or dense system of linear equations, or finding the eigenmodes or singular values of an operator such as a Hamiltonian or a Hessian. In addition to algorithms, software implementing parallel programming models, such as message-passing or task-based graphs, is at the narrow waist of the hourglass. The abstract interfaces of these programming models efficiently translate the requirements of a universe of diverse applications above to a universe of architectures below. The ECRC focuses its primary attention at the waist of the hourglass, providing functionality that diverse computational applications at KAUST can call upon through a common abstraction. At the base of the ECRC hourglass are newly emergent processor architectures from vendors such as Intel, NVIDIA, AMD, and IBM.
A further rationale for the ECRC is that the software infrastructure at the waist of the hourglass is itself layered, in that one component calls others. For instance, the ECRC statistics package ExaGeoStat calls various subroutines of the ECRC linear algebra package HiCMA. There are rich interdependencies between and within these packages. The center provides a critical mass of expertise so that the value generated by improving one component can be leveraged by all components above it, and such leveraging can occur without having to go outside of KAUST. Furthermore, in a center context, the requirements of a higher level in the software tool-chain can prioritize development of lower levels, reducing the dependence of KAUST deliverables on outside collaborators.
Globally, the ECRC is seated at the table of G-7 efforts to migrate today’s scientific software infrastructure to the exascale. We undertake tasks that have external dependencies and we push our accomplishments proactively into worthy external applications. However, the ECRC is first and foremost an opportunity to coordinate KAUST’s investment so that it is leveraged within KAUST. Investments coordinated within a center structure help one another pay off. For instance, seismic inversion benefits from the availability of a Helmholtz solver based on hierarchical low-rank compression. Conversely, an achievement in Helmholtz solvers is more credible and leads to a higher-impact publication if immediately demonstrated in the context of seismic inversion. Moreover, having a specific challenging application in sight lowers the probability that a significant investment will be made in a less productive “corner” of the software infrastructure space. For this reason, infrastructure and applications are packaged together within this center proposal. The applications give the ECRC immediate relevance in the Saudi Arabian context of outreach and technology transfer, while the infrastructure brings the ECRC international visibility.