Some algorithms achieve optimal arithmetic complexity with low arithmetic intensity (flops/Byte), or possess high arithmetic intensity but lack optimal complexity, while some hierarchical algorithms, such as Fast Multipole and its H-matrix algebraic generalizations, realize a combination of optimal complexity and high intensity. Implemented with task-based dynamic runtime systems, such methods also have potential for relaxed synchrony, which is important for future energy-austere architectures, since there may be significant nonuniformity in processing rates of different cores even if task sizes can be controlled. We describe modules of KAUST's Hierarchical Computations on Manycore Architectures (HiCMA) software toolkit that illustrate these features and are intended as building blocks of more sophisticated applications, such as matrix-free higher-order methods in optimization.
HiCMA's target is hierarchical algorithms on emerging architectures, which have hierarchies of their own that generally do not align well with those of the algorithm. Some modules of this open source project have been adopted in the software libraries of major vendors. We describe what is currently available and some motivating applications.