KAUST student Saverio Pasqualoni receives ISC Hans Meuer Best Paper Award for leading research on an open-source framework
Improving how supercomputers communicate can significantly reduce the time required to train artificial intelligence models. Research led by KAUST M.S./Ph.D. student Saverio Pasqualoni shows that optimizing communication across large-scale systems can cut training time by up to 44%.
The research introduces PICO (Performance Insights for Collective Operations), an open-source framework that analyzes and improves communication across large-scale computing systems. The framework’s fine-grained profiling, rich metadata collection and automated orchestration expose the complex internal algorithmic phases of these operations, which were previously opaque to engineers.
Selected as the winner from 122 submissions, the research earned the ISC High Performance 2026 (ISC26) Hans Meuer Best Paper Award. The work is a collaborative effort led by the computer science graduate student with contributions from his KAUST supervisor, Professor Marco Canini, and collaborators at ETH Zürich and Sapienza University of Rome.
Pasqualoni, a member of Canini’s Software-Defined Advanced Networked and Distributed Systems (SANDS) research group, will receive the award at ISC26. Held in Germany from June 22 to 26, the conference is a leading global event for high-performance computing (HPC), artificial intelligence, data analytics and quantum computing.
“Receiving the Hans Meuer Best Paper Award is an incredible honor for my SANDS colleagues and me,” he said. “PICO grew through discussions, feedback and support from colleagues across different institutions, and I am very grateful for that.”
Making supercomputers communicate more efficiently
Supercomputers are not single machines but networks of smaller computers working together, making communication a key factor in overall performance. While some exchanges occur between individual nodes, many operations require large groups of processors to communicate simultaneously.
These collective operations are fundamental to high-performance computing, yet difficult to optimize. Default software configurations are designed to work reasonably well across many systems rather than being finely tuned for specific machines. As a result, communication inefficiencies can significantly limit performance.
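To see why a single default is rarely optimal, consider the classic alpha-beta cost model for two well-known allreduce algorithms (following the widely cited analysis of Thakur et al.; this is a textbook illustration, not PICO's own model, and the latency and bandwidth numbers below are hypothetical):

```python
import math

# Illustrative alpha-beta cost model for two classic allreduce algorithms.
# alpha = per-message latency (s), beta = per-byte transfer time (s/byte),
# p = number of processes, n = message size in bytes.

def ring_allreduce_cost(p, n, alpha, beta):
    # Ring allreduce: 2*(p-1) steps, each moving n/p bytes.
    return 2 * (p - 1) * (alpha + (n / p) * beta)

def recursive_doubling_cost(p, n, alpha, beta):
    # Recursive doubling: log2(p) steps, each moving the full n bytes.
    return math.log2(p) * (alpha + n * beta)

# Hypothetical system: 64 processes, 5 us latency, ~1 GB/s bandwidth.
p, alpha, beta = 64, 5e-6, 1e-9
small, large = 1_024, 64 * 1024 * 1024  # 1 KiB vs 64 MiB
```

With these numbers, recursive doubling wins for the small message while the ring wins for the large one, so any library that picks one algorithm by default leaves performance on the table for part of the workload.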
“We created a way to measure and understand these differences in a systematic and reproducible way, going beyond the classic end-to-end metrics used by popular tools,” he said. “This allowed us to identify better tuning choices that improved performance on top-tier systems and showed that these improvements can measurably reduce AI training time.”
The work introduces a more detailed way to analyze how communication behaves within these systems, enabling direct comparison of collective algorithms across different supercomputers while maintaining reproducibility across software environments.
PICO’s development evolved across institutions. Early work at Sapienza University of Rome focused on building a practical framework for comparing algorithms, while further refinement at KAUST strengthened its scalability and adaptability.
A key technical challenge was ensuring that PICO remained portable, diagnostic and reproducible. The team aimed to build a tool that works across diverse systems and communication stacks without becoming a collection of machine-specific solutions.
Case studies demonstrate how PICO can uncover suboptimal default algorithm selections and guide library tuning. The framework highlights subtle performance trade-offs between closely related algorithms, quantifies the impact of backend parameters and identifies hidden bottlenecks through its backend-neutral implementations and instrumentation capabilities.
The future of collective communication
Pasqualoni describes PICO as a tool to open the "black box" of supercomputing. While many existing benchmarks identify that a collective operation is slow, they often fail to explain why. PICO functions like an X-ray, breaking down communication into its basic phases to pinpoint exactly where slowdowns occur. This diagnostic capability is essential for modern supercomputers, which are increasingly diverse in both hardware and software.
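The idea of attributing cost to internal phases can be sketched with a toy simulation. A ring allreduce, for example, consists of two distinct stages: a reduce-scatter that circulates partial sums, followed by an allgather that distributes the fully reduced chunks. The sketch below simulates both stages over `p` "ranks" and counts the steps spent in each; it is a minimal pedagogical model of phase-level accounting, not PICO's implementation:

```python
def ring_allreduce(data):
    """Simulate a ring allreduce. data: list of p lists, each of length p
    (one chunk per slot). Returns the final per-rank buffers and a
    per-phase step count, mimicking phase-level instrumentation."""
    p = len(data)
    chunks = [list(row) for row in data]
    steps = {"reduce_scatter": 0, "allgather": 0}

    # Phase 1: reduce-scatter. In step s, rank r forwards its partial
    # sum for chunk (r - s) mod p to rank r + 1. After p-1 steps, rank r
    # holds the fully reduced chunk (r + 1) mod p.
    for s in range(p - 1):
        sent = [chunks[r][(r - s) % p] for r in range(p)]  # snapshot
        for r in range(p):
            chunks[r][(r - 1 - s) % p] += sent[(r - 1) % p]
        steps["reduce_scatter"] += 1

    # Phase 2: allgather. p-1 more steps circulate the reduced chunks
    # until every rank holds the complete result.
    for s in range(p - 1):
        sent = [chunks[r][(r + 1 - s) % p] for r in range(p)]
        for r in range(p):
            chunks[r][(r - s) % p] = sent[(r - 1) % p]
        steps["allgather"] += 1

    return chunks, steps
```

Replacing the step counters with wall-clock timers is the essence of phase-level profiling: instead of one end-to-end number, each stage of the algorithm gets its own measurement, so a slowdown can be attributed to a specific phase.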
By exposing these hidden inefficiencies, the framework helps researchers extract more value from existing infrastructure without the need for new hardware. Pasqualoni notes that better tuning leads to more efficient use of large-scale systems, which can lower energy consumption and operational costs. These improvements extend beyond AI, benefiting traditional high-performance computing applications like climate modeling and genomics.
“If researchers and practitioners adopt it to study and tune collective communication in real systems, it can have a lasting impact,” he said. “By enabling better software insight and tuning, it can help extract more value from existing systems.
“I envision PICO contributing to KAUST’s strategy of aligning research with real-world impact and the Kingdom’s Vision 2030 priorities. I also hope this work encourages researchers to take a closer look at collective communication, where significant untapped performance potential remains.”