2022

Alomairy, R., Bader, W., Ltaief, H., Mesri, Y., & Keyes, D. (2022). High-performance 3D Unstructured Mesh Deformation Using Rank Structured Matrix Computations. ACM Transactions on Parallel Computing, 9(1), 1–23. https://doi.org/10.1145/3512756

2020

Alturkestani, T., Ltaief, H., & Keyes, D. (2020). Maximizing I/O Bandwidth for Reverse Time Migration on Heterogeneous Large-Scale Systems. Lecture Notes in Computer Science, 263–278. https://doi.org/10.1007/978-3-030-57675-2_17
Cao, Q., Pei, Y., Akbudak, K., Mikhalev, A., Bosilca, G., Ltaief, H., … Dongarra, J. (2020). Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications. Proceedings of the Platform for Advanced Scientific Computing Conference. https://doi.org/10.1145/3394277.3401846
Al-Harthi, N., Alomairy, R., Akbudak, K., Chen, R., Ltaief, H., Bagci, H., & Keyes, D. (2020). Solving Acoustic Boundary Integral Equations Using High Performance Tile Low-Rank LU Factorization. High Performance Computing, 209–229. https://doi.org/10.1007/978-3-030-50743-5_11
Akbudak, K., Ltaief, H., Etienne, V., Abdelkhalak, R., Tonellot, T., & Keyes, D. (2020). Asynchronous computations for solving the acoustic wave propagation equation. The International Journal of High Performance Computing Applications, 34(4), 377–393. https://doi.org/10.1177/1094342020923027
Alomairy, R., Ltaief, H., Abduljabbar, M., & Keyes, D. (2020). Abstraction Layer For Standardizing APIs of Task-Based Engines. IEEE Transactions on Parallel and Distributed Systems, 31(11), 2482–2495. https://doi.org/10.1109/tpds.2020.2992923
Alturkestani, T., Tonellot, T., Ltaief, H., Abdelkhalak, R., Etienne, V., & Keyes, D. (2019). MLBS: Transparent Data Caching in Hierarchical Storage for Out-of-Core HPC Applications. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). https://doi.org/10.1109/hipc.2019.00046
Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G., & Keyes, D. E. (2019). Geostatistical Modeling and Prediction Using Mixed Precision Tile Cholesky Factorization. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). https://doi.org/10.1109/hipc.2019.00028
Keyes, D. E., Ltaief, H., & Turkiyyah, G. (2020). Hierarchical algorithms on hierarchical architectures. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 378(2166), 20190055. https://doi.org/10.1098/rsta.2019.0055
Cao, Q., Pei, Y., Herault, T., Akbudak, K., Mikhalev, A., Bosilca, G., … Dongarra, J. (2019). Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools. 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools). https://doi.org/10.1109/protools49597.2019.00009

2019

Doucet, N., Ltaief, H., Gratadour, D., & Keyes, D. (2019). Mixed-Precision Tomographic Reconstructor Computations on Hardware Accelerators. 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3). https://doi.org/10.1109/ia349570.2019.00011
Sukkari, D., Ltaief, H., Keyes, D., & Faverge, M. (2019). Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems. 2019 IEEE International Conference on Cluster Computing (CLUSTER). https://doi.org/10.1109/cluster.2019.8891024
AlOnazi, A., Ltaief, H., Keyes, D., Said, I., & Thibault, S. (2019). Asynchronous Task-Based Execution of the Reverse Time Migration for the Oil and Gas Industry. 2019 IEEE International Conference on Cluster Computing (CLUSTER). https://doi.org/10.1109/cluster.2019.8891054
Ltaief, H., Sukkari, D., Esposito, A., Nakatsukasa, Y., & Keyes, D. (2019). Massively Parallel Polar Decomposition on Distributed-memory Systems. ACM Transactions on Parallel Computing, 6(1), 1–15. https://doi.org/10.1145/3328723
Charara, A., Keyes, D., & Ltaief, H. (2019). Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs. ACM Transactions on Mathematical Software, 45(2), 1–28. https://doi.org/10.1145/3267101
Sukkari, D., Ltaief, H., Esposito, A., & Keyes, D. (2019). A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems. ACM Transactions on Mathematical Software, 45(2), 1–21. https://doi.org/10.1145/3309548
Abdelkhalak, R., Akbudak, K., Etienne, V., Ltaief, H., Tonellot, T., & Keyes, D. (2019). Application of High Performance Asynchronous Acoustic Wave Equation Stencil Solver into a Land Survey. SPE Middle East Oil and Gas Show and Conference. https://doi.org/10.2118/194722-ms

2018

Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G., & Keyes, D. E. (2018). Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations. 2018 IEEE International Conference on Cluster Computing (CLUSTER). https://doi.org/10.1109/cluster.2018.00089
Ltaief, H., Charara, A., Gratadour, D., Doucet, N., Hadri, B., Gendron, E., … Keyes, D. (2018). Real-Time Massively Distributed Multi-object Adaptive Optics Simulations for the European Extremely Large Telescope. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). https://doi.org/10.1109/ipdps.2018.00018
Akbudak, K., Ltaief, H., Mikhalev, A., Charara, A., Esposito, A., & Keyes, D. (2018). Exploiting Data Sparsity for Large-Scale Matrix Computations. Lecture Notes in Computer Science, 721–734. https://doi.org/10.1007/978-3-319-96983-1_51
Charara, A., Keyes, D., & Ltaief, H. (2018). Tile Low-Rank GEMM Using Batched Operations on GPUs. Lecture Notes in Computer Science, 811–825. https://doi.org/10.1007/978-3-319-96983-1_57
Doucet, N., Gratadour, D., Ltaief, H., Kriemann, R., Gendron, E., & Keyes, D. (2018). Scalable soft real-time supervisor for tomographic AO. Adaptive Optics Systems VI. https://doi.org/10.1117/12.2313273
Ltaief, H., Sukkari, D., Guyon, O., & Keyes, D. (2018). Extreme Computing for Extreme Adaptive Optics. Proceedings of the Platform for Advanced Scientific Computing Conference. https://doi.org/10.1145/3218176.3218225
Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G., & Keyes, D. E. (2018). ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems. IEEE Transactions on Parallel and Distributed Systems, 29(12), 2771–2784. https://doi.org/10.1109/tpds.2018.2850749

2017

Malas, T. M., Hager, G., Ltaief, H., & Keyes, D. E. (2018). Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations. ACM Transactions on Parallel Computing, 4(3), 1–32. https://doi.org/10.1145/3155290
Chávez, G., Turkiyyah, G., Zampini, S., Ltaief, H., & Keyes, D. (2018). Accelerated Cyclic Reduction: A distributed-memory fast solver for structured linear systems. Parallel Computing, 74, 65–83. https://doi.org/10.1016/j.parco.2017.12.001
Sukkari, D., Ltaief, H., Faverge, M., & Keyes, D. (2018). Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures. IEEE Transactions on Parallel and Distributed Systems, 29(2), 312–323. https://doi.org/10.1109/tpds.2017.2755655
Boukaram, W. H., Turkiyyah, G., Ltaief, H., & Keyes, D. E. (2018). Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression. Parallel Computing, 74, 19–33. https://doi.org/10.1016/j.parco.2017.09.001
Charara, A., Keyes, D., & Ltaief, H. (2017). A framework for dense triangular matrix kernels on various manycore architectures. Concurrency and Computation: Practice and Experience, 29(15), e4187. https://doi.org/10.1002/cpe.4187
Akbudak, K., Ltaief, H., Mikhalev, A., & Keyes, D. (2017). Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures. High Performance Computing, 22–40. https://doi.org/10.1007/978-3-319-58667-0_2
Unat, D., Dubey, A., Hoefler, T., Shalf, J., Abraham, M., Bianco, M., … Pericas, M. (2017). Trends in Data Locality Abstractions for HPC Systems. IEEE Transactions on Parallel and Distributed Systems, 28(10), 3007–3020. https://doi.org/10.1109/tpds.2017.2703149

2016

Chen, Y., Keyes, D., Law, K. J. H., & Ltaief, H. (2016). Accelerated Dimension-Independent Adaptive Metropolis. SIAM Journal on Scientific Computing, 38(5), S539–S565. https://doi.org/10.1137/15m1026432
Sukkari, D., Ltaief, H., & Keyes, D. (2016). A High Performance QDWH-SVD Solver Using Hardware Accelerators. ACM Transactions on Mathematical Software, 43(1), 1–25. https://doi.org/10.1145/2894747
Sukkari, D., Ltaief, H., & Keyes, D. (2016). High Performance Polar Decomposition on Distributed Memory Systems. Lecture Notes in Computer Science, 605–616. https://doi.org/10.1007/978-3-319-43659-3_44
Charara, A., Ltaief, H., & Keyes, D. (2016). Redesigning Triangular Dense Matrix Computations on GPUs. Lecture Notes in Computer Science, 477–489. https://doi.org/10.1007/978-3-319-43659-3_35
Malas, T. M., Hornich, J., Hager, G., Ltaief, H., Pflaum, C., & Keyes, D. E. (2016). Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). https://doi.org/10.1109/ipdps.2016.87
Ltaief, H., Gratadour, D., Charara, A., & Gendron, E. (2016). Adaptive Optics Simulation for the World’s Largest Telescope on Multicore Architectures with Multiple GPUs. Proceedings of the Platform for Advanced Scientific Computing Conference (PASC ’16). https://doi.org/10.1145/2929908.2929920
Arfaoui, M.-A., Ltaief, H., Rezki, Z., Alouini, M.-S., & Keyes, D. (2016). Efficient Sphere Detector Algorithm for Massive MIMO Using GPU Hardware Accelerator. Procedia Computer Science, 80, 2169–2180. https://doi.org/10.1016/j.procs.2016.05.377
Abdelfattah, A., Ltaief, H., Keyes, D., & Dongarra, J. (2016). Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs. Concurrency and Computation: Practice and Experience, 28(12), 3447–3465. https://doi.org/10.1002/cpe.3874
Abdelfattah, A., Keyes, D., & Ltaief, H. (2016). KBLAS. ACM Transactions on Mathematical Software, 42(3), 1–31. https://doi.org/10.1145/2818311

2015

Malas, T., Hager, G., Ltaief, H., & Keyes, D. (2015). Towards Fast Reverse Time Migration Kernels using Multi-threaded Wavefront Diamond Tiling. Second EAGE Workshop on High Performance Computing for Upstream. https://doi.org/10.3997/2214-4609.201414025
Abdelfattah, A., Ltaief, H., & Keyes, D. (2015). High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications. Euro-Par 2015: Parallel Processing, 601–612. https://doi.org/10.1007/978-3-662-48096-0_46
Al-Omairy, R., Miranda, G., Ltaief, H., Badia, R., Martorell, X., Labarta, J., & Keyes, D. (2015). Dense Matrix Computations on NUMA Architectures with Distance-Aware Work Stealing. Supercomputing Frontiers and Innovations, 2(1). https://doi.org/10.14529/jsfi150103
Malas, T., Hager, G., Ltaief, H., Stengel, H., Wellein, G., & Keyes, D. (2015). Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates. SIAM Journal on Scientific Computing, 37(4), C439–C464. https://doi.org/10.1137/140991133
Charara, A., Ltaief, H., Gratadour, D., Keyes, D., Sevin, A., Abdelfattah, A., … Vidal, F. (2014). Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System. SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1109/sc.2014.27

2014

Abdelfattah, A., Gendron, E., Gratadour, D., Keyes, D., Ltaief, H., Sevin, A., & Vidal, F. (2014). High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems. Euro-Par 2014 Parallel Processing, 704–715. https://doi.org/10.1007/978-3-319-09873-9_59
Gendron, É., Charara, A., Abdelfattah, A., Gratadour, D., Keyes, D., Ltaief, H., … Rousset, G. (2014). A novel fast and accurate pseudo-analytical simulation approach for MOAO. Adaptive Optics Systems IV. https://doi.org/10.1117/12.2055911

2013

Dongarra, J., Faverge, M., Ltaief, H., & Luszczek, P. (2013). Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting. Concurrency and Computation: Practice and Experience, 26(7), 1408–1431. https://doi.org/10.1002/cpe.3110
Ltaief, H., & Yokota, R. (2013). Data-driven execution of fast multipole methods. Concurrency and Computation: Practice and Experience, 26(11), 1935–1946. https://doi.org/10.1002/cpe.3132
Abdelfattah, A., Dongarra, J., Keyes, D., & Ltaief, H. (2013). Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators. High Performance Computing for Computational Science - VECPAR 2012, 72–79. https://doi.org/10.1007/978-3-642-38718-0_10
Ltaief, H., Luszczek, P., & Dongarra, J. (2013). High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures. ACM Transactions on Mathematical Software, 39(3), 1–22. https://doi.org/10.1145/2450153.2450154
Dongarra, J., Ltaief, H., Luszczek, P., & Weaver, V. M. (2012). Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures. 2012 Second International Conference on Cloud and Green Computing. https://doi.org/10.1109/cgc.2012.113
Abdelfattah, A., Keyes, D., & Ltaief, H. (2013). Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU. Euro-Par 2012: Parallel Processing Workshops, 207–216. https://doi.org/10.1007/978-3-642-36949-0_23

2012

Haidar, A., Ltaief, H., & Dongarra, J. (2012). Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem. SIAM Journal on Scientific Computing, 34(6), C249–C274. https://doi.org/10.1137/110823699
Bosilca, G., Ltaief, H., & Dongarra, J. (2012). Power profiling of Cholesky and QR factorizations on distributed memory systems. Computer Science - Research and Development, 29(2), 139–147. https://doi.org/10.1007/s00450-012-0224-2
Haidar, A., Ltaief, H., Luszczek, P., & Dongarra, J. (2012). A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction. 2012 IEEE 26th International Parallel and Distributed Processing Symposium. https://doi.org/10.1109/ipdps.2012.13
Ltaief, H., Luszczek, P., & Dongarra, J. (2012). Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction. Lecture Notes in Computer Science, 661–670. https://doi.org/10.1007/978-3-642-31464-3_67
Agullo, E., Augonnet, C., Dongarra, J., Faverge, M., Langou, J., Ltaief, H., & Tomov, S. (2011). LU factorization for accelerator-based systems. 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA). https://doi.org/10.1109/aiccsa.2011.6126599
Dongarra, J., Faverge, M., Ltaief, H., & Luszczek, P. (2011). High performance matrix inversion based on LU factorization for multicore architectures. Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers (MTAGS ’11). https://doi.org/10.1145/2132876.2132885

2011

Haidar, A., Ltaief, H., & Dongarra, J. (2011). Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11). https://doi.org/10.1145/2063384.2063394
Ltaief, H., Luszczek, P., & Dongarra, J. (2011). Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency. Computer Science - Research and Development, 27(4), 277–287. https://doi.org/10.1007/s00450-011-0191-z
Haidar, A., Ltaief, H., YarKhan, A., & Dongarra, J. (2011). Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. Concurrency and Computation: Practice and Experience, 24(3), 305–321. https://doi.org/10.1002/cpe.1829