Publications | HiCMA | Hierarchical Computations on Manycore Architectures

2022

Alomairy, R., Bader, W., Ltaief, H., Mesri, Y., & Keyes, D. (2022). High-performance 3D Unstructured Mesh Deformation Using Rank Structured Matrix Computations. ACM Transactions on Parallel Computing, 9(1), 1–23. https://doi.org/10.1145/3512756

Handle

10754/663637

DOI

10.1145/3512756

2020

Alturkestani, T., Ltaief, H., & Keyes, D. (2020). Maximizing I/O Bandwidth for Reverse Time Migration on Heterogeneous Large-Scale Systems. Lecture Notes in Computer Science, 263–278. https://doi.org/10.1007/978-3-030-57675-2_17

Handle

10754/665194

DOI

10.1007/978-3-030-57675-2_17

Cao, Q., Pei, Y., Akbudak, K., Mikhalev, A., Bosilca, G., Ltaief, H., … Dongarra, J. (2020). Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications. Proceedings of the Platform for Advanced Scientific Computing Conference. https://doi.org/10.1145/3394277.3401846

Handle

10754/656453

DOI

10.1145/3394277.3401846

Al-Harthi, N., Alomairy, R., Akbudak, K., Chen, R., Ltaief, H., Bagci, H., & Keyes, D. (2020). Solving Acoustic Boundary Integral Equations Using High Performance Tile Low-Rank LU Factorization. High Performance Computing, 209–229. https://doi.org/10.1007/978-3-030-50743-5_11

Handle

10754/663755

DOI

10.1007/978-3-030-50743-5_11

Akbudak, K., Ltaief, H., Etienne, V., Abdelkhalak, R., Tonellot, T., & Keyes, D. (2020). Asynchronous computations for solving the acoustic wave propagation equation. The International Journal of High Performance Computing Applications, 34(4), 377–393. https://doi.org/10.1177/1094342020923027

Handle

10754/662949

DOI

10.1177/1094342020923027

Alomairy, R., Ltaief, H., Abduljabbar, M., & Keyes, D. (2020). Abstraction Layer For Standardizing APIs of Task-Based Engines. IEEE Transactions on Parallel and Distributed Systems, 31(11), 2482–2495. https://doi.org/10.1109/tpds.2020.2992923

Handle

10754/656693

DOI

10.1109/TPDS.2020.2992923

Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G., & Keyes, D. E. (2019). Geostatistical Modeling and Prediction Using Mixed Precision Tile Cholesky Factorization. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). https://doi.org/10.1109/hipc.2019.00028

Handle

10754/666233

DOI

10.1109/HiPC.2019.00028

Alturkestani, T., Tonellot, T., Ltaief, H., Abdelkhalak, R., Etienne, V., & Keyes, D. (2019). MLBS: Transparent Data Caching in Hierarchical Storage for Out-of-Core HPC Applications. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). https://doi.org/10.1109/hipc.2019.00046

Handle

10754/660579

DOI

10.1109/HiPC.2019.00046

Keyes, D. E., Ltaief, H., & Turkiyyah, G. (2020). Hierarchical algorithms on hierarchical architectures. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 378(2166), 20190055. https://doi.org/10.1098/rsta.2019.0055

Handle

10754/661111

DOI

10.1098/rsta.2019.0055

Cao, Q., Pei, Y., Herauldt, T., Akbudak, K., Mikhalev, A., Bosilca, G., … Dongarra, J. (2019). Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools. 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools). https://doi.org/10.1109/protools49597.2019.00009

Handle

10754/661884

DOI

10.1109/ProTools49597.2019.00009

2019

Doucet, N., Ltaief, H., Gratadour, D., & Keyes, D. (2019). Mixed-Precision Tomographic Reconstructor Computations on Hardware Accelerators. 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3). https://doi.org/10.1109/ia349570.2019.00011

Handle

10754/661046

DOI

10.1109/IA349570.2019.00011

Sukkari, D., Ltaief, H., Keyes, D., & Faverge, M. (2019). Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems. 2019 IEEE International Conference on Cluster Computing (CLUSTER). https://doi.org/10.1109/cluster.2019.8891024

Handle

10754/660619

DOI

10.1109/CLUSTER.2019.8891024

AlOnazi, A., Ltaief, H., Keyes, D., Said, I., & Thibault, S. (2019). Asynchronous Task-Based Execution of the Reverse Time Migration for the Oil and Gas Industry. 2019 IEEE International Conference on Cluster Computing (CLUSTER). https://doi.org/10.1109/cluster.2019.8891054

Handle

10754/660618

DOI

10.1109/CLUSTER.2019.8891054

Ltaief, H., Sukkari, D., Esposito, A., Nakatsukasa, Y., & Keyes, D. (2019). Massively Parallel Polar Decomposition on Distributed-memory Systems. ACM Transactions on Parallel Computing, 6(1), 1–15. https://doi.org/10.1145/3328723

Handle

10754/626359

DOI

10.1145/3328723

Charara, A., Keyes, D., & Ltaief, H. (2019). Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs. ACM Transactions on Mathematical Software, 45(2), 1–28. https://doi.org/10.1145/3267101

Handle

10754/622975

DOI

10.1145/3267101

Sukkari, D., Ltaief, H., Esposito, A., & Keyes, D. (2019). A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems. ACM Transactions on Mathematical Software, 45(2), 1–21. https://doi.org/10.1145/3309548

Handle

10754/626212

DOI

10.1145/3309548

Abdelkhalak, R., Akbudak, K., Etienne, V., Ltaief, H., Tonellot, T., & Keyes, D. (2019). Application of High Performance Asynchronous Acoustic Wave Equation Stencil Solver into a Land Survey. SPE Middle East Oil and Gas Show and Conference. https://doi.org/10.2118/194722-ms

Handle

10754/631705

DOI

10.2118/194722-ms

2018

Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G., & Keyes, D. E. (2018). Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations. 2018 IEEE International Conference on Cluster Computing (CLUSTER). https://doi.org/10.1109/cluster.2018.00089

Handle

10754/630314

DOI

10.1109/cluster.2018.00089

Ltaief, H., Charara, A., Gratadour, D., Doucet, N., Hadri, B., Gendron, E., … Keyes, D. (2018). Real-Time Massively Distributed Multi-object Adaptive Optics Simulations for the European Extremely Large Telescope. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS). https://doi.org/10.1109/ipdps.2018.00018

Handle

10754/628816

DOI

10.1109/IPDPS.2018.00018

Charara, A., Keyes, D., & Ltaief, H. (2018). Tile Low-Rank GEMM Using Batched Operations on GPUs. Lecture Notes in Computer Science, 811–825. https://doi.org/10.1007/978-3-319-96983-1_57

DOI

10.1007/978-3-319-96983-1_57

Akbudak, K., Ltaief, H., Mikhalev, A., Charara, A., Esposito, A., & Keyes, D. (2018). Exploiting Data Sparsity for Large-Scale Matrix Computations. Lecture Notes in Computer Science, 721–734. https://doi.org/10.1007/978-3-319-96983-1_51

DOI

10.1007/978-3-319-96983-1_51

Doucet, N., Gratadour, D., Ltaief, H., Kriemann, R., Gendron, E., & Keyes, D. (2018). Scalable soft real-time supervisor for tomographic AO. Adaptive Optics Systems VI. https://doi.org/10.1117/12.2313273

Handle

10754/631517

DOI

10.1117/12.2313273

Ltaief, H., Sukkari, D., Guyon, O., & Keyes, D. (2018). Extreme Computing for Extreme Adaptive Optics. Proceedings of the Platform for Advanced Scientific Computing Conference. https://doi.org/10.1145/3218176.3218225

Handle

10754/627414

DOI

10.1145/3218176.3218225

Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G., & Keyes, D. E. (2018). ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems. IEEE Transactions on Parallel and Distributed Systems, 29(12), 2771–2784. https://doi.org/10.1109/tpds.2018.2850749

Handle

10754/628384

DOI

10.1109/tpds.2018.2850749

2017

Malas, T. M., Hager, G., Ltaief, H., & Keyes, D. E. (2018). Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations. ACM Transactions on Parallel Computing, 4(3), 1–32. https://doi.org/10.1145/3155290

Handle

10754/631616

DOI

10.1145/3155290

Chávez, G., Turkiyyah, G., Zampini, S., Ltaief, H., & Keyes, D. (2018). Accelerated Cyclic Reduction: A distributed-memory fast solver for structured linear systems. Parallel Computing, 74, 65–83. https://doi.org/10.1016/j.parco.2017.12.001

Handle

10754/626403

DOI

10.1016/j.parco.2017.12.001

Sukkari, D., Ltaief, H., Faverge, M., & Keyes, D. (2018). Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures. IEEE Transactions on Parallel and Distributed Systems, 29(2), 312–323. https://doi.org/10.1109/tpds.2017.2755655

Handle

10754/625885

DOI

10.1109/tpds.2017.2755655

Boukaram, W. H., Turkiyyah, G., Ltaief, H., & Keyes, D. E. (2018). Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression. Parallel Computing, 74, 19–33. https://doi.org/10.1016/j.parco.2017.09.001

Handle

10754/625473

DOI

10.1016/j.parco.2017.09.001

Charara, A., Keyes, D., & Ltaief, H. (2017). A framework for dense triangular matrix kernels on various manycore architectures. Concurrency and Computation: Practice and Experience, 29(15), e4187. https://doi.org/10.1002/cpe.4187

Handle

10754/622077

DOI

10.1002/cpe.4187

Akbudak, K., Ltaief, H., Mikhalev, A., & Keyes, D. (2017). Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures. High Performance Computing, 22–40. https://doi.org/10.1007/978-3-319-58667-0_2

Handle

10754/625590

DOI

10.1007/978-3-319-58667-0_2

Unat, D., Dubey, A., Hoefler, T., Shalf, J., Abraham, M., Bianco, M., … Pericas, M. (2017). Trends in Data Locality Abstractions for HPC Systems. IEEE Transactions on Parallel and Distributed Systems, 28(10), 3007–3020. https://doi.org/10.1109/tpds.2017.2703149

Handle

10754/625984

DOI

10.1109/TPDS.2017.2703149

2016

Chen, Y., Keyes, D., Law, K. J. H., & Ltaief, H. (2016). Accelerated Dimension-Independent Adaptive Metropolis. SIAM Journal on Scientific Computing, 38(5), S539–S565. https://doi.org/10.1137/15m1026432

Handle

10754/621839

DOI

10.1137/15m1026432

Sukkari, D., Ltaief, H., & Keyes, D. (2016). A High Performance QDWH-SVD Solver Using Hardware Accelerators. ACM Transactions on Mathematical Software, 43(1), 1–25. https://doi.org/10.1145/2894747

Handle

10754/348632

DOI

10.1145/2894747

Charara, A., Ltaief, H., & Keyes, D. (2016). Redesigning Triangular Dense Matrix Computations on GPUs. Lecture Notes in Computer Science, 477–489. https://doi.org/10.1007/978-3-319-43659-3_35

Handle

10754/621824

DOI

10.1007/978-3-319-43659-3_35

Sukkari, D., Ltaief, H., & Keyes, D. (2016). High Performance Polar Decomposition on Distributed Memory Systems. Lecture Notes in Computer Science, 605–616. https://doi.org/10.1007/978-3-319-43659-3_44

Handle

10754/622144

DOI

10.1007/978-3-319-43659-3_44

Malas, T. M., Hornich, J., Hager, G., Ltaief, H., Pflaum, C., & Keyes, D. E. (2016). Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). https://doi.org/10.1109/ipdps.2016.87

Handle

10754/622653

DOI

10.1109/ipdps.2016.87

Ltaief, H., Gratadour, D., Charara, A., & Gendron, E. (2016). Adaptive Optics Simulation for the World’s Largest Telescope on Multicore Architectures with Multiple GPUs. Proceedings of the Platform for Advanced Scientific Computing Conference on - PASC ’16. https://doi.org/10.1145/2929908.2929920

Handle

10754/622511

DOI

10.1145/2929908.2929920

Arfaoui, M.-A., Ltaief, H., Rezki, Z., Alouini, M.-S., & Keyes, D. (2016). Efficient Sphere Detector Algorithm for Massive MIMO Using GPU Hardware Accelerator. Procedia Computer Science, 80, 2169–2180. https://doi.org/10.1016/j.procs.2016.05.377

Handle

10754/613008

DOI

10.1016/j.procs.2016.05.377

Abdelfattah, A., Ltaief, H., Keyes, D., & Dongarra, J. (2016). Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs. Concurrency and Computation: Practice and Experience, 28(12), 3447–3465. https://doi.org/10.1002/cpe.3874

Handle

10754/621728

DOI

10.1002/cpe.3874

Abdelfattah, A., Keyes, D., & Ltaief, H. (2016). KBLAS. ACM Transactions on Mathematical Software, 42(3), 1–31. https://doi.org/10.1145/2818311

Handle

10754/621727

DOI

10.1145/2818311

2015

Malas, T., Hager, G., Ltaief, H., & Keyes, D. (2015). Towards Fast Reverse Time Migration Kernels using Multi-threaded Wavefront Diamond Tiling. Second EAGE Workshop on High Performance Computing for Upstream. https://doi.org/10.3997/2214-4609.201414025

Handle

10754/578819

DOI

10.3997/2214-4609.201414025

Abdelfattah, A., Ltaief, H., & Keyes, D. (2015). High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications. Euro-Par 2015: Parallel Processing, 601–612. https://doi.org/10.1007/978-3-662-48096-0_46

Handle

10754/565820

DOI

10.1007/978-3-662-48096-0_46

Malas, T., Hager, G., Ltaief, H., Stengel, H., Wellein, G., & Keyes, D. (2015). Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates. SIAM Journal on Scientific Computing, 37(4), C439–C464. https://doi.org/10.1137/140991133

Handle

10754/577336

DOI

10.1137/140991133

Al-Omairy, R., Miranda, G., Ltaief, H., Badia, R., Martorell, X., Labarta, J., & Keyes, D. (2015). Dense Matrix Computations on NUMA Architectures with Distance-Aware Work Stealing. Supercomputing Frontiers and Innovations, 2(1). https://doi.org/10.14529/jsfi150103

DOI

10.14529/jsfi150103

Charara, A., Ltaief, H., Gratadour, D., Keyes, D., Sevin, A., Abdelfattah, A., … Vidal, F. (2014). Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System. SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. https://doi.org/10.1109/sc.2014.27

Handle

10754/575827

DOI

10.1109/sc.2014.27

2014

Abdelfattah, A., Gendron, E., Gratadour, D., Keyes, D., Ltaief, H., Sevin, A., & Vidal, F. (2014). High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems. Euro-Par 2014 Parallel Processing, 704–715. https://doi.org/10.1007/978-3-319-09873-9_59

Handle

10754/564877

DOI

10.1007/978-3-319-09873-9_59

Gendron, É., Charara, A., Abdelfattah, A., Gratadour, D., Keyes, D., Ltaief, H., … Rousset, G. (2014). A novel fast and accurate pseudo-analytical simulation approach for MOAO. Adaptive Optics Systems IV. https://doi.org/10.1117/12.2055911

Handle

10754/346823

DOI

10.1117/12.2055911

2013

Dongarra, J., Faverge, M., Ltaief, H., & Luszczek, P. (2013). Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting. Concurrency and Computation: Practice and Experience, 26(7), 1408–1431. https://doi.org/10.1002/cpe.3110

Handle

10754/575581

DOI

10.1002/cpe.3110

Ltaief, H., & Yokota, R. (2013). Data-driven execution of fast multipole methods. Concurrency and Computation: Practice and Experience, 26(11), 1935–1946. https://doi.org/10.1002/cpe.3132

Handle

10754/562978

DOI

10.1002/cpe.3132

Abdelfattah, A., Dongarra, J., Keyes, D., & Ltaief, H. (2013). Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators. High Performance Computing for Computational Science - VECPAR 2012, 72–79. https://doi.org/10.1007/978-3-642-38718-0_10

Handle

10754/564662

DOI

10.1007/978-3-642-38718-0_10

Ltaief, H., Luszczek, P., & Dongarra, J. (2013). High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures. ACM Transactions on Mathematical Software, 39(3), 1–22. https://doi.org/10.1145/2450153.2450154

Handle

10754/575572

DOI

10.1145/2450153.2450154

Dongarra, J., Ltaief, H., Luszczek, P., & Weaver, V. M. (2012). Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures. 2012 Second International Conference on Cloud and Green Computing. https://doi.org/10.1109/cgc.2012.113

Handle

10754/575808

DOI

10.1109/cgc.2012.113

Abdelfattah, A., Keyes, D., & Ltaief, H. (2013). Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU. Euro-Par 2012: Parallel Processing Workshops, 207–216. https://doi.org/10.1007/978-3-642-36949-0_23

Handle

10754/564656

DOI

10.1007/978-3-642-36949-0_23

2012

Haidar, A., Ltaief, H., & Dongarra, J. (2012). Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem. SIAM Journal on Scientific Computing, 34(6), C249–C274. https://doi.org/10.1137/110823699

Handle

10754/555650

DOI

10.1137/110823699

Bosilca, G., Ltaief, H., & Dongarra, J. (2012). Power profiling of Cholesky and QR factorizations on distributed memory systems. Computer Science - Research and Development, 29(2), 139–147. https://doi.org/10.1007/s00450-012-0224-2

Handle

10754/562284

DOI

10.1007/s00450-012-0224-2

Haidar, A., Ltaief, H., Luszczek, P., & Dongarra, J. (2012). A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction. 2012 IEEE 26th International Parallel and Distributed Processing Symposium. https://doi.org/10.1109/ipdps.2012.13

Handle

10754/575805

DOI

10.1109/ipdps.2012.13

Ltaief, H., Luszczek, P., & Dongarra, J. (2012). Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction. Lecture Notes in Computer Science, 661–670. https://doi.org/10.1007/978-3-642-31464-3_67

Handle

10754/575758

DOI

10.1007/978-3-642-31464-3_67

Agullo, E., Augonnet, C., Dongarra, J., Faverge, M., Langou, J., Ltaief, H., & Tomov, S. (2011). LU factorization for accelerator-based systems. 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA). https://doi.org/10.1109/aiccsa.2011.6126599

Handle

10754/575804

DOI

10.1109/aiccsa.2011.6126599

Dongarra, J., Faverge, M., Ltaief, H., & Luszczek, P. (2011). High performance matrix inversion based on LU factorization for multicore architectures. Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers - MTAGS ’11. https://doi.org/10.1145/2132876.2132885

Handle

10754/575750

DOI

10.1145/2132876.2132885

2011

Haidar, A., Ltaief, H., & Dongarra, J. (2011). Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC ’11. https://doi.org/10.1145/2063384.2063394

Handle

10754/575751

DOI

10.1145/2063384.2063394

Ltaief, H., Luszczek, P., & Dongarra, J. (2011). Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency. Computer Science - Research and Development, 27(4), 277–287. https://doi.org/10.1007/s00450-011-0191-z

Handle

10754/575552

DOI

10.1007/s00450-011-0191-z

Haidar, A., Ltaief, H., YarKhan, A., & Dongarra, J. (2011). Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures. Concurrency and Computation: Practice and Experience, 24(3), 305–321. https://doi.org/10.1002/cpe.1829

DOI

10.1002/cpe.1829