A. Abdelfattah, A. Haidar, S. Tomov, and J. Dongarra, Novel hpc techniques to batch execution of many variable size blas computations on gpus, Proceedings of the International Conference on Supercomputing, p.5, 2017.

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., A hybridization methodology for highperformance linear algebra software for gpus, GPU Computing Gems Jade Edition, pp.473-484, 2012.

E. Agullo, O. Aumage, M. Faverge, N. Furmento, F. Pruvost et al., Achieving high performance on supercomputers with a sequential task-based programming model, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01618526

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The plasma and magma projects, In Journal of Physics: Conference Series, vol.180, p.12037, 2009.

K. Akbudak, H. Ltaief, A. Mikhalev, and D. Keyes, Tile low rank cholesky factorization for climate/weather modeling applications on manycore architectures, High Performance Computing

R. Kunkel, P. Yokota, D. Balaji, and . Keyes, , pp.22-40

P. R. Amestoy, A. Buttari, J. L'excellent, M. , and T. , Performance and scalability of the block low-rank multifrontal factorization on multicore architectures, ACM Trans. Math. Softw, vol.45, pp.1-2, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01505070

A. Aminfar, S. Ambikasaran, and E. Darve, A fast block lowrank dense solver with applications to finite-element matrices, Journal of Computational Physics, vol.304, pp.170-188, 2016.

C. Augonnet, O. Aumage, N. Furmento, R. Namyst, and S. Thibault, Starpu-mpi: Task programming over clusters of machines enhanced with accelerators, European MPI Users' Group Meeting, pp.298-299, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00725477

C. Augonnet, D. Goudin, A. Pujols, and M. Sesques, Accelerating a massively parallel numerical simulation in electromagnetism using a cluster of gpus, p.PPAM, 2013.

M. Bebendorf, Approximation of boundary element matrices, Numerische Mathematik, vol.86, pp.565-589, 2000.

M. Bebendorf, Hierarchical Matrices -A Means to Efficiently Solve Elliptic Boundary Value Problems, Lecture Notes in Computational Science and Engineering, vol.63, 2008.

L. S. Blackford, J. Choi, A. Cleary, E. D'azevedo, J. Demmel et al., Guide. Society for Industrial and Applied Mathematics, 1997.

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall et al., Cilk: An efficient multithreaded runtime system, Journal of parallel and distributed computing, vol.37, pp.55-69, 1996.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Flexible development of dense linear algebra algorithms on massively parallel architectures with dplasma, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1432-1441, 2011.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., Exploiting heterogeneity to enhance scalability, Computing in Science & Engineering, vol.15, pp.36-45, 2013.

W. H. Boukaram, G. Turkiyyah, H. Ltaief, and D. E. Keyes, Batched QR and SVD algorithms on gpus with applications in hierarchical matrix compression, Parallel Computing, vol.74, pp.19-33, 2018.

F. Broquedis, J. Clet-ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A generic framework for managing hardware affinities in hpc applications, 18th Euromicro Conference on Parallel, Distributed and Networkbased Processing, pp.180-186, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00429889

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput, vol.35, pp.38-53, 2009.

R. Carratalá-sáez, S. Christophersen, J. I. Aliaga, V. Beltran, S. Börm et al., Exploiting nested taskparallelism in the h-lu factorization, Journal of Computational Science, 2019.

E. Chan, F. G. Van-zee, P. Bientinesi, E. S. Quintana-orti, G. Quintana-orti et al., Supermatrix: a multithreaded runtime scheduling system for algorithms-by-blocks, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp.123-132, 2008.

A. Charara, D. E. Keyes, and H. Ltaief, Tile low-rank GEMM using batched operations on gpus, Euro-Par 2018: Parallel Processing -24th International Conference on Parallel and Distributed Computing, vol.11014, pp.811-825, 2018.

W. Hackbusch, A sparse matrix arithmetic based on h-matrices. part i: Introduction to h-matrices, Computing, vol.62, issue.2, pp.89-108, 1999.

W. Hackbusch, B. Khoromskij, A. Sauter, and S. , On H 2 -Matrices, pp.9-29, 2000.

N. Halko, P. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM review, vol.53, pp.217-288, 2011.

H. Harbrecht and P. Zaspel, A scalable h-matrix approach for the solution of boundary integral equations on multi-gpu clusters, 2018.

R. Hoque and P. Shamis, Distributed task-based runtime systemscurrent state and micro-benchmark performance, 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp.934-941, 2018.

A. Ida, T. Iwashita, T. Mifune, and Y. Takahashi, Parallel hierarchical matrices with adaptive cross approximation on symmetric multiprocessing clusters, Journal of information processing, vol.22, pp.642-650, 2014.

R. Kriemann, H-lu factorization on many-core systems, Comput. Vis. Sci, vol.16, issue.3, pp.105-117, 2013.

J. Kurzak and J. Dongarra, Implementing linear algebra routines on multi-core processors with pipelining and a look ahead, In International Workshop on Applied Parallel Computing, pp.147-156, 2006.

B. Lizé, Fast direct solver for the boundary element method in electromagnetism and acoustics : H-Matrices. Parallelism and industrial applications, 2014.

I. Masliah, A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin et al., High-performance matrix-matrix multiplications of very small matrices, European Conference on Parallel Processing, pp.659-671, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01409286

C. Pheatt, Intel® threading building blocks, J. Comput. Sci. Coll, vol.23, pp.298-298, 2008.

G. Pichon, E. Darve, M. Faverge, P. Ramet, R. et al., Sparse supernodal solver using block low-rank compression: Design, performance and analysis, Journal of Computational Science, vol.27, pp.255-270, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01660665

G. Quintana-ortí, F. D. Igual, E. S. Quintana-ortí, and R. A. Van-de-geijn, Solving dense linear systems on platforms with multiple hardware accelerators, ACM Sigplan Notices, vol.44, pp.121-130, 2009.

F. Rouet, X. S. Li, P. Ghysels, and A. Napov, A distributedmemory package for dense hierarchically semi-separable matrix computations using randomization, ACM Trans. Math. Softw, vol.42, issue.4, p.35, 2016.

K. Sala, X. Teruel, J. M. Perez, A. J. Peña, V. Beltran et al., Integrating blocking and non-blocking mpi primitives with task-based programming models, Parallel Computing, 2018.

S. Seo, A. Amer, P. Balaji, C. Bordage, G. Bosilca et al., Argobots: A lightweight low-level threading and tasking framework, IEEE Transactions on Parallel and Distributed Systems, vol.29, issue.3, pp.512-526, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01887586

F. Song, A. Yarkhan, and J. Dongarra, Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, vol.19, p.11, 2009.

S. Thibault, On Runtime Systems for Task-based Programming on Heterogeneous Platforms. Habilitationà diriger des recherches, 2018.
URL : https://hal.archives-ouvertes.fr/tel-01959127

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, pp.232-240, 2010.

J. Xia, S. Chandrasekaran, M. Gu, L. , and X. , Superfast multifrontal method for large structured linear systems of equations, SIAM Journal on Matrix Analysis and Applications, vol.31, pp.1382-1411, 2010.

J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Fast algorithms for hierarchically semiseparable matrices, Numerical Linear Algebra with Applications, vol.17, pp.953-976, 2010.