Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs, Proceedings of the International Conference on Supercomputing, p.5, 2017.
A hybridization methodology for high-performance linear algebra software for GPUs, GPU Computing Gems Jade Edition, pp.473-484, 2012.
Achieving high performance on supercomputers with a sequential task-based programming model, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01618526
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, p.12037, 2009.
Tile low rank Cholesky factorization for climate/weather modeling applications on manycore architectures, High Performance Computing, pp.22-40.
Performance and scalability of the block low-rank multifrontal factorization on multicore architectures, ACM Trans. Math. Softw., vol.45, pp.1-2, 2019.
URL: https://hal.archives-ouvertes.fr/hal-01505070
A fast block low-rank dense solver with applications to finite-element matrices, Journal of Computational Physics, vol.304, pp.170-188, 2016.
StarPU-MPI: Task programming over clusters of machines enhanced with accelerators, European MPI Users' Group Meeting, pp.298-299, 2012.
URL: https://hal.archives-ouvertes.fr/hal-00725477
Accelerating a massively parallel numerical simulation in electromagnetism using a cluster of GPUs, PPAM, 2013.
Approximation of boundary element matrices, Numerische Mathematik, vol.86, pp.565-589, 2000.
Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems, Lecture Notes in Computational Science and Engineering, vol.63, 2008.
Guide, Society for Industrial and Applied Mathematics, 1997.
Cilk: An efficient multithreaded runtime system, Journal of Parallel and Distributed Computing, vol.37, pp.55-69, 1996.
Flexible development of dense linear algebra algorithms on massively parallel architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp.1432-1441, 2011.
Exploiting heterogeneity to enhance scalability, Computing in Science & Engineering, vol.15, pp.36-45, 2013.
Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression, Parallel Computing, vol.74, pp.19-33, 2018.
hwloc: A generic framework for managing hardware affinities in HPC applications, 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
URL: https://hal.archives-ouvertes.fr/inria-00429889
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput., vol.35, pp.38-53, 2009.
Exploiting nested task-parallelism in the H-LU factorization, Journal of Computational Science, 2019.
SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.123-132, 2008.
Tile low-rank GEMM using batched operations on GPUs, Euro-Par 2018: Parallel Processing, 24th International Conference on Parallel and Distributed Computing, vol.11014, pp.811-825, 2018.
A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices, Computing, vol.62, issue.2, pp.89-108, 1999.
On H²-Matrices, pp.9-29, 2000.
Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, vol.53, pp.217-288, 2011.
A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters, 2018.
Distributed task-based runtime systems: current state and micro-benchmark performance, 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp.934-941, 2018.
Parallel hierarchical matrices with adaptive cross approximation on symmetric multiprocessing clusters, Journal of Information Processing, vol.22, pp.642-650, 2014.
H-LU factorization on many-core systems, Comput. Vis. Sci., vol.16, issue.3, pp.105-117, 2013.
Implementing linear algebra routines on multi-core processors with pipelining and a look ahead, International Workshop on Applied Parallel Computing, pp.147-156, 2006.
Fast direct solver for the boundary element method in electromagnetism and acoustics: H-Matrices. Parallelism and industrial applications, 2014.
High-performance matrix-matrix multiplications of very small matrices, European Conference on Parallel Processing, pp.659-671, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01409286
Intel® Threading Building Blocks, J. Comput. Sci. Coll., vol.23, pp.298-298, 2008.
Sparse supernodal solver using block low-rank compression: Design, performance and analysis, Journal of Computational Science, vol.27, pp.255-270, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01660665
Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, pp.121-130, 2009.
A distributed-memory package for dense hierarchically semi-separable matrix computations using randomization, ACM Trans. Math. Softw., vol.42, issue.4, p.35, 2016.
Integrating blocking and non-blocking MPI primitives with task-based programming models, Parallel Computing, 2018.
Argobots: A lightweight low-level threading and tasking framework, IEEE Transactions on Parallel and Distributed Systems, vol.29, issue.3, pp.512-526, 2018.
URL: https://hal.archives-ouvertes.fr/hal-01887586
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, vol.19, p.11, 2009.
On Runtime Systems for Task-based Programming on Heterogeneous Platforms. Habilitation à diriger des recherches, 2018.
URL: https://hal.archives-ouvertes.fr/tel-01959127
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, pp.232-240, 2010.
Superfast multifrontal method for large structured linear systems of equations, SIAM Journal on Matrix Analysis and Applications, vol.31, pp.1382-1411, 2010.
Fast algorithms for hierarchically semiseparable matrices, Numerical Linear Algebra with Applications, vol.17, pp.953-976, 2010.