H. Ahmed, A. Skjellum, P. Bangalore, and P. Pirkelbauer, Transforming blocking MPI collectives to non-blocking and persistent operations, Proceedings of the 24th European MPI Users' Group Meeting (EuroMPI '17), pp.1-11, 2017.

M. J. Clement and M. J. Quinn, Overlapping Computations, Communications and I/O in Parallel Sorting, Journal of Parallel and Distributed Computing, vol.28, issue.2, pp.162-172, 1995.

A. Danalis, L. Pollock, and M. Swany, Automatic MPI application transformation with ASPhALT, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007.

A. Danalis, L. Pollock, M. Swany, and J. Cavazos, MPI-aware compiler optimizations for improving communication-computation overlap, Proceedings of the 23rd International Conference on Supercomputing (ICS '09), pp.316-325, 2009.

D. Das, M. Gupta, R. Ravindran, W. Shivani, P. Sivakeshava et al., Compiler-controlled extraction of computation-communication overlap in MPI applications, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-8, 2008.

A. Denis and F. Trahay, MPI Overlap: Benchmark and Analysis, 2016 45th International Conference on Parallel Processing (ICPP), pp.258-267, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01324179

J. Guo, Q. Yi, J. Meng, J. Zhang, and P. Balaji, Compiler-Assisted Overlapping of Communication and Computation in MPI Applications, 2016 IEEE International Conference on Cluster Computing (CLUSTER), pp.60-69, 2016.

M. A. Heroux, D. W. Doerfler, P. S. Crozier, J. M. Willenbring, H. C. Edwards et al., Improving Performance via Mini-Applications, 2009.

T. Hoefler, P. Gottschling, W. Rehm, and A. Lumsdaine, Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.374-382, 2006.

T. Hoefler and A. Lumsdaine, Design, Implementation, and Usage of LibNBC, 2006.

K. Kandalla, A. Buluc, H. Subramoni, K. Tomko, J. Vienne et al., Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms?, 2012 IEEE International Conference on Cluster Computing Workshops, pp.222-230, 2012.

C. Lattner and V. Adve, LLVM: A compilation framework for lifelong program analysis & transformation, International Symposium on Code Generation and Optimization (CGO 2004), pp.75-86, 2004.

D. Quinlan, ROSE: Compiler Support for Object-Oriented Frameworks, Parallel Processing Letters, vol.10, issue.02n03, pp.215-226, 2000.

S. Song and J. K. Hollingsworth, Computation-communication overlap and parameter auto-tuning for scalable parallel 3-D FFT, Journal of Computational Science, vol.14, pp.38-50, 2016.

M. Weiser, Program Slicing, Proceedings of the 5th International Conference on Software Engineering, pp.439-449, 1981.