NUMA memory balancing in the Linux kernel. https://www.kernel.org/doc/Documentation/sysctl/kernel.txt, Last accessed 24th, 2018.

S. Akiyama and T. Hirofuchi, Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis, Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS '17, 2017.

J. Antony, P. P. Janes, and A. P. Rendell, Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, Proceedings of the International Conference on High Performance Computing, HiPC'06, pp.338-352, 2006.
DOI : 10.1007/11945918_35

D. H. Bailey, E. Barszcz, J. T. Barton et al., The NAS parallel benchmarks, The International Journal of Supercomputing Applications, vol.5, issue.3, pp.63-73, 1991.

D. Beniamine, M. Diener, G. Huard, and P. O. Navaux, TABARNAC, Proceedings of the 2nd Workshop on Visual Performance Analysis, VPA '15, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01221146

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

F. Broquedis, N. Furmento, B. Goglin, P. Wacrenier, and R. Namyst, ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.62, issue.5-6, pp.418-439, 2010.
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295

Q. Chen and M. Guo, Locality-Aware Work Stealing Based on Online Profiling and Auto-Tuning for Multisocket Multicore Architectures, ACM Transactions on Architecture and Code Optimization, vol.12, issue.2, 2015.
DOI : 10.1145/2486159.2486175

E. H. M. Cruz, M. Diener, L. L. Pilla, and P. O. Navaux, Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures, ACM Transactions on Architecture and Code Optimization, vol.13, issue.3, 2016.

E. H. Molina da Cruz, M. A. Zanata Alves, A. Carissimi, P. O. Navaux et al., Using Memory Access Traces to Map Threads and Data on Hierarchical Multi-core Platforms, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'11, 2011.

M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize et al., Traffic management: A holistic approach to memory placement on NUMA systems, Proceedings of the Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00945758

M. Diener, E. H. Cruz, M. A. Alves et al., Optimizing memory affinity with a hybrid compiler/OS approach, Proceedings of the Computing Frontiers Conference, CF'17, pp.221-229, 2017.

M. Diener, E. H. Cruz, L. L. Pilla, F. Dupros, and P. O. Navaux, Characterizing communication and page usage of parallel applications for thread and data mapping, Performance Evaluation, vol.88-89, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01146859

M. Diener, E. H. Cruz, M. A. Alves, P. O. Navaux, and I. Koren, Affinity-Based Thread and Data Mapping in Shared Memory Systems, ACM Computing Surveys, vol.49, issue.4, 2016.

M. Diener, E. H. Cruz, P. O. Navaux, A. Busse, and H. Heiß, kMAF, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, 2014.

F. Gaud, B. Lepers, J. Funston, M. Dashti, A. Fedorova et al., Challenges of memory management on modern NUMA systems, Communications of the ACM, vol.58, issue.12, pp.12-59, 2015.
DOI : 10.1145/2814328
URL : https://hal.archives-ouvertes.fr/hal-01242202

A. Giménez, T. Gamblin, B. Rountree, A. Bhatele, I. Jusufi et al., Dissecting On-Node Memory Access Performance: A Semantic Approach, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, p.14, 2014.
DOI : 10.1109/SC.2014.19

B. Goglin and N. Furmento, Enabling high-performance memory migration for multithreaded applications on LINUX, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI : 10.1109/IPDPS.2009.5161101
URL : https://hal.archives-ouvertes.fr/inria-00358172

R. Lachaize, B. Lepers, and V. Quéma, MemProf: A Memory Profiler for NUMA Multicore Systems, Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00945731

S. Lankes, B. Bierbaum, and T. Bemmerl, Affinity-On-Next-Touch: An Extension to the Linux Kernel for NUMA Architectures, Parallel Processing and Applied Mathematics, pp.576-585, 2010.
DOI : 10.1007/978-3-642-14390-8_60

B. Lepers, V. Quéma, and A. Fedorova, Thread and Memory Placement on NUMA Systems: Asymmetry Matters, 2015 USENIX Annual Technical Conference (USENIX ATC 15), USENIX Association, pp.277-289, 2015.

X. Liu and J. Mellor-Crummey, A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures, Proceedings of the Symposium on Principles and Practice of Parallel Programming, p.14, 2014.
DOI : 10.1145/2692916.2555271

X. Liu and B. Wu, ScaAnalyzer, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, 2015.

C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., Pin: building customized program analysis tools with dynamic instrumentation, ACM SIGPLAN Notices, pp.190-200, 2005.

J. Marathe and F. Mueller, Hardware profile-guided automatic page placement for ccNUMA systems, Proceedings of the eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '06, p.6, 2006.
DOI : 10.1145/1122971.1122987

J. Marathe, V. Thakkar, and F. Mueller, Feedback-directed Page Placement for ccNUMA via Hardware-generated Memory Traces, J. Parallel Distrib. Comput., vol.70, pp.12-1204, 2010.
DOI : 10.1016/j.jpdc.2010.08.015
URL : http://moss.csc.ncsu.edu/~mueller/ftp/pub/mueller/papers/TR-2009-9.pdf

C. McCurdy and J. Vetter, Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), 2010.
DOI : 10.1109/ISPASS.2010.5452060

I. B. Peng, R. Gioiosa, G. Kestor, P. Cicotti, E. Laure et al., RTHMS: a tool for data placement on hybrid memory system, Proceedings of the International Symposium on Memory Management, ISMM'17, pp.82-91, 2017.
DOI : 10.1145/3156685.3092273

M. Selva, L. Morel, and K. Marquet, numap: A portable library for low-level memory profiling, 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), p.16, 2016.
DOI : 10.1109/SAMOS.2016.7818331
URL : https://hal.archives-ouvertes.fr/hal-01285522

F. Song, S. Moore, and J. Dongarra, Feedback-directed Thread Scheduling with Memory Considerations, Proceedings of the International Symposium on High-Performance Parallel and Distributed Computing, p.7, 2007.
DOI : 10.1145/1272366.1272380
URL : http://www.cs.utk.edu/~shirley/papers/hpdc07.pdf

L. Zhu, H. Jin, and X. Liao, A Tool to Detect Performance Problems of Multi-threaded Programs on NUMA Systems, 2016 IEEE Trustcom/BigDataSE/ISPA, 2016.
DOI : 10.1109/TrustCom.2016.0187