, NUMA memory balancing in the Linux kernel, 2018.

S. Akiyama and T. Hirofuchi, Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis, Proceedings of the 7th International Workshop on Runtime and Operating Systems for, 2017.

J. Antony and P. P. Janes, Exploring thread and memory placement on NUMA architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, Proceedings of the international conference on High Performance Computing, HiPC'06, pp.338-352, 2006.

H. David, E. Bailey, . Barszcz, T. John, D. S. Barton et al., The NAS parallel benchmarks, The International Journal of Supercomputing Applications, vol.5, pp.63-73, 1991.

D. Beniamine, M. Diener, G. Huard, and P. O. Navaux, TABARNAC: Visualizing and Resolving Memory Access Issues on NUMA Architectures, Proceedings of the 2Nd Workshop on Visual Performance Analysis (VPA '15, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01221146

F. Broquedis, J. Clet-ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A generic framework for managing hardware affinities in HPC applications, Proceedings of the International Conference on Parallel, Distributed, and Network-Based Processing, p.10, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00429889

F. Broquedis, N. Furmento, B. Goglin, P. Wacrenier, and R. Namyst, ForestGOMP: an efficient OpenMP environment for NUMA architectures, International Journal of Parallel Programming, vol.38, pp.418-439, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00496295

Q. Chen and M. Guo, Locality-Aware Work Stealing Based on Online Profiling and Auto-Tuning for Multisocket Multicore Architectures, ACM Trans. Archit. Code Optim, vol.12, issue.2, 2015.

H. M. Eduardo, M. Cruz, L. L. Diener, P. O. Pilla, and . Navaux, Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures, ACM Trans. Archit. Code Optim, vol.13, 2016.

H. Eduardo, M. A. Molina-da-cruz, A. Zanata-alves, P. O. Carissimi, C. P. Navaux et al., Using Memory Access Traces to Map Threads and Data on Hierarchical Multi-core Platforms, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'11, 2011.

M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize et al., Traffic management: A holistic approach to memory placement on numa systems, Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00945758

M. Diener, H. M. Eduardo, . Cruz, A. Z. Marco, E. Alves et al., Optimizing memory affinity with a hybrid compiler/OS approach, Proceedings of the Computing Frontiers Conference, pp.221-229, 2017.

M. Diener, H. M. Eduardo, L. L. Cruz, F. Pilla, P. O. Dupros et al., Characterizing communication and page usage of parallel applications for thread and data mapping. Performance Evaluation, pp.18-36, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01146859

M. Diener, H. M. Eduardo, . Cruz, A. Z. Marco, P. O. Alves et al., Affinity-Based Thread and Data Mapping in Shared Memory Systems, ACM Comput. Surv, vol.49, 2016.

M. Diener, H. M. Eduardo, P. O. Cruz, A. Navaux, H. Busse et al., kMAF: Automatic kernel-level management of thread and data affinity, 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT), 2014.

F. Gaud, B. Lepers, J. Funston, M. Dashti, A. Fedorova et al., Challenges of Memory Management on Modern NUMA Systems, Commun. ACM, vol.58, pp.59-66, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01242202

A. Giménez, T. Gamblin, B. Rountree, A. Bhatele, I. Jusufi et al., Dissecting On-Node Memory Access Performance: A Semantic Approach, Proceedings of the conference on Supercomputing, 2014.

B. Goglin and N. Furmento, Enabling high-performance memory migration for multithreaded applications on linux, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'09, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00358172

R. Lachaize, B. Lepers, and V. Quéma, MemProf: A Memory Profiler for NUMA Multicore Systems, Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00945731

S. Lankes, B. Bierbaum, and T. Bemmerl, Affinity-OnNext-Touch: An Extension to the Linux Kernel for NUMA Architectures, Parallel Processing and Applied Mathematics, pp.576-585, 2010.

V. Baptiste-lepers, A. Quema, and . Fedorova, Thread and Memory Placement on NUMA Systems: Asymmetry Matters, 2015 USENIX Annual Technical Conference (USENIX ATC 15). USENIX Association, pp.277-289, 2015.

X. Liu and J. Mellor-crummey, A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures, Proceedings of the symposium on Principles and Practices of Parallel Programming, 2014.

X. Liu and B. Wu, ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15), 2015.

C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser et al., PIN: building customized program analysis tools with dynamic instrumentation, Acm sigplan notices, vol.40, pp.190-200, 2005.

J. Marathe and F. Mueller, Hardware Profile-guided Automatic Page Placement for ccNUMA Systems, Proceedings of the symposium on Principles and Practices of Parallel Programming, p.6, 2006.

J. Marathe, V. Thakkar, and F. Mueller, Feedbackdirected Page Placement for ccNUMA via Hardware-generated Memory Traces, J. Parallel Distrib. Comput, vol.70, pp.1204-1219, 2010.

C. Mccurdy and J. Vetter, Memphis: Finding and fixing NUMArelated performance problems on multi-core platforms, Proceedings of the International Symposium on Performance Analysis of Systems and Software, ISPASS'10, 2010.

B. Ivy, R. Peng, G. Gioiosa, P. Kestor, E. Cicotti et al., RTHMS: a tool for data placement on hybrid memory system, Proceedings of the International Symposium on Memory Management, ISMM'17, pp.82-91, 2017.

M. Selva, L. Morel, and K. Marquet, numap: A portable library for low-level memory profiling, Proceedings of the International Conference on Embedded Computer Systems: Architectures, MOdeling and Simulation, SAMOS'16, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01285522

F. Song, S. Moore, and J. Dongarra, Feedbackdirected Thread Scheduling with Memory Considerations, Proceedings of the International Symposium on High-Performance Parallel and Distributed Computing, HPDC'07, 2007.

L. Zhu, H. Jin, and X. Liao, A Tool to Detect Performance Problems of Multi-threaded Programs on NUMA Systems, IEEE Trustcom, 2016.