NUMA memory balancing in the Linux kernel, https://www.kernel.org/doc/Documentation/sysctl/kernel.txt. Last accessed 24th, 2018.
Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis, Proceedings of the 7th International Workshop on Runtime and Operating Systems for Supercomputers ROSS 2017, ROSS '17, 2017.
DOI : 10.1109/ISPASS.2009.4919635
Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, Proceedings of the international conference on High Performance Computing, HiPC'06, pp.338-352, 2006.
DOI : 10.1007/11945918_35
The NAS parallel benchmarks, The International Journal of Supercomputing Applications, vol.5, issue.3, pp.63-73, 1991.
TABARNAC, Proceedings of the 2nd Workshop on Visual Performance Analysis, VPA '15, 2015.
DOI : 10.1109/VPA.2014.12
URL : https://hal.archives-ouvertes.fr/hal-01221146
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.38, issue.5-6, pp.418-439, 2010.
DOI : 10.1007/s10766-010-0136-3
URL : https://hal.archives-ouvertes.fr/inria-00496295
Locality-Aware Work Stealing Based on Online Profiling and Auto-Tuning for Multisocket Multicore Architectures, ACM Transactions on Architecture and Code Optimization, vol.12, issue.2, 2015.
DOI : 10.1145/2486159.2486175
Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures, ACM Trans. Archit. Code Optim., vol.13, p.3, 2016.
Using Memory Access Traces to Map Threads and Data on Hierarchical Multi-core Platforms, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'11, 2011.
Traffic management: A holistic approach to memory placement on NUMA systems, Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00945758
Optimizing memory affinity with a hybrid compiler/OS approach, Proceedings of the Computing Frontiers Conference, CF'17, pp.221-229, 2017.
DOI : 10.1145/223982.223990
Characterizing communication and page usage of parallel applications for thread and data mapping, Performance Evaluation, vol.88-89, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01146859
Affinity-Based Thread and Data Mapping in Shared Memory Systems, ACM Computing Surveys, vol.49, issue.4, 2016.
DOI : 10.1109/HOTI.2010.24
kMAF, Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT '14, 2014.
DOI : 10.1016/j.jpdc.2008.05.006
Challenges of memory management on modern NUMA systems, Communications of the ACM, vol.58, issue.12, p.59, 2015.
DOI : 10.1145/2814328
URL : https://hal.archives-ouvertes.fr/hal-01242202
Dissecting On-Node Memory Access Performance: A Semantic Approach, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, p.14, 2014.
DOI : 10.1109/SC.2014.19
Enabling high-performance memory migration for multithreaded applications on LINUX, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009.
DOI : 10.1109/IPDPS.2009.5161101
URL : https://hal.archives-ouvertes.fr/inria-00358172
MemProf: A Memory Profiler for NUMA Multicore Systems, Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00945731
Affinity-On-Next-Touch: An Extension to the Linux Kernel for NUMA Architectures, Parallel Processing and Applied Mathematics, pp.576-585, 2010.
DOI : 10.1007/978-3-642-14390-8_60
Thread and Memory Placement on NUMA Systems: Asymmetry Matters, 2015 USENIX Annual Technical Conference (USENIX ATC 15). USENIX Association, pp.277-289, 2015.
A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures, Proceedings of the Symposium on Principles and Practice of Parallel Programming, p.14, 2014.
DOI : 10.1145/2692916.2555271
ScaAnalyzer, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, 2015.
DOI : 10.1145/1952682.1952688
PIN: building customized program analysis tools with dynamic instrumentation, ACM SIGPLAN Notices, pp.190-200, 2005.
Hardware profile-guided automatic page placement for ccNUMA systems, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '06, p.6, 2006.
DOI : 10.1145/1122971.1122987
Feedback-directed Page Placement for ccNUMA via Hardware-generated Memory Traces, J. Parallel Distrib. Comput., vol.70, issue.12, p.1204, 2010.
DOI : 10.1016/j.jpdc.2010.08.015
URL : http://moss.csc.ncsu.edu/~mueller/ftp/pub/mueller/papers/TR-2009-9.pdf
Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), 2010.
DOI : 10.1109/ISPASS.2010.5452060
RTHMS: a tool for data placement on hybrid memory system, Proceedings of the International Symposium on Memory Management, ISMM'17, pp.82-91, 2017.
DOI : 10.1145/3156685.3092273
numap: A portable library for low-level memory profiling, 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), p.16, 2016.
DOI : 10.1109/SAMOS.2016.7818331
URL : https://hal.archives-ouvertes.fr/hal-01285522
Feedback-directed Thread Scheduling with Memory Considerations, Proceedings of the International Symposium on High-Performance Parallel and Distributed Computing, p.7, 2007.
DOI : 10.1145/1272366.1272380
URL : http://www.cs.utk.edu/~shirley/papers/hpdc07.pdf
A Tool to Detect Performance Problems of Multi-threaded Programs on NUMA Systems, 2016 IEEE Trustcom/BigDataSE/ISPA, 2016.
DOI : 10.1109/TrustCom.2016.0187