, NUMA memory balancing in the Linux kernel, 2018.
Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis, Proceedings of the 7th International Workshop on Runtime and Operating Systems for, 2017. ,
Exploring thread and memory placement on NUMA architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, Proceedings of the international conference on High Performance Computing, HiPC'06, pp.338-352, 2006. ,
The NAS parallel benchmarks, The International Journal of Supercomputing Applications, vol.5, pp.63-73, 1991. ,
TABARNAC: Visualizing and Resolving Memory Access Issues on NUMA Architectures, Proceedings of the 2Nd Workshop on Visual Performance Analysis (VPA '15, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01221146
hwloc: A generic framework for managing hardware affinities in HPC applications, Proceedings of the International Conference on Parallel, Distributed, and Network-Based Processing, p.10, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00429889
ForestGOMP: an efficient OpenMP environment for NUMA architectures, International Journal of Parallel Programming, vol.38, pp.418-439, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00496295
Locality-Aware Work Stealing Based on Online Profiling and Auto-Tuning for Multisocket Multicore Architectures, ACM Trans. Archit. Code Optim, vol.12, issue.2, 2015. ,
Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures, ACM Trans. Archit. Code Optim, vol.13, 2016. ,
Using Memory Access Traces to Map Threads and Data on Hierarchical Multi-core Platforms, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'11, 2011. ,
Traffic management: A holistic approach to memory placement on numa systems, Proceedings of the conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'13, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00945758
Optimizing memory affinity with a hybrid compiler/OS approach, Proceedings of the Computing Frontiers Conference, pp.221-229, 2017. ,
Characterizing communication and page usage of parallel applications for thread and data mapping. Performance Evaluation, pp.18-36, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01146859
Affinity-Based Thread and Data Mapping in Shared Memory Systems, ACM Comput. Surv, vol.49, 2016. ,
kMAF: Automatic kernel-level management of thread and data affinity, 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT), 2014. ,
Challenges of Memory Management on Modern NUMA Systems, Commun. ACM, vol.58, pp.59-66, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01242202
Dissecting On-Node Memory Access Performance: A Semantic Approach, Proceedings of the conference on Supercomputing, 2014. ,
Enabling high-performance memory migration for multithreaded applications on linux, Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS'09, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00358172
MemProf: A Memory Profiler for NUMA Multicore Systems, Proceedings of the Usenix Annual Technical Conference, USENIX ATC'12, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00945731
Affinity-OnNext-Touch: An Extension to the Linux Kernel for NUMA Architectures, Parallel Processing and Applied Mathematics, pp.576-585, 2010. ,
Thread and Memory Placement on NUMA Systems: Asymmetry Matters, 2015 USENIX Annual Technical Conference (USENIX ATC 15). USENIX Association, pp.277-289, 2015. ,
A Tool to Analyze the Performance of Multithreaded Programs on NUMA Architectures, Proceedings of the symposium on Principles and Practices of Parallel Programming, 2014. ,
ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15), 2015. ,
PIN: building customized program analysis tools with dynamic instrumentation, Acm sigplan notices, vol.40, pp.190-200, 2005. ,
Hardware Profile-guided Automatic Page Placement for ccNUMA Systems, Proceedings of the symposium on Principles and Practices of Parallel Programming, p.6, 2006. ,
Feedbackdirected Page Placement for ccNUMA via Hardware-generated Memory Traces, J. Parallel Distrib. Comput, vol.70, pp.1204-1219, 2010. ,
Memphis: Finding and fixing NUMArelated performance problems on multi-core platforms, Proceedings of the International Symposium on Performance Analysis of Systems and Software, ISPASS'10, 2010. ,
RTHMS: a tool for data placement on hybrid memory system, Proceedings of the International Symposium on Memory Management, ISMM'17, pp.82-91, 2017. ,
numap: A portable library for low-level memory profiling, Proceedings of the International Conference on Embedded Computer Systems: Architectures, MOdeling and Simulation, SAMOS'16, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01285522
Feedbackdirected Thread Scheduling with Memory Considerations, Proceedings of the International Symposium on High-Performance Parallel and Distributed Computing, HPDC'07, 2007. ,
A Tool to Detect Performance Problems of Multi-threaded Programs on NUMA Systems, IEEE Trustcom, 2016. ,