L. Benini, E. Flamand, D. Fuin, and D. Melpignano, P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp.983-987
DOI : 10.1109/DATE.2012.6176639

. Tilera, 64-core processor, http://www.tilera.com/, [Online

. Kalray, Multi-purpose processor array

. Cavium, Octeon multi-core processor family, http://www.cavium.com/OCTEON MIPS64.html, [Online

S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, The SPLASH-2 programs, ACM SIGARCH Computer Architecture News, vol.23, issue.2, pp.24-36, 1995.
DOI : 10.1145/225830.223990

S. Iqbal, Y. Liang, and H. Grahn, ParMiBench - An Open-Source Benchmark for Embedded Multiprocessor Systems, IEEE Computer Architecture Letters, vol.9, issue.2, pp.45-48, 2010.
DOI : 10.1109/L-CA.2010.14

C. Bienia, S. Kumar, J. P. Singh, and K. Li, The PARSEC benchmark suite, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, PACT '08, pp.72-81, 2008.
DOI : 10.1145/1454115.1454128

T. R. Halfhill, Embc's multibench arrives, 2008.

A. Weiss, The standardization of embedded benchmarking: pitfalls and opportunities, in: Computer Design, ICCD '99) International Conference on, pp.492-508, 1999.

C. Lattner and V. Adve, Llvm: a compilation framework for lifelong program analysis transformation, in: Code Generation and Optimization, CGO 2004. International Symposium on, pp.75-86, 2004.

D. Juhsz, llvm backend for tilera processor, https://github.com/llvm-tilera, [Online

, NVIDIA Corporation and the Portland Group, The openacc application programming interface, pp.1-1, 2011.

M. Association, Multicore communications api specification v1, p.63, 2008.

J. Stone, D. Gohara, and G. Shi, OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, Computing in Science & Engineering, vol.12, issue.3, 2010.
DOI : 10.1109/MCSE.2010.69

D. R. Engler, M. F. Kaashoek, J. O-'toole, and J. , Exokernel, ACM SIGOPS Operating Systems Review, vol.29, issue.5, pp.251-266, 1995.
DOI : 10.1145/224057.224076

S. Boyd-wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek et al., Corey: An operating system for many cores, Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, pp.43-57, 2008.

B. Saha, A. Adl-tabatabai, A. Ghuloum, M. Rajagopalan, R. L. Hudson et al., Enabling scalability and performance in a large scale CMP environment, ACM SIGOPS Operating Systems Review, vol.41, issue.3, pp.73-86, 2007.
DOI : 10.1145/1272998.1273006

, Enea bare metal performance tools for netlogic xlp, ENEA, 2011.

A. Schmidt, Profiling bare-metal cores in amp systems, System, Software, SoC and Silicon Debug Conference (S4D), 2012, pp.1-4

F. Bellard, Qemu, a fast and portable dynamic translator, 2005.

F. Thabet, Y. Lhuillier, C. Andriamisaina, J. Philippe, and R. David, An efficient and flexible hardware support for accelerating synchronization operations on the sthorm many-core architecture, to appear in Design, Europe Conference Exhibition (DATE), p.2013, 2013.

M. Ojail, R. David, K. Chehida, Y. Lhuillier, and L. Benini, Synchronous reactive fine grain tasks management for homogeneous many-core architectures, 2011.

M. Ojail, R. David, Y. Lhuillier, and A. Guerre, Artm: A lightweight fork-join framework for many-core embedded systems, to appear in Design, Europe Conference Exhibition (DATE), p.2013, 2013.

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, p.511, 2001.
DOI : 10.1109/CVPR.2001.990517

J. Levon and P. Elie, Oprofile, A system-wide profiler for Linux systems Homepage: http://oprofile.sourceforge.net [Online

A. Agarwal and C. Celio, Cache coherence strategies in a many-core processor, 2009.