A combined fast/cycle accurate simulation tool for reconfigurable accelerator evaluation: application to distributed data management

Abstract: Parallel computing systems based on reconfigurable accelerators are becoming (1) increasingly heterogeneous, (2) difficult to design and (3) complex to model. Modeling such a system helps to evaluate its performance and to improve its architecture before prototyping. This paper presents a simulation tool for studying the integration of reconfigurable accelerators into scalable distributed systems and runtimes such as S-DSM (software-distributed shared memory) systems, a paradigm that eases data management among distributed nodes. The tool simulates the execution of irregular compute kernels that access distributed data. To deal with the complexity of modeling the complete system (3), we use a hybrid methodology: the simulation engine is integrated into the S-DSM, the distributed data management part is executed on the physical architecture, which yields precise and faithful latencies, and the accelerator simulation is cycle-accurate. We use general sparse matrix-matrix multiplication (SpGEMM) as a case study. We show that the tool makes it possible to analyze the behavior of a heterogeneous system (1) through rapid prototyping and simulation. Analysis of the results allowed us to determine the sizing of the architecture (2) that gives the best performance. The tool also allowed us to identify the bottleneck of our architecture and confirmed that data access latencies can be hidden. Our simulation platform emulates a heterogeneous distributed system at the cost of a slowdown of between 1.2 and 3.7 times compared to simulating the compute kernel alone.
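To illustrate why SpGEMM counts as an irregular kernel, the following is a minimal row-wise (Gustavson-style) sketch. It is not the accelerator kernel evaluated in the paper: the dict-of-dicts storage and the name `spgemm_rowwise` are assumptions made purely for illustration. The point it shows is that the rows of B that get read depend on the column indices of the nonzeros of A, so the access pattern is only known at run time.

```python
def spgemm_rowwise(a, b):
    """Compute C = A * B for sparse matrices stored as {row: {col: value}}.

    Illustrative storage format (an assumption, not the paper's layout).
    """
    c = {}
    for i, a_row in a.items():
        acc = {}  # sparse accumulator for row i of C
        for k, a_ik in a_row.items():
            # Data-dependent access: which row of B is fetched is decided
            # by the column index k of a nonzero in A.
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            c[i] = acc
    return c


a = {0: {0: 1, 2: 2}, 1: {1: 3}}
b = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
c = spgemm_rowwise(a, b)
```

In a distributed setting, each fetch of a row of B may target data held by another node, which is the kind of access latency the abstract refers to.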
