Computer Engineering and Applications ›› 2024, Vol. 60 ›› Issue (21): 286-296. DOI: 10.3778/j.issn.1002-8331.2307-0016

• Engineering and Applications •

Research on Performance Measurement and Analysis Tools for Parallel Programs Based on Sampling

HU Jiarui, SHI Jingyan, GUO Chaoqi   

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Online:2024-11-01 Published:2024-10-25

Abstract: In practice, the performance of parallel programs often falls well short of both the theoretical peak and expectations. Tuning programs with performance analysis tools is an efficient way to close this gap. However, programmers and developers often struggle with such tools: choosing among them is difficult, and configuring and using them is complex. Studying sampling-based performance analysis tools for parallel programs helps address these problems. Compared with instrumentation, performance tools based on asynchronous sampling can better control both the measurement overhead and the volume of measurement data. This paper examines three representative sampling-based performance analysis tools, VTune Profiler, HPCToolkit and Nsight Systems, and analyzes their principles and functions. Their software and hardware analysis capabilities and their parallel programming analysis capabilities are explored and compared in detail using practical applications such as VASP. Based on how the tools differ in applicability and analysis effectiveness across application scenarios, a scheme that combines multiple tools for performance analysis is proposed, providing a useful reference for developers and programmers.

Key words: performance analysis tools, asynchronous sampling, hardware performance counter, parallel program, program tuning
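
As an illustration of the asynchronous sampling idea mentioned in the abstract, the following is a minimal sketch, not the mechanism used by VTune Profiler, HPCToolkit or Nsight Systems. It assumes a POSIX system and uses Python's profiling interval timer (`ITIMER_PROF`) as a stand-in for a sampling trigger such as a hardware performance counter overflow. The names `sample_profile` and `busy_work` are hypothetical and chosen only for this example.

```python
import collections
import signal
import time


def sample_profile(workload, interval_s=0.005):
    """Run workload() and count which function is executing at each timer tick.

    The program is interrupted at a fixed CPU-time interval instead of being
    instrumented at every call, so the overhead and the amount of recorded
    data depend on the sampling rate, not on the number of calls.
    """
    counts = collections.Counter()

    def on_tick(signum, frame):
        # 'frame' is the Python frame that was running when the signal fired.
        if frame is not None:
            counts[frame.f_code.co_name] += 1

    previous = signal.signal(signal.SIGPROF, on_tick)
    # Fire SIGPROF every interval_s seconds of consumed CPU time (POSIX only).
    signal.setitimer(signal.ITIMER_PROF, interval_s, interval_s)
    try:
        workload()
    finally:
        # Stop the timer first, then restore the earlier handler.
        signal.setitimer(signal.ITIMER_PROF, 0, 0)
        signal.signal(
            signal.SIGPROF,
            previous if previous is not None else signal.SIG_DFL,
        )
    return counts


def busy_work(cpu_seconds=0.3):
    """Hypothetical CPU-bound workload that burns roughly cpu_seconds of CPU."""
    start = time.process_time()
    total = 0
    while time.process_time() - start < cpu_seconds:
        for i in range(1000):
            total += i * i
    return total


if __name__ == "__main__":
    profile = sample_profile(busy_work)
    for name, hits in profile.most_common():
        print(f"{name}: {hits} samples")
```

Raising `interval_s` lowers both the perturbation of the measured program and the number of recorded samples, which is the trade-off the abstract contrasts with instrumentation. A production tool would additionally unwind native call stacks and attribute samples to source lines, which this sketch does not attempt.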
