[1] LIU L, JIN Y, YI L, et al. A design of autonomous error-tolerant architectures for massively parallel computing[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018, 26(10): 2143-2154.
[2] XU Y, ZHAO Z, WU W, et al. RPPA: a remote parallel program performance analysis tool[J]. Journal of Software, 2011, 6(12): 2399-2406.
[3] ZHAO J Y. An auto performance profiling for parallel programs on LLVM[D]. Shanghai: East China Normal University, 2022. (in Chinese)
[4] GRAHAM S L, KESSLER P B, MCKUSICK M K. Gprof: a call graph execution profiler[J]. ACM SIGPLAN Notices, 1982, 17(6): 120-126.
[5] SHENDE S S, MALONY A D. The TAU parallel performance system[J]. The International Journal of High Performance Computing Applications, 2006, 20(2): 287-311.
[6] MILLER B P, CALLAGHAN M D, CARGILLE J M, et al. The Paradyn parallel performance measurement tool[J]. Computer, 1995, 28(11): 37-46.
[7] ADHIANTO L, BANERJEE S, FAGAN M, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs[J]. Concurrency and Computation: Practice and Experience, 2010, 22(6): 685-701.
[8] NVIDIA Corporation. NVIDIA nsight systems[EB/OL]. [2022-12-19]. https://developer.NVIDIA.com/nsight-systems.
[9] GASTER B R, HOWES L, KAELI D R, et al. OpenCL profiling and debugging[J]. Heterogeneous Computing with OpenCL, 2013: 243-261.
[10] VETTER J S, MCCRACKEN M O. Statistical scalability analysis of communication operations in distributed applications[J]. ACM SIGPLAN Notices, 2002, 36(7): 123-132.
[11] NAGEL W E, ARNOLD A, WEBER M, et al. VAMPIR: visualization and analysis of MPI resources[J]. Supercomputer, 1996, 12(1): 69-80.
[12] Intel Corporation. Intel VTune profiler[EB/OL]. [2022-12-27]. https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html.
[13] MAROWKA A. On performance analysis of a multithreaded application parallelized by different programming models using Intel VTune[C]//Proceedings of the 11th International Conference on Parallel Computing Technologies, Kazan, 2011: 317-331.
[14] Intel Corporation. Intel VTune profiler user guide[EB/OL]. [2022-12-27]. https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/overview.html.
[15] ZHOU K, KRENTEL M W, MELLOR-CRUMMEY J. Tools for top-down performance analysis of GPU-accelerated applications[C]//Proceedings of the 34th ACM International Conference on Supercomputing, Barcelona, 2020: 1-12.
[16] MALONY A D, BIERSDORFF S, SHENDE S, et al. Parallel performance measurement of heterogeneous parallel systems with GPUs[C]//Proceedings of the 2011 International Conference on Parallel Processing, Taipei, China, 2011: 176-185.
[17] WELTON B, MILLER B P. Diogenes: looking for an honest CPU/GPU performance measurement tool[C]//Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, 2019: 1-20.
[18] PAN X D, SUN X L, ZHENG W X, et al. A survey of performance and power measurement and analysis tools for parallel programs[J]. Computer Technology and Development, 2021, 31(7): 69-74. (in Chinese)
[19] Rice University. HPCToolkit user’s manual[EB/OL]. [2022-12-19]. http://www.hpctoolkit.org/manual/HPCToolkit-users-manual.pdf.
[20] ZHANG Y F. Developing performance analysis tool using Itanium2 PMU[J]. Computer Technology and Development, 2006, 16(8): 69-71. (in Chinese)
[21] COARFA C, MELLOR-CRUMMEY J M, FROYD N, et al. Scalability analysis of SPMD codes using expectations[C]//Proceedings of the 21st Annual International Conference on Supercomputing, Seattle, 2007: 13-22.
[22] XU H Y. Design and implementation of performance analysis tool on Loongson 3A[D]. Hefei: University of Science and Technology of China, 2011. (in Chinese)
[23] SHOJANIA H. Hardware-based performance monitoring with VTune performance analyzer under Linux[EB/OL]. [2022-12-29]. https://hassan.shojania.com/pdf/VTuneProjectReport.pdf.
[24] ZHOU K, ADHIANTO L, ANDERSON J, et al. Measurement and analysis of GPU-accelerated applications with HPCToolkit[J]. Parallel Computing, 2021, 108: 102837.
[25] FROYD N, MELLOR-CRUMMEY J, FOWLER R. Low-overhead call path profiling of unmodified, optimized code[C]//Proceedings of the 19th Annual International Conference on Supercomputing, Cambridge, 2005: 81-90.
[26] NVIDIA Corporation. Nsight systems user guide[EB/OL]. [2022-12-07]. https://docs.NVIDIA.com/nsight-systems/UserGuide/index.html.
[27] KRESSE G, FURTHMÜLLER J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set[J]. Computational Materials Science, 1996, 6(1): 15-50.
[28] MANUEL P. Dijkstra[EB/OL]. [2023-01-15]. https://github.com/mapa17/Dijkstra.
[29] JIAN D, YVES R, PEIMIN Z, et al. 3D time-domain electromagnetic full waveform inversion in Debye dispersive medium accelerated by multi-GPU paralleling[J]. Computer Physics Communications, 2021, 265(1): 108002.
[30] Intel Corporation. Intel trace analyzer and collector[EB/OL]. [2023-01-04]. https://www.intel.cn/content/www/cn/zh/developer/tools/oneapi/trace-analyzer.html.
[31] NVIDIA Corporation. NVIDIA CUDA profiling tools interface[EB/OL]. [2023-02-07]. https://developer.NVIDIA.com/cupti-ctk11_6.
[32] NVIDIA Corporation. NVIDIA nsight compute[EB/OL]. [2022-12-19]. https://developer.NVIDIA.com/nsight-compute.
[33] NVIDIA Corporation. The NVIDIA tools extension library[EB/OL]. [2023-02-02]. https://docs.NVIDIA.com/nsight-visual-studio-edition/nvtx/index.html.