Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (24): 40-67. DOI: 10.3778/j.issn.1002-8331.2501-0440

• Hot Topics and Reviews •

Review of Research on Causal Discovery Techniques

HU Zhiyuan (胡志远), GAO Jintao (高锦涛)+

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Online: 2025-12-15 Published: 2025-12-15

Abstract: Causal discovery techniques are widely applied in medicine, biology, economics, the social sciences, and other fields; they reveal causal relationships between variables and thereby improve the interpretability of prediction results. Causal discovery is the process of identifying causal relationships among variables from data, which covers both determining whether a causal relationship exists and understanding its direction and strength. With the development of big data technology, data-driven methods have gradually become an important means of causal discovery: by automatically extracting latent causal information from large volumes of data, they help address the limitations of traditional methods. Data-driven causal discovery can make effective use of high-dimensional data and overcome the dependence of causal analysis on data quality and independence assumptions. At present, causal discovery still faces challenges such as effectively exploiting high-dimensional data, precisely controlling for confounding variables, and handling complex interactions among variables. Traditional causal discovery methods are based on conditional independence tests, depend heavily on data quality, and perform poorly on high-dimensional data. Machine learning techniques have greatly advanced causal discovery, for example through efficient data processing and analysis, uncertainty estimation, and credibility analysis. This review surveys representative progress in causal discovery. It first introduces the methods commonly used in traditional causal discovery and examines the problems in their core processes. It then summarizes the causal discovery methods currently popular in statistical learning, explains their core ideas in detail, and compares their performance and their applicable data types and scenarios on the same benchmark dataset, with the main aim of providing a more valuable reference for researchers in data science and statistical learning. Finally, it summarizes future research directions for causal discovery.

Key words: causal discovery, causal inference, causal relationship, machine learning