Computer Engineering and Applications, 2024, Vol. 60, Issue (8): 121-130. DOI: 10.3778/j.issn.1002-8331.2212-0094

• Pattern Recognition and Artificial Intelligence •

Approximate Markov Blanket Feature Selection Method Based on Lasso Fusion

LIU Ming, DU Jianqiang, LI Zhiqin, LUO Jigen, NIE Bin, ZHANG Mengting   

  1. School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China
  2. Informatization Office, Jiangxi Normal University, Nanchang 330022, China
  • Online: 2024-04-15  Published: 2024-04-15

Abstract: In feature selection, approximate Markov blankets are often used to identify redundant features, but the features they flag as redundant are not exact duplicates of the features that are retained. Deleting redundant features directly according to approximate Markov blankets may therefore discard useful information and reduce model accuracy. To address this problem, an approximate Markov blanket feature selection method fused with Lasso is proposed for the high-dimensional, small-sample data of traditional Chinese medicine metabonomics. The method consists of two stages. In the first stage, the maximal information coefficient (MIC) is used to measure the relevance of each feature to the target, and irrelevant features are filtered out. In the second stage, approximate Markov blankets are used to construct groups of similar features, Lasso is used to evaluate the influence of each feature within a group, and redundant features are removed iteratively. Comparative experiments show that the algorithm can, to some extent, reduce the loss of useful information, remove irrelevant and redundant features, and improve the accuracy and stability of the model.

Key words: approximate Markov blanket, Lasso, feature selection, high-dimensional small-sample data, traditional Chinese medicine (TCM) information