Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (36): 181-184.

• Database and Information Processing •

Learning TAN from incomplete data

WANG Jian-lin1, WANG Zhi-hai2, WANG Xue-ling1

  

  1. Department of Computer Science, Binzhou University, Binzhou, Shandong 256600, China
  2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-12-21 Published:2007-12-21
  • Contact: WANG Jian-lin

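TAN Learning Algorithm Based on Incomplete Data

王建林1, 王志海2, 王学玲1

  1. Department of Computer Science and Technology, Binzhou University, Binzhou, Shandong 256600
  2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
  • Corresponding author: 王建林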

Abstract: TAN (tree-augmented naive Bayes) offers a good trade-off between model complexity and learnability in practice, and has been widely used in data mining, machine learning, and pattern recognition. Since few real-world datasets are complete, this paper investigates how to learn TAN efficiently from incomplete data. First, an efficient method is presented that estimates conditional mutual information directly from incomplete data. The basic TAN learning algorithm is then extended to incomplete data using this estimation method. Finally, experiments are carried out to evaluate the extended TAN and compare it with basic TAN. The experimental results show that the accuracy of the extended TAN is much higher than that of basic TAN on most of the incomplete datasets. Although the extended TAN takes more time than basic TAN, the cost remains acceptable. The conditional mutual information estimation method can also be combined easily with other techniques to improve TAN further.

Key words: TAN, learning, incomplete data, conditional mutual information
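
The abstract does not spell out how conditional mutual information is estimated from incomplete data. As a rough illustration only, the sketch below uses an available-case estimate: for each attribute pair it counts only the records in which both attributes and the class are observed. This is an assumption made for the example, not necessarily the estimator proposed in the paper; the `MISSING` marker and the function name are hypothetical.

```python
from collections import Counter
from math import log

MISSING = None  # assumed marker for a missing value (hypothetical convention)


def conditional_mutual_information(records, i, j, c):
    """Estimate I(X_i; X_j | C) in nats using an available-case count.

    Only records where columns i, j and c are all observed contribute.
    This is an illustrative stand-in, not the paper's own estimator.
    """
    rows = [
        (r[i], r[j], r[c])
        for r in records
        if r[i] is not MISSING and r[j] is not MISSING and r[c] is not MISSING
    ]
    n = len(rows)
    if n == 0:
        return 0.0  # no usable evidence for this pair

    n_xyc = Counter(rows)
    n_xc = Counter((x, cls) for x, _, cls in rows)
    n_yc = Counter((y, cls) for _, y, cls in rows)
    n_c = Counter(cls for _, _, cls in rows)

    cmi = 0.0
    for (x, y, cls), count in n_xyc.items():
        # p(x,y,c) * log[ p(x,y|c) / (p(x|c) * p(y|c)) ], written with raw counts
        ratio = count * n_c[cls] / (n_xc[(x, cls)] * n_yc[(y, cls)])
        cmi += (count / n) * log(ratio)
    return cmi


# Two attributes that always agree within the class, plus two partial records.
dependent = [
    (0, 0, "a"), (1, 1, "a"), (0, 0, "a"), (1, 1, "a"),
    (MISSING, 1, "a"), (0, MISSING, "a"),
]
# Two attributes that are independent given the class.
independent = [(0, 0, "a"), (0, 1, "a"), (1, 0, "a"), (1, 1, "a")]

cmi_dependent = conditional_mutual_information(dependent, 0, 1, 2)
cmi_independent = conditional_mutual_information(independent, 0, 1, 2)
```

Because each pair is scored on its own subset of records, no record has to be discarded merely because some unrelated attribute is missing. A full treatment would also need to consider how the missingness mechanism biases these counts.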

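Abstract: The TAN algorithm is an effective algorithm for complex data that shows strong learning ability in practice, and it has been widely applied in data mining, machine learning, and pattern recognition. Because most real-world data are incomplete, this paper studies how to make TAN learn effectively from incomplete data. First, an effective method is used to estimate conditional mutual information directly from incomplete data. This estimation method is then applied to extend the basic TAN algorithm so that it can handle incomplete data. Finally, experiments compare the extended TAN algorithm with the basic TAN algorithm. The experimental results show that on most incomplete datasets the extended TAN algorithm is clearly more accurate than the basic TAN algorithm. Although the time complexity of the extended TAN algorithm is higher than that of the basic TAN algorithm, it remains within an acceptable range. The conditional mutual information estimation method can easily be combined with other techniques to further improve the performance of the TAN algorithm.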

Key words: TAN, learning, incomplete data, conditional mutual information
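
For orientation, basic TAN structure learning is commonly described as building a maximum weighted spanning tree over the attributes, with pairwise conditional mutual information as edge weights, and then directing the edges away from a chosen root (the class variable is additionally a parent of every attribute). The sketch below shows only that tree-building step using Prim's algorithm. The function name, the dense weight-matrix input, and the choice of attribute 0 as root are assumptions for illustration; the weights could come from any estimator, including one that works on incomplete data.

```python
def tan_parents(weights, root=0):
    """Return the attribute parent of each attribute in a TAN-style tree.

    weights: symmetric n x n matrix of pairwise conditional mutual
             information values (hypothetical input format).
    Returns a list where parent[i] is the index of attribute i's attribute
    parent, or None for the root. The class parent is implicit.
    """
    n = len(weights)
    parent = [None] * n
    in_tree = [False] * n
    best = [float("-inf")] * n  # heaviest known edge linking each node to the tree
    if n:
        best[root] = 0.0  # ensures the root is selected first

    for _ in range(n):
        # Prim's step: take the outside node with the heaviest connecting edge.
        u = max((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and weights[u][v] > best[v]:
                best[v] = weights[u][v]
                parent[v] = u  # edge is directed away from the root: u -> v
    return parent


# Three attributes: 0-1 strongly dependent, 1-2 moderately, 0-2 weakly.
example_weights = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]
example_parents = tan_parents(example_weights)
```

In this example attribute 2 attaches to attribute 1 rather than to the root, because the 1-2 edge carries more weight than the 0-2 edge.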