Computer Engineering and Applications ›› 2013, Vol. 49 ›› Issue (9): 127-133.


DTU-PU: Decision Tree for Uncertain data with PU-learning

ZHANG Xing1, ZHANG Yang1,2, LIU Mingjian1, WANG Yong3   

  1. College of Information Engineering, Northwest A&F University, Xi’an 712100, China
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
  3. School of Computer, Northwestern Polytechnical University, Xi’an 710072, China
  • Online:2013-05-01 Published:2016-03-28

DTU-PU:针对不确定数据PU学习的决策树

张  星1,张  阳1,2,刘明建1,王  勇3   

  1. 西北农林科技大学 信息工程学院,西安 712100
  2. 南京大学 计算机软件新技术国家重点实验室,南京 210093
  3. 西北工业大学 计算机学院,西安 710072

Abstract: PU-learning scenarios over uncertain data are common in many real-world applications, such as sensor networks, market analysis and medical diagnosis. Building on the information gain computation of POSC45 and adopting the uncertain data intervals and probability distribution functions introduced in UDT, this paper proposes DTU-PU (Decision Tree for Uncertain data with PU-learning), a decision tree algorithm that can handle PU learning on data with uncertain numerical attributes. Experimental results on UCI datasets show that the proposed algorithm achieves good classification accuracy and is robust against data uncertainty.

Key words: Positive and Unlabeled (PU) learning, uncertainty, decision tree

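摘要 (Abstract): PU learning on uncertain data is common in many real-world applications, in fields such as sensor networks, market analysis and medical diagnosis, so a decision tree algorithm for PU learning on uncertain data is proposed. Building on the information gain computation of POSC45, and introducing the concepts of uncertain data interval and probability distribution function that UDT uses to handle uncertain data with continuous attributes, the paper presents DTU-PU (Decision Tree for Uncertain data with PU-learning), a decision tree algorithm that can handle PU learning on uncertain data with continuous attributes. Experiments on UCI datasets show that DTU-PU has good classification accuracy and robustness.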

关键词 (Key words): learning from only positive and unlabeled examples (PU learning), uncertainty, decision tree
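
The abstract names two ingredients: an information gain computed from positive and unlabeled examples (as in POSC45) and uncertain numerical values described by an interval with a probability distribution (as in UDT). The abstract gives no formulas, so the following Python sketch is only one plausible way the two could be combined at a single split, not the authors' implementation. Everything specific in it is an assumption: the uniform distribution over each interval, the known positive class prior `prior`, the POSC4.5-style estimate of the positive fraction, and all function names.

```python
import math


def mass_left(lo, hi, z):
    """Probability that a value uniform on [lo, hi] is <= z (uniform pdf is an assumption)."""
    if hi <= lo:  # degenerate interval: treat as a certain value
        return 1.0 if lo <= z else 0.0
    return min(1.0, max(0.0, (z - lo) / (hi - lo)))


def binary_entropy(p):
    """Entropy in bits of a two-class distribution with positive fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def pos_fraction(pos_w, unl_w, pos_total, unl_total, prior):
    """Assumed POSC4.5-style estimate of the positive fraction in a node.

    pos_w / unl_w are the (fractional) positive / unlabeled weights in the node,
    pos_total / unl_total the weights in the whole training set.
    """
    if unl_w <= 0.0:
        return 0.0
    return min(1.0, (pos_w / pos_total) * prior * (unl_total / unl_w))


def pu_gain(pos, unl, z, prior):
    """Information gain of splitting the root at threshold z.

    pos, unl: lists of (lo, hi) intervals for one uncertain numerical attribute.
    Each example is sent left with weight P(value <= z) and right with the rest.
    """
    pos_total, unl_total = float(len(pos)), float(len(unl))
    pos_left = sum(mass_left(lo, hi, z) for lo, hi in pos)
    unl_left = sum(mass_left(lo, hi, z) for lo, hi in unl)

    gain = binary_entropy(
        pos_fraction(pos_total, unl_total, pos_total, unl_total, prior)
    )
    children = (
        (pos_left, unl_left),
        (pos_total - pos_left, unl_total - unl_left),
    )
    for pos_w, unl_w in children:
        p = pos_fraction(pos_w, unl_w, pos_total, unl_total, prior)
        # Children are weighted by their share of the unlabeled examples.
        gain -= (unl_w / unl_total) * binary_entropy(p)
    return gain
```

For instance, with positive intervals `[(0.0, 1.0), (0.5, 1.5)]`, unlabeled intervals `[(0.0, 1.0), (2.0, 3.0), (2.5, 3.5), (0.5, 1.5)]`, `prior=0.5` and threshold `z=1.75`, every interval falls wholly on one side of the split, both children are estimated to be pure, and `pu_gain` returns 1.0. A threshold that cuts through an interval would instead distribute that example fractionally across both children.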