Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (21): 99-101. DOI: 10.3778/j.issn.1002-8331.2008.21.027

• Machine Learning •

Decision forest learning algorithm based on relational data analysis

WANG Li-min, LI Xiong-fei

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  • Received: 2008-04-30 Revised: 2008-06-13 Online: 2008-07-21 Published: 2008-07-21
  • Contact: WANG Li-min

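Abstract (translated from the Chinese): Multiple classifier ensembles in pattern recognition have attracted growing attention from researchers and have become an active research topic. This paper proposes a multiple submodel integration method based on decision forest construction. By assigning a decision rule to each sample, a decision forest rather than a single decision tree is constructed to automatically determine relatively independent sample subsets, and on this basis the models are integrated using the conditional independence assumption. The whole learning process requires no human intervention and adaptively determines the number of decision trees and the structure of each subtree, exploiting the strengths of the individual classifiers on different samples and in different regions. Experimental results and case analysis on UCI machine learning data sets verify the effectiveness of the method.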

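Key words (translated from the Chinese): pattern recognition, multiple submodel integration, decision forest, conditional independence assumption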

Abstract: Multiple classifier integration in pattern recognition has received increasing attention and has become an active research topic. This paper proposes a multiple submodel integration algorithm based on decision forest construction. By assigning a distinct classification rule to each sample, a decision forest rather than a single decision tree is constructed to automatically determine relatively independent attribute subsets, and the submodels are then integrated by applying the conditional independence assumption. The whole learning procedure requires no human intervention. The number of decision trees and the structure of each subtree are determined automatically, which lets the different classifiers exploit their respective strengths on different samples and domains. Theoretical analysis and experiments on UCI data sets demonstrate the feasibility and effectiveness of the method.

Key words: pattern recognition, multiple submodel integration, decision forest, conditional independence assumption
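
The abstract does not spell out how the submodels are combined, so the following is only a minimal sketch of one standard way to integrate classifiers under a conditional independence assumption: the product rule, P(c | x) ∝ P(c) · ∏_k [P(c | x_k) / P(c)], where x_k is the part of the input seen by submodel k. The function name `combine_posteriors`, the dictionary-based interface, and the smoothing constant `eps` are assumptions made for illustration and are not taken from the paper.

```python
import math


def combine_posteriors(prior, posteriors, eps=1e-12):
    """Combine per-submodel class posteriors with the product rule.

    prior:      dict mapping class label -> P(c)
    posteriors: list of dicts, one per submodel, mapping class label -> P(c | x_k)
    eps:        hypothetical floor that keeps log() away from zero

    Assumes the inputs x_k of the submodels are conditionally independent
    given the class, so that P(c | x) is proportional to
    P(c) * prod_k [P(c | x_k) / P(c)].
    """
    log_scores = {}
    for label, p_c in prior.items():
        log_prior = math.log(max(p_c, eps))
        score = log_prior
        for post in posteriors:
            # Each submodel contributes the ratio P(c | x_k) / P(c).
            score += math.log(max(post[label], eps)) - log_prior
        log_scores[label] = score

    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    unnormalised = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalised.values())
    return {c: v / total for c, v in unnormalised.items()}


if __name__ == "__main__":
    prior = {"a": 0.5, "b": 0.5}
    tree_outputs = [{"a": 0.8, "b": 0.2}, {"a": 0.6, "b": 0.4}]
    print(combine_posteriors(prior, tree_outputs))
```

In this sketch each dictionary in `tree_outputs` stands in for the class distribution returned by one tree of the forest. With a single submodel the function returns that submodel's posterior unchanged, and with none it returns the prior.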