Computer Engineering and Applications ›› 2009, Vol. 45 ›› Issue (7): 159-163. DOI: 10.3778/j.issn.1002-8331.2009.07.048

• Database, Signal and Information Processing •

Frequent pattern classification method based on information gain

TAO Jian-wen1,2, ZHAO Jie-yu2, YAO Qi-fu1

  1. Department of Information Engineering, Zhejiang Business Technology Institute, Ningbo, Zhejiang 315012, China
  2. College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211, China
  • Received: 2008-01-14 Revised: 2008-04-14 Online: 2009-03-01 Published: 2009-03-01
  • Contact: TAO Jian-wen

Abstract: Research on classification based on frequent patterns is still at an early stage, but it has already achieved initial success in classifying relational data, text documents and graphs. This paper systematically studies frequent pattern classification based on information gain discrimination, proposes an information gain based discriminative frequent pattern classification model (IGFPC), and argues theoretically that the model is feasible. By establishing a connection between pattern frequency and discriminative measures based on information gain, the paper develops a method for setting the minimum support threshold when mining useful frequent patterns. Based on this method, together with a proposed feature selection algorithm (IGPS), discriminative frequent patterns are generated for building high-quality pattern classifiers. Experiments show that the information gain based frequent pattern classification framework achieves good scalability and high classification accuracy on large datasets.
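The connection between pattern frequency and information gain mentioned in the abstract can be illustrated with a small sketch. It is not the paper's IGFPC model or IGPS algorithm, whose details are not given here. It only computes the standard information gain IG(C|X) = H(C) - H(C|X) for a binary class C and a binary "pattern present" indicator X. The function names and the example counts are assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(pos_with, neg_with, pos_without, neg_without):
    """IG(C|X) = H(C) - H(C|X) for a binary class C and a binary pattern X.

    The four arguments are record counts (hypothetical inputs):
    positive/negative records that contain the pattern, and
    positive/negative records that do not contain it.
    """
    n = pos_with + neg_with + pos_without + neg_without
    # Entropy of the class label before looking at the pattern.
    h_class = entropy([(pos_with + pos_without) / n,
                       (neg_with + neg_without) / n])
    # Conditional entropy: weighted entropy of each side of the split.
    h_cond = 0.0
    for pos, neg in ((pos_with, neg_with), (pos_without, neg_without)):
        m = pos + neg
        if m > 0:
            h_cond += (m / n) * entropy([pos / m, neg / m])
    return h_class - h_cond


# A frequent pattern that separates the classes perfectly (support 50%).
ig_frequent = information_gain(50, 0, 0, 50)
# A rare pattern that is also pure, but covers only 1 of 100 records.
ig_rare = information_gain(1, 0, 49, 50)
# A frequent pattern that carries no class information.
ig_useless = information_gain(25, 25, 25, 25)
```

In this toy data the rare pattern is perfectly pure yet its information gain stays small, because it covers too few records to reduce the class entropy much. This is the kind of observation that motivates filtering patterns with a minimum support threshold before selecting discriminative ones.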