Computer Engineering and Applications, 2013, Vol. 49, Issue (1): 167-170.

Value domain partition method of multiple attributes based on MDL principle

CHEN Aiping¹, FAN Yuanyuan²

1. School of Information Technology, Jinling Institute of Technology, Nanjing 211169, China
2. Department of Computer and Information Engineering, Jiaozuo Teachers College, Jiaozuo, Henan 454000, China
Online: 2013-01-01; Published: 2013-01-16

Abstract: Value domain partitioning of continuous attributes is an important research topic in data mining and machine learning. Many discretization methods have been proposed, but most consider only the discretization of a single (one-dimensional) attribute and ignore the relationships among attributes, which makes optimal discretization results difficult to obtain. This paper proposes a value domain partition method for multiple attributes based on the Minimum Description Length (MDL) principle. By formulating multi-attribute partitioning as a model selection problem, it derives a measurement function for evaluating multi-attribute partitions, and it designs an algorithm that searches for the best discretization result. Performance evaluation and analysis show that the proposed approach improves the classification and learning ability of the Naive Bayes classifier.

Key words: data mining, discretization, Minimum Description Length Principle (MDLP), Naive Bayes
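The abstract does not state the paper's multi-attribute measurement function, so the sketch below is only a point of reference. It implements the classic single-attribute MDL discretization criterion of Fayyad and Irani (1993), which is an example of the one-dimensional methods the abstract says ignore relationships among attributes. It is not the authors' method. The function names (`entropy`, `mdlp_accepts`, `mdlp_cut_points`) are hypothetical and chosen for illustration. The sketch recursively picks the boundary that minimizes class entropy and keeps a cut only when the information gain exceeds the MDL cost of encoding it.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_accepts(labels: Sequence, left: Sequence, right: Sequence) -> bool:
    """Fayyad-Irani MDL test: accept a cut only if its gain pays for its encoding cost."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    ent, ent1, ent2 = entropy(labels), entropy(left), entropy(right)
    gain = ent - (len(left) / n) * ent1 - (len(right) / n) * ent2
    delta = math.log2(3 ** k - 2) - (k * ent - k1 * ent1 - k2 * ent2)
    return gain > (math.log2(n - 1) + delta) / n


def mdlp_cut_points(values: Sequence[float], labels: Sequence) -> List[float]:
    """Recursively discretize ONE continuous attribute (hypothetical helper)."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    cuts: List[float] = []

    def split(lo: int, hi: int) -> None:
        seg = ys[lo:hi]
        n = len(seg)
        best = None  # (weighted class entropy after the cut, cut index)
        for i in range(lo + 1, hi):
            if xs[i] == xs[i - 1]:
                continue  # a cut may only fall between distinct values
            left, right = ys[lo:i], ys[i:hi]
            w = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or w < best[0]:
                best = (w, i)
        if best is None:
            return
        i = best[1]
        if mdlp_accepts(seg, ys[lo:i], ys[i:hi]):
            cuts.append((xs[i - 1] + xs[i]) / 2)  # midpoint boundary
            split(lo, i)
            split(i, hi)

    split(0, len(xs))
    return sorted(cuts)


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    labels = ["a"] * 5 + ["b"] * 5
    print(mdlp_cut_points(values, labels))  # [5.5]
```

Each attribute is scored in isolation here, which is exactly the limitation the paper targets. A multi-attribute MDL method would instead evaluate a joint partition of several attributes under a single description-length measure; the details of that measure are in the full paper, not in this abstract.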