Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (36): 142-145.

• Database, Signal and Information Processing •

Trade-off quantization method for continuous features based on combination and probability (组合与概率的连续特征权衡量化方法)

TIAN Haimei (田海梅)¹, WANG Ying (王莹)²

  1. School of Information Technology, Jinling Institute of Technology, Nanjing 211169, China
  2. School of Telecommunications Engineering, Beijing Vocational College of Electronic Science, Beijing 100016, China
  • Online: 2012-12-21 Published: 2012-12-21

Abstract: Quantization of continuous features is a necessary preprocessing step in data mining. This paper presents a trade-off quantization method for continuous features based on combination and probability. Drawing on minimum description length (MDL) and on combination and probability theory, it proposes a trade-off criterion for quantizing continuous features that reasonably balances the classification errors caused by quantization against the information carried by the quantization intervals. Based on this criterion, it proposes an effective dynamic programming algorithm that searches for the best quantization result. The quantized data are then classified with a naive Bayes classifier. Comparative experiments show that the new method achieves higher mean learning accuracy than other quantization methods for continuous features.

Key words: quantization, minimum description length (MDL), combination and probability, trade-off criterion, dynamic programming (DP)
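
The abstract does not state the trade-off criterion itself, so the following is only a minimal sketch of the general idea: choose cut points by dynamic programming so that a cost combining classification errors and interval count is minimized. The cost used here (majority-class errors in each interval plus a fixed `interval_penalty` per interval) is a stand-in assumption, not the paper's MDL-based criterion, and the names `interval_errors`, `dp_quantize`, and `interval_penalty` are hypothetical.

```python
from collections import Counter


def interval_errors(labels):
    """Errors made if the interval predicts its majority class."""
    counts = Counter(labels)
    return len(labels) - max(counts.values())


def dp_quantize(values, labels, interval_penalty=1.0):
    """Return sorted cut points minimizing
    (total majority-class errors) + interval_penalty * (number of intervals).

    NOTE: this cost is an illustrative assumption, not the paper's criterion.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    n = len(xs)
    if n == 0:
        return []

    # An interval may only end where the feature value changes (or at n),
    # so equal values are never split across intervals.
    ends = [j for j in range(1, n + 1) if j == n or xs[j - 1] < xs[j]]
    starts = [0] + ends[:-1]

    best = [float("inf")] * (n + 1)  # best[j]: min cost of the first j samples
    back = [0] * (n + 1)             # back[j]: start index of the last interval
    best[0] = 0.0

    for j in ends:
        for i in starts:
            if i >= j:
                break
            cost = best[i] + interval_errors(ys[i:j]) + interval_penalty
            if cost < best[j]:
                best[j] = cost
                back[j] = i

    # Walk back through the chosen intervals; each boundary becomes a cut
    # placed midway between the two neighbouring feature values.
    cuts = []
    j = n
    while back[j] > 0:
        i = back[j]
        cuts.append((xs[i - 1] + xs[i]) / 2)
        j = i
    return sorted(cuts)


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 12]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(dp_quantize(values, labels, interval_penalty=1.0))
    print(dp_quantize(values, labels, interval_penalty=10.0))
```

A small penalty lets the search separate the two classes with a single cut, while a large penalty makes any extra interval cost more than the errors it removes, so no cut is made. The quantized feature could then be passed to any categorical classifier, such as the naive Bayes classifier the abstract mentions.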