Computer Engineering and Applications ›› 2008, Vol. 44 ›› Issue (35): 148-150. DOI: 10.3778/j.issn.1002-8331.2008.35.045

• Database, Signal and Information Processing •

Data discretization algorithm based on the discernibility matrix

QIN Chuan1, HUANG Huan1, SHI Hua-ji1, LI Xing-yi1,2

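  1. College of Computer, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  2. College of Electronics and Information, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2007-12-21 Revised: 2008-03-03 Online: 2008-12-11 Published: 2008-12-11
  • Corresponding author: QIN Chuan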

New method of data discretization based on rough set theory

QIN Chuan1, HUANG Huan1, SHI Hua-ji1, LI Xing-yi1,2

  1. College of Computer, Jiangsu University, Zhenjiang, Jiangsu 212013, China
  2. College of Electronics and Information, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2007-12-21 Revised: 2008-03-03 Online: 2008-12-11 Published: 2008-12-11
  • Contact: QIN Chuan

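Abstract: Traditional rough set theory can only process the discrete data in a database, whereas the vast majority of real-world databases contain both discrete and continuous data. To address this problem, a data discretization algorithm based on a discernibility matrix of candidate cuts is proposed. The method takes the cut core as its starting point and uses the frequency with which each candidate cut appears in the discernibility matrix as heuristic information. It repeatedly adds the most important cut to the resulting cut subset, and the discretized information system is derived from the final cut set. An example analysis shows that the algorithm achieves a good discretization effect.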

Key words: rough set, discretization, cut core, candidate cut, heuristic algorithm

Abstract: Traditional rough set theory can only deal with the discrete attributes in a database. However, most real-life databases consist of both discrete and continuous attributes. To overcome this problem, a new method of data discretization based on a discernibility matrix of candidate cuts is presented. The algorithm starts from the cut core and uses the frequency of each candidate cut in the discernibility matrix as heuristic information. At each step, the most important cut is selected and added to the cut set. Finally, the discretized information system is obtained from the resulting cut set. A worked example shows that the algorithm has a good discretization effect.

Key words: rough set, discretization, cut core, candidate cuts, heuristic algorithm
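
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give these details, so the sketch assumes that candidate cuts are midpoints between consecutive distinct attribute values, that the discernibility matrix holds one entry per pair of objects with different decisions, that the cut core consists of cuts forming a singleton entry, and that ties in frequency are broken by the smallest (attribute, value) pair. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from itertools import combinations


def candidate_cuts(rows):
    """Assumed candidate cuts: midpoints between consecutive distinct values."""
    cuts = []
    for a in range(len(rows[0])):
        values = sorted({row[a] for row in rows})
        cuts += [(a, (lo + hi) / 2) for lo, hi in zip(values, values[1:])]
    return cuts


def discernibility_entries(rows, decisions, cuts):
    """One entry per pair of objects with different decisions: the set of
    candidate cuts lying strictly between the two objects' values."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] == decisions[j]:
            continue
        entry = frozenset(
            (a, c) for a, c in cuts
            if min(rows[i][a], rows[j][a]) < c < max(rows[i][a], rows[j][a])
        )
        if entry:  # identical objects cannot be separated by any cut
            entries.append(entry)
    return entries


def select_cuts(rows, decisions):
    """Start from the cut core, then greedily add the most frequent cut."""
    cuts = candidate_cuts(rows)
    entries = discernibility_entries(rows, decisions, cuts)
    # Assumed cut core: cuts that are the only element of some entry.
    selected = {next(iter(e)) for e in entries if len(e) == 1}
    remaining = [e for e in entries if not (e & selected)]
    while remaining:
        freq = Counter(cut for e in remaining for cut in e)
        # Highest frequency first; ties go to the smallest (attribute, value).
        best = min(freq, key=lambda cut: (-freq[cut], cut))
        selected.add(best)
        remaining = [e for e in remaining if best not in e]
    return sorted(selected)


def discretize(rows, selected):
    """Replace each value by the number of selected cuts below it."""
    per_attr = {}
    for a, c in sorted(selected):
        per_attr.setdefault(a, []).append(c)
    return [
        tuple(bisect_left(per_attr.get(a, []), v) for a, v in enumerate(row))
        for row in rows
    ]


# Made-up decision table with two continuous attributes.
rows = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (3.0, 3.0)]
decisions = [0, 1, 1, 0]
cuts = select_cuts(rows, decisions)
print(cuts)                    # [(0, 1.5), (1, 2.5)]
print(discretize(rows, cuts))  # [(0, 0), (1, 0), (1, 0), (1, 1)]
```

In this toy table both selected cuts come from the core, because each is the only cut separating some pair of objects with different decisions. The frequency-based loop only runs when the core leaves some pairs unseparated.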