Computer Engineering and Applications ›› 2012, Vol. 48 ›› Issue (9): 134-137.

• Database, Signal and Information Processing •

Discretization algorithm applied to interval numbers

MU Haijun1, E Xu1,2,3, JIN Chengmei1, WANG Quantie3   

  1. School of Electronic and Information Engineering, Liaoning University of Technology, Jinzhou, Liaoning 121001, China
  2. School of Resources and Environment, Liaoning Technical University, Fuxin, Liaoning 123000, China
  3. Liaoning Vocational and Technical College, Tieling, Liaoning 112000, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2012-03-21 Published: 2012-04-11

一种区间型数据的离散化方法 (A discretization method for interval-valued data)

穆海军1,鄂 旭1,2,3,金成美1,王全铁3   

  1. 辽宁工业大学 电子与信息工程学院,辽宁 锦州 121001
  2. 辽宁工程技术大学 资源与环境学院,辽宁 阜新 123000
  3. 辽宁工程职业技术学院,辽宁 铁岭 112000

Abstract: The fields of knowledge discovery and data mining are growing rapidly. Many methods have been proposed to discretize data; however, most existing discretization methods apply only to real-valued attributes. In practical applications, attribute values are often interval numbers. To address this problem, a new discretization algorithm for interval numbers is proposed. The similarity degree of interval numbers is used to describe the similarity relation between two interval numbers, and a similarity threshold is defined to determine the discrete relationship between data values, on which the algorithm is built. By analysing the role of the similarity degree in the algorithm, a new variable, the associated degree, is introduced and used to improve the algorithm. The algorithm is tested on several data sets and compared with other discretization algorithms. The experimental results show that the algorithm is effective.

Key words: rough sets, discretization, interval number, similarity degree, associated degree

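Abstract (translated from the Chinese): With the rapid development of data mining and knowledge discovery, many data discretization algorithms have appeared. However, most existing discretization methods target continuous attribute values at fixed points, whereas attributes whose values are continuous intervals are common in practice. To address this problem, a new method for discretizing continuous interval-valued attributes is proposed. The similarity degree of interval numbers is used to describe the similarity relation between objects, and a similarity threshold is defined to determine the discrete relationship, thereby discretizing the interval data. After analysing the role of the similarity degree in the algorithm, a new variable, the associated degree, is proposed to improve the algorithm. The performance of the algorithm is tested on multiple data sets and compared with other algorithms; the experimental results show that the algorithm is effective.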

Keywords (translated from the Chinese): rough sets, discretization, interval-valued data, similarity degree, associated degree
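
The abstract describes discretizing interval numbers by comparing a similarity degree against a threshold, but it does not give the formulas. The following is a minimal sketch of that general idea only, not the authors' algorithm. The similarity measure (overlap length divided by the length of the combined span), the greedy grouping against the first member of each class, and the names `similarity` and `discretize` are all assumptions made for illustration. The associated degree is not modelled because the abstract does not define it.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), lower <= upper


def similarity(a: Interval, b: Interval) -> float:
    """Assumed similarity degree: overlap length / combined span length.

    This ratio is a stand-in chosen for illustration; the paper's own
    definition is not given in the abstract.
    """
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])
    if span == 0.0:
        # Both intervals are the same single point.
        return 1.0
    return overlap / span


def discretize(values: List[Interval], threshold: float) -> List[int]:
    """Assign each interval an integer class code (hypothetical scheme).

    Intervals are visited in ascending order. Each one joins the first
    class whose representative (its first member) is at least
    `threshold` similar; otherwise it starts a new class.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    representatives: List[Interval] = []
    labels = [0] * len(values)
    for i in order:
        for code, rep in enumerate(representatives):
            if similarity(values[i], rep) >= threshold:
                labels[i] = code
                break
        else:
            # No existing class is similar enough: open a new one.
            representatives.append(values[i])
            labels[i] = len(representatives) - 1
    return labels


if __name__ == "__main__":
    data = [(1.0, 2.0), (1.1, 2.1), (5.0, 6.0), (5.2, 6.1), (1.0, 2.0)]
    print(discretize(data, threshold=0.5))
```

In this sketch a higher threshold yields more, finer classes, while a lower threshold merges more intervals into the same class. How the threshold is chosen in the paper is not stated in the abstract.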