Computer Engineering and Applications (计算机工程与应用), 2008, Vol. 44, Issue (29): 236-238. DOI: 10.3778/j.issn.1002-8331.2008.29.068

• Engineering and Applications •

Discretization method for taxpayers’ continuous attributes based on rough set and cluster analysis

XU Lin-zhang (徐林章), HAN Zhen (韩臻), ZHANG Yan-ning (张艳宁)

  1. Department of Computer Science and Engineering, Northwest Polytechnical University, Xi’an 710072, China
  • Received: 2008-03-19 Revised: 2008-06-12 Online: 2008-10-11 Published: 2008-10-11
  • Contact: XU Lin-zhang

Abstract: A dynamic discretization algorithm for taxpayers’ continuous attributes is proposed by combining the rough-set concepts of attribute importance degree and dependence degree with a hierarchical-clustering discretization algorithm. First, each continuous attribute of the tax data objects is partitioned into 2 classes. Next, rough set theory is used to compute the importance degree of each condition attribute with respect to the decision attribute. Classes are then added attribute by attribute in descending order of importance degree. Finally, the classification result whose dependence degree is consistent with that of the original data object set is output. The algorithm partitions the data objects into classes dynamically and thereby discretizes the taxpayers’ continuous attributes. Experimental results based on expert analysis and association analysis indicate that the algorithm achieves high accuracy and performance in discretizing taxpayers’ continuous attributes.

Key words: rough set, hierarchical clustering, discretization, data pre-processing, tax source analysis
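
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering procedure or the stopping test, so the sketch assumes single-linkage clustering on one dimension (cutting at the widest gaps between sorted values), takes the dependence degree to be the standard rough-set ratio |POS_C(D)| / |U|, takes the importance of an attribute to be the drop in dependence degree when that attribute is removed, and stops as soon as the discretized table reaches the dependence degree of the raw table. All function names and the toy data layout are hypothetical.

```python
from collections import defaultdict


def cut_points(values, k):
    """Assumed 1-D single-linkage clustering: cut at the k-1 widest gaps."""
    distinct = sorted(set(values))
    k = min(k, len(distinct))
    widest = sorted(
        range(1, len(distinct)),
        key=lambda i: distinct[i] - distinct[i - 1],
        reverse=True,
    )[: max(k - 1, 0)]
    return sorted((distinct[i - 1] + distinct[i]) / 2 for i in widest)


def encode(values, cuts):
    """Map each value to the index of the interval it falls into."""
    return [sum(v > c for c in cuts) for v in values]


def dependency(rows, decisions, attrs):
    """Dependence degree: share of objects whose condition class is pure."""
    seen = defaultdict(set)
    size = defaultdict(int)
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        seen[key].add(d)
        size[key] += 1
    positive = sum(size[key] for key, ds in seen.items() if len(ds) == 1)
    return positive / len(rows)


def significance(rows, decisions, attrs, a):
    """Importance of attribute a: dependence lost when a is dropped."""
    rest = [b for b in attrs if b != a]
    return dependency(rows, decisions, attrs) - dependency(rows, decisions, rest)


def discretize_table(columns, decisions):
    """columns maps an attribute name to its list of continuous values."""
    names = list(columns)
    n = len(decisions)

    def table(ks):
        coded = {a: encode(columns[a], cut_points(columns[a], ks[a])) for a in names}
        return [{a: coded[a][i] for a in names} for i in range(n)]

    raw = [{a: columns[a][i] for a in names} for i in range(n)]
    target = dependency(raw, decisions, names)  # dependence of the original data

    ks = {a: 2 for a in names}  # step 1: two classes per attribute
    rows = table(ks)
    while dependency(rows, decisions, names) < target:
        # steps 2-3: rank attributes by importance, largest first, and add classes
        order = sorted(
            names,
            key=lambda a: significance(rows, decisions, names, a),
            reverse=True,
        )
        grew = False
        for a in order:
            if ks[a] < len(set(columns[a])):
                ks[a] += 1
                grew = True
                rows = table(ks)
                if dependency(rows, decisions, names) >= target:
                    break
        if not grew:  # every attribute is already fully split
            break
    # step 4: return the class counts and cut points that preserve dependence
    return ks, {a: cut_points(columns[a], ks[a]) for a in names}
```

Calling `discretize_table` with a dictionary of continuous columns and a list of decision labels returns the number of classes chosen per attribute together with the corresponding cut points. Because a discretized table is always a coarsening of the raw one, its dependence degree can never exceed the target, so the loop stops either when the two match or when no attribute can be split further.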