Computer Engineering and Applications, 2011, Vol. 47, Issue (12): 220-224.
• Engineering and Applications •
WANG Xiaohua, SU Hongye, QU Yu, CHU Jian
王晓华,苏宏业,渠 瑜,褚 健
Abstract: This paper addresses data quality for telecom insolvency mining. Given the imbalanced nature of telecom insolvency data, it focuses on how missing values and outliers affect classification results, and proposes a Data Quality Assessment System for Telecom Insolvency Mining (TIM-DQAS). In the missing-value evaluation subsystem, an attribute weighting algorithm based on class distribution differences is presented to measure the cost of missing values in each input attribute. In the outlier evaluation subsystem, the effect of outliers in imbalanced data on classification results is analyzed, and an outlier degree is proposed to quantify that effect. Comparative experiments on telecom Personal Handy-phone System (PHS) data from one city provide reference values for the assessment results and verify the effectiveness of the assessment strategy.
Key words: telecom, data mining, insolvency, data quality assessment, missing value, imbalance, outlier degree
摘要: 针对电信欠费挖掘主题,结合电信欠费数据非平衡的特点,重点研究了缺失与离群数据对分类结果的影响,从而提出了一个面向电信欠费挖掘的数据质量评估体系(TIM-DQAS):对于缺失评估,提出了一种基于类分布差异的属性加权算法,以衡量输入属性的缺失代价;对于离群评估,分析了非平衡数据中的离群点对分类结果的影响,提出离群度的概念,以量化离群点的影响。基于某城市电信小灵通数据的对比实验,给出了评估结果的参照值,验证了评估策略的有效性。
关键词: 电信, 数据挖掘, 欠费主题, 数据质量评估, 缺失, 非平衡, 离群度
WANG Xiaohua, SU Hongye, QU Yu, CHU Jian. Research on telecom insolvency mining oriented data quality assessing strategy[J]. Computer Engineering and Applications, 2011, 47(12): 220-224.
王晓华,苏宏业,渠 瑜,褚 健. 面向电信欠费挖掘的数据质量评估策略研究[J]. 计算机工程与应用, 2011, 47(12): 220-224.
URL: http://cea.ceaj.org/EN/Y2011/V47/I12/220