Computer Engineering and Applications, 2011, Vol. 47, Issue (12): 220-224.

• Engineering and Applications •

Research on telecom insolvency mining oriented data quality assessing strategy

WANG Xiaohua,SU Hongye,QU Yu,CHU Jian   

  1. State Key Laboratory of Industrial Control Technology,Institute of Cyber-Systems and Control,Zhejiang University,Hangzhou 310027,China
  • Online: 2011-04-21 Published: 2011-04-21

Abstract: Focusing on telecom insolvency mining and taking into account the class imbalance of telecom insolvency data, this work concentrates on how missing values and outliers affect classification results, and on that basis proposes a Data Quality Assessment System for Telecom Insolvency Mining (TIM-DQAS). In the missing-value evaluation subsystem, an attribute weighting algorithm based on differences in class distribution is presented to measure the missing cost of each input attribute. In the outlier evaluation subsystem, the impact of outliers in imbalanced data on classification results is analyzed, and the concept of outlier degree is proposed to quantify that impact. Comparative experiments on personal handphone system (PHS) subscriber data from one city provide reference values for the assessment results and verify the effectiveness of the assessment strategy.
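The abstract does not give the formulas behind the attribute weights or missing costs. The sketch below is a hypothetical illustration of the general idea only, not the authors' TIM-DQAS algorithm. It assumes a categorical attribute with `None` marking missing values, binary labels with 1 = insolvent (the minority class), a weight defined as the total-variation distance between the two class-conditional value distributions, and a missing cost defined as weight × missing rate. The function names `class_distribution_weight` and `missing_cost` are made up for this example.

```python
from collections import Counter


def class_distribution_weight(values, labels):
    """Hypothetical weight: total-variation distance between the attribute's
    value distribution in class 1 (insolvent) and in class 0 (non-insolvent).
    Returns 0.0 for an attribute that does not separate the classes and
    1.0 for one that separates them perfectly."""
    # Only observed (non-missing) values contribute to the distributions.
    observed = [(v, y) for v, y in zip(values, labels) if v is not None]
    pos = Counter(v for v, y in observed if y == 1)
    neg = Counter(v for v, y in observed if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        # Cannot compare the classes, so assign no weight.
        return 0.0
    categories = set(pos) | set(neg)
    # Normalizing each class by its own size keeps the majority class
    # from dominating the comparison.
    return 0.5 * sum(abs(pos[c] / n_pos - neg[c] / n_neg) for c in categories)


def missing_cost(values, labels):
    """Hypothetical missing cost: attribute weight scaled by its missing rate."""
    missing_rate = sum(v is None for v in values) / len(values)
    return class_distribution_weight(values, labels) * missing_rate


if __name__ == "__main__":
    # Toy data: one missing value out of eight records.
    values = ["high", "high", "low", "low", None, "low", "low", "high"]
    labels = [1, 1, 0, 0, 0, 0, 0, 0]
    print(class_distribution_weight(values, labels))  # approximately 0.8
    print(missing_cost(values, labels))               # approximately 0.1
```

Because each class's counts are normalized separately, an attribute whose values differ sharply between the few insolvent records and the many others still receives a high weight, so gaps in that attribute are penalized more heavily than gaps in an uninformative one.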

Key words: telecom, data mining, insolvency, data quality assessment, missing value, imbalance, outlier degree