Outlier data mining algorithms based on weighted fast clustering (基于加权快速聚类的异常数据挖掘算法)

Computer Engineering and Applications (计算机工程与应用), 2007, Vol. 43, Issue 35: 153-155.

LI Xing-yi (李星毅)^1,2, BAO Cong-jian (包从剑)^2, SHI Hua-ji (施化吉)^2, XI Chun-hai (奚春海)^3

1. School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
2. School of Computer Science and Telecommunications Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
3. Computer Center, Tingpang Middle School, Sanmen, Zhejiang 317103, China

Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-12-11 Published: 2007-12-11
Contact: LI Xing-yi

Abstract

Abstract: Clustering is one of the most active research branches in data mining and is widely applied in other scientific fields. This paper presents an outlier data mining algorithm based on weighted fast clustering, designed to detect outlier data quickly. First, each attribute of the data is assigned a weight whose magnitude reflects that attribute's contribution to the classification. Based on the characteristics of the attribute weights, a good initial partition is selected, and repeated iterations then yield a near-optimal partition. Next, a set of rules is applied to identify the clusters that contain outlier data. Finally, the authors report that in practice the technique achieved good social benefits.

Key words: outlier data, data mining, learning rule, K-means clustering, weighted fast clustering
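
The pipeline outlined in the abstract (weight the attributes, pick an initial partition, iterate, then flag outlier clusters) can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, the initialization rule, or the outlier rule, so everything specific here is an assumption made for illustration: the weights are supplied by the caller, the starting centers are taken at evenly spaced positions along the most heavily weighted attribute, and a cluster is flagged when it holds less than a hypothetical `min_fraction` of the points. All function names are hypothetical and this is not the authors' implementation.

```python
import math


def weighted_distance(x, c, weights):
    """Weighted Euclidean distance between point x and center c."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, c, weights)))


def initial_centers(data, weights, k):
    # Assumption: order the points along the most heavily weighted
    # attribute and take k evenly spaced points as starting centers.
    top = max(range(len(weights)), key=lambda j: weights[j])
    ordered = sorted(data, key=lambda p: p[top])
    step = len(ordered) / k
    return [list(ordered[int(step * i + step / 2)]) for i in range(k)]


def weighted_kmeans(data, weights, k, max_iter=100):
    """K-means iteration using the weighted distance above."""
    centers = initial_centers(data, weights, k)
    labels = [-1] * len(data)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: weighted_distance(p, centers[i], weights))
            for p in data
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(data, labels) if lab == i]
            if members:  # keep the old center if a cluster empties out
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def outlier_clusters(labels, k, min_fraction=0.2):
    # Assumption: a cluster is treated as an outlier class when it holds
    # fewer than min_fraction of all points.
    sizes = [labels.count(i) for i in range(k)]
    return [i for i, size in enumerate(sizes) if size < min_fraction * len(labels)]


# Toy data: two dense groups and one isolated point.
data = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (20.0, 20.0),
]
weights = (0.7, 0.3)
labels, centers = weighted_kmeans(data, weights, k=3)
flagged = outlier_clusters(labels, k=3, min_fraction=0.2)
print(labels, flagged)
```

On this toy data the isolated point ends up alone in its own cluster, which is the one the size rule flags. Both the threshold and the initialization are tuning choices in this sketch, and a different rule could be substituted without changing the overall structure.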

LI Xing-yi, BAO Cong-jian, SHI Hua-ji, XI Chun-hai. Outlier data mining algorithms based on weighted fast clustering[J]. Computer Engineering and Applications, 2007, 43(35): 153-155.
