Detection of hype groups based on mining maximum frequent itemsets in Microblogs

Computer Engineering and Applications, 2017, Vol. 53, Issue (4): 90-97. DOI: 10.3778/j.issn.1002-8331.1507-0176

LIU Yan, ZHANG Jin, CHEN Jing, YIN Meijuan, ZHANG Weili

State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China

Online: 2017-02-15 Published: 2017-05-11

Abstract

Abstract: In recent years, hype accounts on microblogs have proliferated, using illicit means to run online public-relations campaigns and seriously disrupting the normal order of the Internet. Traditional hype-account detection relies mainly on feature analysis, which overlooks the organized, planned nature of such accounts and therefore struggles to find highly concealed ones. To address this, the paper exploits the group behavior of hype accounts, which tend to participate in the same hype microblogs together, and recasts hype-group detection as a maximum frequent itemset mining problem. It proposes a detection method based on maximum frequent itemset mining that finds groups of accounts that have repeatedly taken part together in spreading hype microblogs. To make the mining more efficient, and drawing on the research background and the characteristics of the transaction database, it also proposes a maximum frequent itemset discovery algorithm based on iterative intersection (hereafter IIA). The algorithm shrinks the transaction database using a binary-search-based strategy for screening maximum frequent candidate itemsets, and uses several techniques to reduce the number of intersections computed between transactions. Finally, experiments evaluate the performance of the IIA algorithm and validate the detection method on a real Sina Weibo dataset. The results show that the hype groups found by the method have an accuracy above 90%, and that the method can find concealed hype accounts that traditional feature-analysis methods have difficulty identifying.

Key words: data mining, microblog, hype groups, maximum frequent itemsets
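The reduction described in the abstract can be illustrated with a small sketch: each hype microblog is treated as a transaction whose items are the accounts that took part in spreading it, and an account group is "frequent" if it appears together in at least `min_support` transactions. The code below is a naive closure-under-intersection miner written for illustration only. It is not the paper's IIA algorithm and omits its binary-search candidate screening and intersection-saving techniques. The function name, parameters, and sample data are all hypothetical.

```python
def maximal_frequent_account_groups(transactions, min_support, min_size=2):
    """Naive sketch: find maximal frequent account groups by intersecting transactions.

    transactions: iterable of sets of account IDs, one set per hype microblog.
    min_support:  minimum number of microblogs a group must share (>= 1).
    min_size:     smallest group size worth reporting (hypothetical parameter).
    """
    txs = [frozenset(t) for t in transactions]

    # Every maximal frequent itemset equals the intersection of the
    # transactions that contain it, so closing the transaction family
    # under intersection yields all candidates we need.
    candidates = {t for t in txs if len(t) >= min_size}
    frontier = set(candidates)
    while frontier:
        new = set()
        for group in frontier:
            for t in txs:
                common = group & t
                # Groups below min_size can only shrink further, so prune them.
                if len(common) >= min_size and common not in candidates:
                    new.add(common)
        candidates |= new
        frontier = new

    # Keep candidates that co-occur in at least min_support microblogs.
    frequent = [c for c in candidates
                if sum(1 for t in txs if c <= t) >= min_support]

    # A frequent group is maximal if no other frequent group strictly contains it.
    maximal = [c for c in frequent if not any(c < other for other in frequent)]
    return sorted(maximal, key=sorted)


# Toy data (made up): five hype microblogs and the accounts that spread each.
microblogs = [
    {"a", "b", "c", "x"},
    {"a", "b", "c", "y"},
    {"a", "b", "c", "d"},
    {"d", "e", "z"},
    {"d", "e", "a"},
]

groups = maximal_frequent_account_groups(microblogs, min_support=3)
print([sorted(g) for g in groups])  # [['a', 'b', 'c']]
```

In this toy data, accounts `a`, `b`, and `c` appear together in three microblogs, so they form the only group of at least two accounts that meets a support threshold of 3. Lowering `min_support` to 2 also surfaces the pairs `{a, d}` and `{d, e}`.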

LIU Yan, ZHANG Jin, CHEN Jing, YIN Meijuan, ZHANG Weili. Detection of hype groups based on mining maximum frequent itemsets in Microblogs[J]. Computer Engineering and Applications, 2017, 53(4): 90-97.

