Computer Engineering and Applications, 2012, Vol. 48, Issue (13): 138-143.

• Database, Signal and Information Processing •

Privacy protection data mining method based on network user behavior

WANG Yan (1,2), LE Jiajin (3), SUN Jie (1), JIANG Jiulei (1)

  1. Glorious Sun School of Business and Management, Donghua University, Shanghai 200061, China
  2. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
  3. School of Computer Science and Technology, Donghua University, Shanghai 201620, China
  • Online: 2012-05-01 Published: 2012-05-09

Abstract: Privacy-preserving data mining has become a research hotspot in recent years. Web server logs record the pages that users visit, and if left unprotected they can leak users' private data. To address this problem, this paper discusses privacy protection of user behavior in Web data mining and proposes a method that converts Web server log information into relational data tables and perturbs the table entries by randomized response. Data users then run frequent itemset and strong association rule discovery algorithms on the perturbed tables, which yields the association rules among items in online shopping baskets while keeping the underlying data confidential. Experiments on real datasets show that the proposed privacy-preserving association rule mining algorithm for Web usage mining provides good privacy and has a degree of applicability.

Key words: data mining, session identification, privacy preservation, randomized response, association rules, Web log
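
The abstract does not spell out the exact randomized response scheme, so the following is only a minimal sketch of the general idea, assuming a Warner-style bit flip: each 0/1 entry of a session-by-page table is kept with probability p and flipped otherwise, and the true support of a single page is then estimated from the perturbed column. The function names `perturb` and `estimate_support`, the table layout, and the value of p are illustrative assumptions, not taken from the paper.

```python
import random


def perturb(table, p, rng):
    """Keep each 0/1 entry with probability p, flip it otherwise.

    `table` is a list of rows (one per session), each a list of 0/1 flags
    (one per page). This layout is an assumption made for illustration.
    """
    return [[bit if rng.random() < p else 1 - bit for bit in row]
            for row in table]


def estimate_support(perturbed, column, p):
    """Estimate the true fraction of 1s in `column` from perturbed data.

    If pi is the true fraction, the expected observed fraction is
    p * pi + (1 - p) * (1 - pi); solving for pi gives the formula below.
    It is undefined for p == 0.5, where the data carry no information.
    """
    observed = sum(row[column] for row in perturbed) / len(perturbed)
    return (observed - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    # Synthetic table: page 0 appears in 30% of sessions, page 1 in the rest.
    sessions = [[1, 0]] * 6000 + [[0, 1]] * 14000
    noisy = perturb(sessions, p=0.8, rng=rng)
    print("estimated support of page 0:", estimate_support(noisy, 0, p=0.8))
```

Under this assumption the data user only ever sees `noisy`, yet can still recover approximate supports. A value of p closer to 0.5 gives stronger privacy but noisier estimates. Extending the estimate from single pages to itemsets requires inverting a larger transition matrix, which this sketch does not attempt.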