Computer Engineering and Applications, 2008, Vol. 44, Issue (23): 136-138. DOI: 10.3778/j.issn.1002-8331.2008.23.042

• Database, Signal and Information Processing •

Mining frequent access patterns on Web mining based on BIPL algorithm

WU Ya-shuang, ZHANG Dong-zhan

  1. Department of Computer Science, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2008-02-22 Revised: 2008-04-29 Online: 2008-08-11 Published: 2008-08-11
  • Contact: WU Ya-shuang

基于BIPL的Web频繁访问模式挖掘 (Web frequent access pattern mining based on BIPL)

吴雅双 (WU Ya-shuang), 张东站 (ZHANG Dong-zhan)

  1. Department of Computer Science, Xiamen University, Xiamen, Fujian 361005, China
  • Corresponding author: 吴雅双 (WU Ya-shuang)

Abstract: Mining frequent access patterns is an important task in Web log mining. To address the shortcomings of Apriori-like algorithms and the GITC algorithm, this paper presents the BIPL algorithm for mining frequent Web access patterns. The algorithm is based on parent lists and intersection, and needs to scan the database only once. It first computes the intersection of each pair of access patterns to generate candidate access patterns, saving the parent patterns of each candidate during the intersection step. The support count of every candidate access pattern can then be computed by simple addition. Finally, theoretical analysis and experiments show that the algorithm is stable and efficient.

Key words: Web log mining, intersection relation, frequent access pattern

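Abstract (Chinese version, translated): Mining frequent access patterns is an important task in Web log mining. To address the shortcomings of Apriori-like algorithms and the GITC algorithm, the paper proposes BIPL, an algorithm for mining frequent Web access patterns that is based on parent lists and computes intersections with a single scan. The algorithm first intersects users' access patterns pairwise to generate candidate access patterns, saving the parent patterns of each candidate during the intersection step, and then computes the support count of each candidate by simple summation. Theoretical analysis and experimental validation show that the algorithm is stable and efficient.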

Keywords (Chinese version, translated): Web log mining, intersection relation, frequent access pattern
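
The idea summarized in the abstract (pairwise intersection, parent bookkeeping, support by addition) can be illustrated with a minimal Python sketch. This is not the paper's BIPL procedure, whose details are not given on this page. It rests on several assumptions: each access pattern is treated as an unordered set of pages rather than an ordered path, intersections are repeated until nothing changes, and the names `mine_frequent_patterns`, `sessions`, and `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_patterns(sessions, min_support):
    """Return {pattern: support} for candidates with support >= min_support.

    Assumption: each session is modelled as a set of page identifiers.
    This is an illustrative sketch, not the published BIPL algorithm.
    """
    # Single pass over the data: collapse identical sessions and count them.
    counts = Counter(frozenset(s) for s in sessions)
    counts.pop(frozenset(), None)  # ignore empty sessions

    # parents[p] = distinct original sessions known to contain pattern p.
    parents = {p: {p} for p in counts}

    # Intersect pairs repeatedly until no parent list grows any further.
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(parents), 2):
            common = a & b
            if not common:
                continue
            # Every parent of a or b also contains their intersection.
            merged = parents[a] | parents[b]
            known = parents.setdefault(common, set())
            if not merged <= known:
                known |= merged
                changed = True

    # Support is obtained purely by adding up the parents' session counts.
    support = {p: sum(counts[o] for o in owners)
               for p, owners in parents.items()}
    return {p: n for p, n in support.items() if n >= min_support}
```

Calling `mine_frequent_patterns([["A", "B", "C"], ["A", "B"], ["A", "C"]], 2)` returns the candidates supported by at least two sessions. Only patterns that arise as intersections of sessions are produced, so a subset that never occurs without its companions is not listed separately. The fixed-point loop can become expensive on large logs; it is kept here for clarity.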