Computer Engineering and Applications ›› 2019, Vol. 55 ›› Issue (23): 216-221. DOI: 10.3778/j.issn.1002-8331.1808-0247
JIA Xiaoli, WU Rui, WU Siying
贾晓莉,吴瑞,吴思颖
Abstract: Web log mining analyzes user access patterns to estimate users' level of interest. Most current web log mining is based on access frequency, and the information it yields is of limited value. The clustering technique proposed in this paper is instead based on access time. First, a fuzzy vector represents each user access pattern, recording whether the user visited a page and how long the user spent browsing it. The users' access sequences are then analyzed with different clustering methods. In addition, a two-layer clustering technique based on fuzzy rough k-means and the angle cosine is proposed, which reduces sensitivity to the initial cluster centers. A series of experiments demonstrates the feasibility of the clustering method, and the results of the different clustering methods are compared using the Davies-Bouldin index. Because the algorithm is still inefficient on large data sets, MapReduce is used to parallelize the two-layer clustering, improving its efficiency.
Key words: web mining, fuzzy rough clustering, web access patterns, angle cosine, parallel
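The abstract does not spell out how the fuzzy vector is built, so the following is only a minimal sketch under stated assumptions: membership is taken to be browsing time divided by a hypothetical cap (`cap_seconds`), clipped to 1, with 0 meaning the page was not visited. The function names, page paths, and sample durations are illustrative and do not come from the paper. The angle cosine is the standard cosine of the angle between two vectors.

```python
import math


def fuzzy_vector(durations, pages, cap_seconds=300.0):
    """Map one user's per-page browsing time to membership values in [0, 1].

    Assumption (not from the paper): membership = time / cap, clipped to 1.
    A value of 0.0 means the page was not visited.
    """
    return [min(durations.get(page, 0.0) / cap_seconds, 1.0) for page in pages]


def angle_cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A user with no recorded visits is treated as similar to nobody.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sessions: page -> seconds spent on that page.
pages = ["/home", "/news", "/shop"]
alice = fuzzy_vector({"/home": 30, "/news": 300}, pages)
bob = fuzzy_vector({"/home": 60, "/news": 600}, pages)
carol = fuzzy_vector({"/shop": 150}, pages)

print(angle_cosine(alice, bob))    # close to 1: similar access patterns
print(angle_cosine(alice, carol))  # 0.0: no pages in common
```

Because the vectors carry browsing time rather than a visit count, two users who open the same pages but linger on different ones end up with a lower similarity, which is the distinction from frequency-based mining that the abstract draws.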
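Abstract (Chinese original, translated): Web log mining analyzes user access patterns to determine how interested users are in what they visit. Most current web log mining is frequency-based, and the information it mines is of little value. The proposed clustering technique is instead based on access time: a fuzzy vector represents each user's browsing pattern, recording whether the user viewed a page and how long the user stayed on it. The users' access sequences are then clustered with different methods. Combining fuzzy rough k-means with the angle cosine yields a two-layer clustering technique that reduces sensitivity to the initial cluster centers, and a series of experiments demonstrates the feasibility of this method. The experiments also use the Davies-Bouldin index to verify and compare the results of the different clustering methods. Because the algorithm remains inefficient when the data volume is large, MapReduce is used to parallelize the two-layer clustering, which improves clustering efficiency.
Key words (Chinese original, translated): web mining, fuzzy rough clustering, web access patterns, angle cosine, parallel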
JIA Xiaoli, WU Rui, WU Siying. Parallel Distributed Web Access Patterns Two-Layer Clustering[J]. Computer Engineering and Applications, 2019, 55(23): 216-221.
贾晓莉,吴瑞,吴思颖. 并行分布式的Web访问模式双层聚类[J]. 计算机工程与应用, 2019, 55(23): 216-221.
URL: http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.1808-0247
http://cea.ceaj.org/EN/Y2019/V55/I23/216