Computer Engineering and Applications, 2009, Vol. 45, Issue 18: 132-134. DOI: 10.3778/j.issn.1002-8331.2009.18.040

• Database and Information Processing •

Mining Web logs to discover 3G Web site

BAO Yu   

  1. Department of Software Engineering, East China Normal University, Shanghai 200062, China
  • Received: 2008-09-25; Revised: 2008-11-17; Online: 2009-06-21; Published: 2009-06-21
  • Contact: BAO Yu

Research on Obtaining the 3G WAP Sub-network in Web Log Mining


Abstract: With the arrival of the 3G era, browsing the Web on mobile phones is becoming common. Because phones have small screens and slow network connections, mobile visitors are better served by a sub-site that keeps only the main sections of the original Web site. Users' browsing behavior is recorded in the Web server's log files, and analyzing these logs for regularities reveals the most frequently visited paths. This paper first converts the raw log sequence into a User Visit Path Session Dataset (UVPSD). It then discovers the site's main sub-site structure by using a reduced Weighted Web Site Structure Graph (WWSSG). The algorithm is applied to the Shanghai community services Web site to obtain its main 3G WAP sub-network, and the experimental data indicate that this sub-network covers most of the site's popular pages.
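The abstract does not say how the raw log sequence is turned into the UVPSD. A common way to do this is sessionization: group hits by user, order them by time, and start a new session after a long idle gap. The sketch below is a minimal illustration under stated assumptions. Log lines are taken to be already parsed into (user, time, URL) records, the client IP stands in for the user, and a 30-minute timeout closes a session. `LogEntry`, `build_sessions`, and `SESSION_TIMEOUT` are hypothetical names and do not come from the paper.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical parsed log record (the paper's log format is not given)."""
    user: str       # client IP used as the user identifier (assumption)
    time: datetime  # request timestamp
    url: str        # requested page


# Assumed inactivity timeout; a common heuristic, not a value from the paper.
SESSION_TIMEOUT = timedelta(minutes=30)


def build_sessions(entries, timeout=SESSION_TIMEOUT) -> List[List[str]]:
    """Turn raw log entries into per-user visit-path sessions (a UVPSD-style list)."""
    by_user = defaultdict(list)
    for e in entries:
        by_user[e.user].append(e)

    sessions = []
    for user in sorted(by_user):
        hits = sorted(by_user[user], key=lambda e: e.time)
        path = [hits[0].url]
        for prev, cur in zip(hits, hits[1:]):
            if cur.time - prev.time > timeout:
                sessions.append(path)  # idle gap too long: close this session
                path = [cur.url]
            elif cur.url != path[-1]:  # ignore immediate reloads of the same page
                path.append(cur.url)
        sessions.append(path)
    return sessions


if __name__ == "__main__":
    t0 = datetime(2008, 9, 1, 10, 0)
    log = [
        LogEntry("10.0.0.1", t0, "/index"),
        LogEntry("10.0.0.1", t0 + timedelta(minutes=1), "/news"),
        LogEntry("10.0.0.1", t0 + timedelta(minutes=2), "/news"),
        LogEntry("10.0.0.1", t0 + timedelta(hours=2), "/index"),
        LogEntry("10.0.0.2", t0, "/index"),
        LogEntry("10.0.0.2", t0 + timedelta(minutes=5), "/services"),
    ]
    print(build_sessions(log))
    # -> [['/index', '/news'], ['/index'], ['/index', '/services']]
```

In this toy log, the two-hour gap splits the first visitor's activity into two sessions, and the repeated `/news` hit is treated as a reload rather than a new step in the path.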

Key words: Web log, Discover User Visit Path Session Dataset (DUVPSD), Weighted Web Site Structure Graph (WWSSG), 3G WAP

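Abstract: With the arrival of the 3G era, mobile Internet access has gradually become widespread. Because phone screens are small and bandwidth is limited, mobile visitors need a WAP sub-network that retains only the main branches of the original Web site. Users' access paths on the WWW are recorded in Web server logs. Analyzing these logs and mining users' main behavior patterns makes it possible to extract the frequently visited core of a Web site. The raw log sequence is first converted into a User Visit Path Session Dataset (UVPSD), and a constrained Weighted Web Site Structure Graph (WWSSG) is then used to discover the site's frequently visited main sub-network. The algorithm was applied to the Shanghai community Web site to extract a 3G WAP sub-network, and the experimental data show that this sub-network covers most of the popular section pages of the Shanghai community Web site.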
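The graph step can be illustrated in the same hedged way. This sketch assumes that the WWSSG is the site's hyperlink graph, with each edge weighted by the number of times sessions traverse it. It also assumes that the constrained (reduced) graph is obtained by dropping edges below a weight threshold and then keeping only the pages still reachable from the home page, so that the WAP sub-network remains navigable. The abstract does not describe the paper's actual constraints. `weight_edges`, `reduce_graph`, `min_weight`, and the toy data are hypothetical.

```python
from collections import Counter, deque


def weight_edges(site_edges, sessions):
    """Weight each hyperlink (a, b) by how many times the sessions follow it."""
    counts = Counter()
    for path in sessions:
        for a, b in zip(path, path[1:]):
            if (a, b) in site_edges:  # count only moves along real site links
                counts[(a, b)] += 1
    return {edge: counts[edge] for edge in site_edges}


def reduce_graph(weights, root, min_weight):
    """Drop edges lighter than min_weight, then keep what is reachable from root."""
    adj = {}
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj.setdefault(a, []).append(b)

    kept_pages, kept_edges = {root}, set()
    queue = deque([root])
    while queue:  # breadth-first search from the home page
        page = queue.popleft()
        for nxt in adj.get(page, []):
            kept_edges.add((page, nxt))
            if nxt not in kept_pages:
                kept_pages.add(nxt)
                queue.append(nxt)
    return kept_pages, kept_edges


if __name__ == "__main__":
    # Toy site graph and sessions, invented for illustration only.
    site_edges = {
        ("/index", "/news"), ("/index", "/services"), ("/index", "/about"),
        ("/news", "/news/1"), ("/about", "/about/history"),
    }
    sessions = [
        ["/index", "/news", "/news/1"],
        ["/index", "/news"],
        ["/index", "/services"],
        ["/index", "/about", "/about/history"],
        ["/index", "/news", "/news/1"],
    ]
    pages, edges = reduce_graph(weight_edges(site_edges, sessions), "/index", 2)
    print(sorted(pages), sorted(edges))
    # -> ['/index', '/news', '/news/1'] [('/index', '/news'), ('/news', '/news/1')]
```

The reachability pass is a deliberate choice in this sketch. Pruning by weight alone can leave a heavily used page whose only link from the home page was cut, and on a WAP sub-network such a page would be unreachable.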

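Key words: Web log, User Visit Path Session Dataset discovery algorithm, Weighted Web Site Structure Graph generation algorithm, 3G Wireless Application Protocol (WAP)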