Computer Engineering and Applications ›› 2007, Vol. 43 ›› Issue (10): 174-176.

• Database and Information Processing •

One solution about topic web crawler’s greedy search strategy

  

  • Received:2006-05-17 Revised:1900-01-01 Online:2007-04-01 Published:2007-04-01

An improved search algorithm for topic web spiders

Lin Haixia (林海霞), Yuan Fuyong (原福永), Chen Jinsen (陈金森), Liu Junfeng (刘俊峰)

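  1. College of Information Technology and Engineering, Yanshan University; College of Information Engineering, Yanshan University; Department of Computer Science and Technology
  • Corresponding author: Lin Haixia (林海霞)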

Abstract: The search strategy of a topic web crawler is the core technology of a specialized search engine. However, current topic search algorithms tend to be highly greedy, which makes it difficult to find globally optimal solutions. A comparative analysis shows that the Best-First algorithm, despite its shortcomings, performs best among the algorithms compared. The BS-BS algorithm is therefore proposed on the basis of Best-First. An evaluation of BS-BS shows that it not only improves the recall rate but can also, to some extent, find globally optimal solutions.

Key words: topic web crawler, Best-First algorithm, recall ratio

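Abstract (translated from the Chinese): The search strategy of a topic web spider is the core technology of a specialized search engine. Current topic search algorithms, however, are often strongly greedy and struggle to find an optimal solution at the global scale. Comparison and analysis show that the Best-First algorithm, although it has weaknesses, delivers the best performance among several algorithms. The BS-BS algorithm is therefore proposed with Best-First as its foundation. A performance evaluation of BS-BS finds that searching with this algorithm not only raises the recall rate but can also, to a certain degree, find the optimal solution at the global scale.

Key words (translated from the Chinese): topic web spider, Best-First algorithm, recall rate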
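
The greediness that the abstract attributes to Best-First search can be illustrated with a minimal sketch. It is not the authors' BS-BS algorithm, which this page does not describe; it only shows a generic Best-First frontier driven by a priority queue, together with a recall measure. The names `best_first_crawl`, `recall`, `toy_graph`, and `toy_score` are hypothetical, the link graph is invented toy data, and the relevance scores are assumed to be known in advance, whereas a real crawler would have to estimate them.

```python
import heapq
from typing import Callable, Dict, List, Set


def best_first_crawl(
    graph: Dict[str, List[str]],
    score: Callable[[str], float],
    seeds: List[str],
    budget: int,
) -> List[str]:
    """Visit up to `budget` pages, always expanding the best-scoring known page."""
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-score(url), url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    visited: List[str] = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited


def recall(visited: List[str], relevant: Set[str]) -> float:
    """Fraction of the relevant pages that the crawl actually reached."""
    if not relevant:
        return 0.0
    return len(relevant.intersection(visited)) / len(relevant)


# Toy data (assumed): page "b1" is highly relevant but sits behind the
# low-scoring page "b", so a greedy frontier postpones it.
toy_graph = {"seed": ["a", "b"], "a": ["a1"], "b": ["b1"], "a1": [], "b1": []}
toy_scores = {"seed": 1.0, "a": 0.6, "b": 0.1, "a1": 0.5, "b1": 0.9}


def toy_score(url: str) -> float:
    return toy_scores[url]


visited = best_first_crawl(toy_graph, toy_score, ["seed"], budget=3)
print(visited)                       # ['seed', 'a', 'a1']
print(recall(visited, {"a", "b1"}))  # 0.5
```

With a budget of three pages the crawl never expands "b", so the relevant page "b1" is missed and recall stays at 0.5. This is the kind of locally optimal behaviour the abstract describes as a weakness of greedy topic search.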