Research on search strategy of web spider in topic-oriented search engines

Computer Engineering and Applications ›› 2014, Vol. 50 ›› Issue (2): 116-119.

SHI Baoming1, HE Yuanxiang1, WU Chongzheng2

1.School of Electronics and Information Engineering, Lanzhou University of Arts and Science, Lanzhou 730000, China
2.School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

Online: 2014-01-15 Published: 2014-01-26

主题搜索引擎中爬虫搜索策略的研究

史宝明1，贺元香1，吴崇正2

1.兰州文理学院电子信息工程学院，兰州 730000
2.兰州理工大学计算机与通信学院，兰州 730050

Abstract

Abstract: A focused crawler should always visit the most valuable links first, so keeping the search centered on a given topic is the key problem. Traditional focused crawlers compute only the relevance of each individual link and ignore the relevance relationships among the unlabeled URLs awaiting analysis, which lowers crawling efficiency. To address this, the paper proposes a relevance-determination algorithm based on a link model, which combines labeled seed URLs with unlabeled candidate URLs to judge the relevance of the unlabeled URLs. It also derives that the result is insensitive to the choice of initial values for the iteration. Experimental results show that the proposed method is more efficient than traditional crawler relevance-determination methods.

Key words: web spider, topic-oriented search engine, search strategy, Vector Space Model (VSM)
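
The abstract does not state the update rule of the link-model algorithm, so the following is only a generic illustration of the idea it describes, not the authors' method. It assumes a simple neighbour-averaging scheme (a form of label propagation) in which labeled seed URLs keep fixed scores and every unlabeled URL repeatedly takes the mean score of the URLs it is linked with. The function names, the toy graph, and the scores are all hypothetical.

```python
def propagate_relevance(links, seed_scores, init=0.5, iterations=200):
    """Estimate topic relevance of unlabeled URLs from labeled seeds.

    links:       dict mapping each URL to the URLs it is linked with
                 (treated as undirected neighbours in this sketch).
    seed_scores: dict of labeled seed URLs -> relevance in [0, 1];
                 these values stay fixed during the iteration.
    init:        starting score assigned to every unlabeled URL.
    """
    scores = {url: seed_scores.get(url, init) for url in links}
    for _ in range(iterations):
        updated = {}
        for url, neighbours in links.items():
            if url in seed_scores or not neighbours:
                # Seeds are clamped; isolated URLs keep their score.
                updated[url] = scores[url]
            else:
                # Assumed update rule: mean score of linked URLs.
                updated[url] = sum(scores[n] for n in neighbours) / len(neighbours)
        scores = updated
    return scores


def crawl_order(scores, seed_scores):
    """Return unlabeled URLs, most relevant first (the crawl frontier)."""
    unlabeled = [url for url in scores if url not in seed_scores]
    return sorted(unlabeled, key=lambda url: scores[url], reverse=True)


# Hypothetical toy graph: an on-topic seed and an off-topic seed
# joined through three unlabeled pages a, b, c.
links = {
    "s_on": ["a"],
    "a": ["s_on", "b"],
    "b": ["a", "c"],
    "c": ["b", "s_off"],
    "s_off": ["c"],
}
seeds = {"s_on": 1.0, "s_off": 0.0}

if __name__ == "__main__":
    for start in (0.0, 1.0):
        result = propagate_relevance(links, seeds, init=start)
        print(start, crawl_order(result, seeds), result)
```

On this toy graph the scores for a, b and c settle near 0.75, 0.5 and 0.25 whether the unlabeled URLs start at 0.0 or at 1.0. This mirrors, for this simple scheme only, the insensitivity to initial values that the abstract reports. The crawler would then visit a before b before c.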

摘要： 为了解决传统主题爬虫效率偏低的问题，传统主题爬虫会选择最有价值的链接进行访问，仅简单地计算链接的相关性，却忽视待分析URL之间的相关性关系，致使主题爬虫爬取效率较低。提出一种基于链接模型的相关性判别算法，综合利用有标种子URL和无标的待判别URL实现对无标URL的相关性判别，并推导出迭代初值选取对结果的不敏感性。实验结果表明，与传统的网络爬虫算法相关性判别方法相比，提出的方法效率更高。

关键词: 网络爬虫, 主题搜索引擎, 搜索策略, 向量空间模型

SHI Baoming, HE Yuanxiang, WU Chongzheng. Research on search strategy of web spider in topic-oriented search engines[J]. Computer Engineering and Applications, 2014, 50(2): 116-119.

史宝明，贺元香，吴崇正. 主题搜索引擎中爬虫搜索策略的研究[J]. 计算机工程与应用, 2014, 50(2): 116-119.
