Block-based topic crawling

Computer Engineering and Applications, 2008, Vol. 44, Issue 9, pp. 143-146.

Section: Database, Signal and Information Processing


WU Xiao-ping, ZHANG Chang-li, ZHU Li-na

Computer Experiment Center, Shenyang Artillery College, Shenyang 110162, China

Received: 2007-07-12; Revised: 2007-10-24; Online: 2008-03-21; Published: 2008-03-21
Contact: WU Xiao-ping

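基于网页内容块策略的主题爬行 (Chinese title: "Topic crawling based on a web-page content-block strategy")

吴晓平 (WU Xiao-ping), 张长利 (ZHANG Chang-li), 朱丽娜 (ZHU Li-na)

Computer Experiment Center, Department of Basic Courses, Shenyang Artillery College, Shenyang 110162, China

Corresponding author: 吴晓平 (WU Xiao-ping)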

Abstract

The explosive growth of the World Wide Web poses great challenges to general-purpose crawlers and search engines, and various topic-specific search engines have been designed for particular user groups and domains. The web topic information search system (web spider) is the most important part of a topic search engine: it collects web pages on a specific topic and either returns them to users or stores them in an index database. Because information resources on the Web are so extensive, collecting content of interest comprehensively and efficiently is a central problem in web spider research. This paper proposes a new crawling strategy, block-based topic crawling. Experiments show that, compared with several traditional algorithms, it performs better: it is effective and achieves high precision.
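The abstract does not spell out the traditional algorithms used for comparison. For orientation only, here is a minimal sketch of a generic best-first topic crawler, the kind of page-level baseline a block-based strategy would be measured against (an assumption; the paper's actual baselines are not named here). The in-memory `graph` dict stands in for HTTP fetching, relevance is plain cosine similarity of term frequencies, and `term_vector`, `cosine`, and `best_first_crawl` are hypothetical names.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag-of-words term frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_first_crawl(graph, seeds, topic, threshold=0.2, max_pages=100):
    """Best-first topic crawl over a hypothetical in-memory web.

    graph: URL -> (page_text, list_of_outlink_URLs); stands in for fetching.
    Every outlink inherits the relevance score of the WHOLE page it was
    found on (page-level scoring). Returns relevant URLs in crawl order.
    """
    topic_vec = term_vector(topic)
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)                    # min-heap on negated score
    seen = set(seeds)
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in graph:
            continue  # dead link in the toy web
        text, links = graph[url]
        fetched += 1
        score = cosine(term_vector(text), topic_vec)
        if score >= threshold:
            relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return relevant


web = {
    "a": ("search engine crawler topic", ["b", "c"]),
    "b": ("football scores", ["d"]),
    "c": ("topic crawler for search engine", []),
    "d": ("crawler search", []),
}
print(best_first_crawl(web, ["a"], "topic crawler search engine"))
```

Because every link on a page gets the same priority, an on-topic page full of unrelated navigation or advertising links pushes all of them forward equally; that limitation is what a block-level strategy targets.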

Key words: topic-specific search, topic crawling, search engine, crawling algorithm, correlation analysis

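Abstract (Chinese version, translated): The rapid development of the Internet poses great challenges to traditional crawlers and search engines, and a variety of search engines aimed at specific domains and specific user groups have emerged. The Web topic information search system (web spider) is the main part of a topic search engine; its task is to return the collected Web pages that meet the requirements to users or to store them in an index database. Because information resources on the Web are so extensive, how to collect content of interest comprehensively and efficiently is the focus of web spider research. This paper proposes topic crawling based on web-page segmentation into blocks. Experimental results show that, compared with other crawling algorithms, the proposed algorithm has higher efficiency, crawling precision, and crawling recall, as well as a better ability to tunnel through off-topic pages.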
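The abstract names two ideas without detailing them: judging relevance per content block rather than per page, and tunneling through off-topic pages. The following is a minimal sketch of both concepts under loudly stated assumptions, not the authors' algorithm: pages arrive already segmented into blocks (the segmentation method is not described here), relevance is cosine similarity of term frequencies, each link takes its priority from the block it appears in, and the hypothetical parameter `max_tunnel` bounds how many consecutive irrelevant pages the crawler will expand.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag-of-words term frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def block_based_crawl(graph, seeds, topic, threshold=0.3, max_tunnel=1,
                      max_pages=100):
    """Hypothetical block-level topic crawl with bounded tunneling.

    graph: URL -> list of blocks, each block = (block_text, outlink_URLs).
    Assumption: pages are pre-segmented into content blocks.
    A page is relevant if any of its blocks scores >= threshold.
    """
    topic_vec = term_vector(topic)
    # Entries: (negated priority, url, consecutive irrelevant pages so far).
    frontier = [(-1.0, url, 0) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url, run = heapq.heappop(frontier)
        if url not in graph:
            continue
        fetched += 1
        scored = [(cosine(term_vector(text), topic_vec), links)
                  for text, links in graph[url]]
        if any(score >= threshold for score, _ in scored):
            relevant.append(url)
            next_run = 0  # back on topic: reset the tunnel length
        else:
            next_run = run + 1
            if next_run > max_tunnel:
                continue  # tunnel too long: do not expand this page
        for score, links in scored:
            for link in links:  # link priority = score of ITS block only
                if link not in seen:
                    seen.add(link)
                    heapq.heappush(frontier, (-score, link, next_run))
    return relevant


web = {
    "home": [("topic crawler search engine news", ["p1"]),
             ("sports weather ads", ["ad"])],
    "p1": [("football results", ["p2"])],  # off-topic page on a good path
    "p2": [("focused topic crawler", [])],
    "ad": [("buy shoes", ["x"])],
    "x": [("cheap shoes", ["y"])],
    "y": [("topic crawler", [])],
}
for k in (0, 1, 2):
    print(k, block_based_crawl(web, ["home"], "topic crawler", max_tunnel=k))
```

In the toy web, the `ad` link gets priority 0 from its advertising block instead of inheriting the page's high score, and raising `max_tunnel` lets the crawler pass through the off-topic page `p1` to reach `p2`. The trade-off is that longer tunnels also let it wander further down irrelevant paths such as `ad` → `x` → `y`.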


Cite this article: WU Xiao-ping, ZHANG Chang-li, ZHU Li-na. Block-based topic crawling[J]. Computer Engineering and Applications, 2008, 44(9): 143-146.

Chinese-language citation: 吴晓平, 张长利, 朱丽娜. 基于网页内容块策略的主题爬行[J]. 计算机工程与应用, 2008, 44(9): 143-146.

Related articles

[1] JIANG Yafang, YAN Xin, LI Siyuan, XU Guangyi, ZHOU Feng. Construction of Khmer-Chinese Bilingual Word Embedding Based on Multiple CCA Algorithms[J]. Computer Engineering and Applications, 2020, 56(17): 167-172.
[2] GUO Dongxin, CHEN Kaiyan, ZHANG Yang, SONG Shijie, GUO Huidong. Research on Security Evaluation Method for Prototypeless Side-Channel Analysis of Cryptographic Chips[J]. Computer Engineering and Applications, 2019, 55(22): 69-72.
[3] CAO Weidong, XU Daidai, WANG Jing, WANG Jialiang. NOSHOW Prediction and Strong Factor Association Analysis in Civil Aviation[J]. Computer Engineering and Applications, 2019, 55(2): 221-227.
[4] WU Xiaoquan, LI Hui, CHEN Mei, DAI Zhenyu. DRVisSys: visualization recommendation system based on attribute correlation analysis[J]. Computer Engineering and Applications, 2018, 54(7): 251-256.
[5] XIAO Peng, LIU Na, JI Changqing, LI Yuanyuan, LU Ying, TANG Xiaojun. Medical diagnosis expert system based on correlation analysis of features[J]. Computer Engineering and Applications, 2018, 54(23): 264-270.
[6] WANG Xianfeng, HUANG Wenzhun, ZHANG Shanwen. Multi-view gait recognition method based on weighted local discriminant canonical correlation analysis[J]. Computer Engineering and Applications, 2018, 54(21): 90-94.
[7] DONG Enzeng, GUO Guangrui, CHEN Chao. Research of steady-state visual evoked potential based online brain-computer interface[J]. Computer Engineering and Applications, 2017, 53(3): 154-159.
[8] YANG Heping, CHEN Yu, ZHANG Zhiqiang. Design and implementation of Web concise ontology-base vertical search engine[J]. Computer Engineering and Applications, 2017, 53(19): 257-264.
[9] DONG Xiwei. Local manifold reconstruction based semi-supervised multi-view image classification[J]. Computer Engineering and Applications, 2016, 52(18): 24-30.
[10] DENG Li, CHEN Bo, YU Suihuai. HRA evaluation of interior environmental of cabin based on analytic hierarchy process and grey correlation analysis[J]. Computer Engineering and Applications, 2016, 52(1): 260-265.
[11] DENG Xiaomei, WU Gang. Evaluating user satisfaction of search engine using click log[J]. Computer Engineering and Applications, 2015, 51(8): 245-249.
[12] HAN Yuexiang, LI Zhaoguo, ZHENG Zhe, ZHANG Galin. Face recognition based on hybrid features fusion by kernel canonical correlation analysis[J]. Computer Engineering and Applications, 2015, 51(23): 179-183.
[13] SHA Guanghua, CHEN Yong, ZHANG Changjiang. Application of read-write splitting technologies in operation support system[J]. Computer Engineering and Applications, 2015, 51(12): 107-110.
[14] LIU Fumin, ZHANG Zhibin, SHEN Jiquan. Emotion recognition based on multi-features fused by kernel canonical correlation analysis[J]. Computer Engineering and Applications, 2014, 50(9): 193-196.
[15] HAN Yuexiang. Face automatic recognition algorithm based on canonical correlation analysis fusion global and local features[J]. Computer Engineering and Applications, 2014, 50(5): 142-146.
