Research on biological information mining model based on document set

Abstract

As the volume of biomedical literature grows rapidly, manually extracting the required information can no longer keep pace with newly published work. This paper proposes a new biological information mining model that uses open-source tools such as Stanford Parser and combines natural language processing with statistical methods, and it analyzes the model's key techniques. In tests on full-text SBQTL (Soybean Quantitative Trait Loci) documents, the model extracted parental-line information with a precision of 93.0% and a recall of 78.4%; in tests on PubMed, precision and recall were 94.3% and 80.0%, respectively. The model helps biomedical researchers find the information they need in a large body of literature quickly and efficiently, so that biologists can uncover hidden biomedical knowledge and verify new scientific discoveries, improving the understanding of biomedical phenomena.

Key words: text mining, Stanford Parser, text preprocessing, dependencies, information extraction

SUN Hongmin, JIANG Nannan, LI Xiang. Research on biological information mining model based on document set[J]. Computer Engineering and Applications, 2016, 52(24): 102-106.
