SHORT DOCUMENTS CLASSIFICATION METHOD IN VERY LARGE TEXT DATABASE

Computer Engineering and Applications ›› 2006, Vol. 42 ›› Issue (22): 5-.

• Doctoral Forum •


Room 613, Network Institute, School of Computer Science, National University of Defense Technology, Changsha, Hunan

Received:2006-03-22 Revised:1900-01-01 Online:2006-08-01 Published:2006-08-01

Original title (Chinese): 大规模文本数据库中的短文分类方法

WANG Yongheng (王永恒), JIA Yan (贾焰), YANG Shuqiang (杨树强)


Corresponding author: WANG Yongheng tommywang

Abstract

Abstract: With the rapid development of information technology, huge volumes of data have accumulated, and a large share of this data takes the form of short documents. Classifying such short documents is valuable for extracting knowledge from the data automatically. However, most current classification algorithms cannot reach acceptable accuracy, because keywords occur only a few times in short documents and labeled training examples are usually scarce. Some classification algorithms based on semantic information are more accurate, but they are too inefficient to process very large document sets. In this paper, we propose a novel classification method based on a semantic text feature graph and a kNN-like method. Our experimental study shows that our algorithm is more accurate and efficient than other classification algorithms when classifying large-scale short documents.
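
The abstract names a kNN-like classifier but gives no details of it or of the semantic text feature graph. As a rough illustration only, the sketch below shows a plain kNN baseline over bag-of-words cosine similarity, which is not the authors' method. The tokenizer, the similarity-weighted vote, the function names, and the toy training set are all assumptions made for this example. It also hints at why short documents are hard: with so few words per document, many pairs share no terms and get a similarity of zero.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the source does not describe preprocessing)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, labeled_docs, k=3):
    """Return the label with the highest summed similarity among the k nearest labeled documents."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), label) for text, label in labeled_docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim  # similarity-weighted vote
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, invented for illustration.
training = [
    ("stock market prices fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins the football match", "sports"),
    ("player scores in final match", "sports"),
]

print(knn_classify("football match tonight", training))  # prints "sports"
```

A method that uses semantic information, as the abstract proposes, would presumably replace the raw term-overlap similarity here with one that can relate documents sharing no literal keywords.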

Key words: text mining, classification, short document, very large text database


WANG Yongheng, JIA Yan, YANG Shuqiang. SHORT DOCUMENTS CLASSIFICATION METHOD IN VERY LARGE TEXT DATABASE[J]. Computer Engineering and Applications, 2006, 42(22): 5-.

王永恒,贾焰,杨树强. 大规模文本数据库中的短文分类方法[J]. 计算机工程与应用, 2006, 42(22): 5-.

