Data stream classifier with limited labelled data

Abstract: Most data stream classification algorithms address the problems of infinite stream length and concept drift. However, these algorithms require human experts to label every instance before the instances can be used as a training set. This is impractical in a high-speed data stream environment, where instances arrive quickly and must be classified quickly, because labelling instances is both time-consuming and costly. Conversely, if a supervised learning method is trained on only a small number of labelled instances, the resulting classifier is weak. This paper proposes a data stream classification algorithm based on active learning. The method selects only a small fraction of the instances for manual labelling, namely those that are classified with low confidence, which greatly reduces the number of instances that must be labelled. Experimental results show that the proposed method can train a classifier from a small amount of labelled data and achieve good classification performance on concept-drifting data streams.

Key words: data streams, classification, concept drift, active learning
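The abstract gives only the selection rule (query the expert for instances classified with low confidence), so the following is a minimal sketch of that idea under stated assumptions, not the authors' algorithm. The `CentroidClassifier` stand-in model, the softmax-over-distance confidence score, the `threshold` value of 0.7, and the `oracle` callback representing the human expert are all hypothetical choices made for illustration. Concept-drift handling is not shown.

```python
import math
from collections import defaultdict


class CentroidClassifier:
    """Toy incremental nearest-centroid model (a stand-in, not the paper's classifier)."""

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sum
        self.counts = defaultdict(int)  # label -> number of labelled instances seen

    def update(self, x, label):
        """Fold one labelled instance into the running centroid of its class."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict_proba(self, x):
        """Return label -> score, normalised from exp(-distance to class centroid)."""
        scores = {}
        for label, total in self.sums.items():
            centroid = [s / self.counts[label] for s in total]
            scores[label] = math.exp(-math.dist(x, centroid))
        norm = sum(scores.values()) or 1.0  # avoid division by zero on underflow
        return {label: s / norm for label, s in scores.items()}


def process_chunk(model, chunk, oracle, threshold=0.7):
    """Classify a chunk, asking the oracle only about low-confidence instances."""
    predictions, queried = [], 0
    for x in chunk:
        proba = model.predict_proba(x)
        label = max(proba, key=proba.get) if proba else None
        if label is None or proba[label] < threshold:
            label = oracle(x)       # assumed human expert supplies the true label
            model.update(x, label)  # learn from the newly labelled instance
            queried += 1
        predictions.append(label)
    return predictions, queried


# Usage: seed the model with two labelled instances, then stream a small chunk.
model = CentroidClassifier()
model.update([0.0, 0.0], "a")
model.update([4.0, 4.0], "b")
oracle = lambda x: "a" if x[0] + x[1] < 4.0 else "b"  # hypothetical ground truth
chunk = [[0.1, 0.0], [3.9, 4.1], [2.0, 1.9], [0.2, 0.1]]
preds, queried = process_chunk(model, chunk, oracle)
print(preds, queried)
```

In this toy run only the instance near the class boundary falls below the threshold, so only one of the four instances is sent to the oracle. Raising `threshold` trades more labelling effort for more frequent model updates.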

XIONG Zhongyang, ZHOU Xingqin, ZHANG Yufang. Data stream classifier with limited labelled data[J]. Computer Engineering and Applications, 2015, 51(6): 124-128.

