%0 Journal Article
%A TANG Huanling
%A LIU Yanhong
%A ZHENG Han
%A DOU Quansheng
%A LU Mingyu
%T Imbalanced Text Categorization Method with SLDA Topic Model
%D 2021
%R 10.3778/j.issn.1002-8331.2003-0240
%J Computer Engineering and Applications
%P 144-154
%V 57
%N 12
%X

Supervised categorization algorithms perform well on datasets with sufficient, balanced labels. However, many real-world categorization tasks suffer from class imbalance, which is known to hinder the learning performance of categorization algorithms. This paper demonstrates that the SLDA model can address the class imbalance problem by sampling unlabeled instances. To improve prediction performance on minority classes, the semantic relationship between topics and minority classes is derived with the SLDA topic model. An efficient method for calculating confidence and sampling valuable unlabeled instances is proposed. The proposed method efficiently reduces the skewness of imbalanced datasets and improves categorization performance on minority classes. Experimental results show that the proposed method, the ITC-SLDA algorithm, can significantly improve Macro-F1 and G-mean values in imbalanced text categorization.

%U http://cea.ceaj.org/EN/10.3778/j.issn.1002-8331.2003-0240